Practical Big Data Analytics by Dasgupta, Nataraj

In-memory databases
Columnar databases
Document-oriented databases
Key-value databases
Graph databases
Other NoSQL types and summary of other types of databases

Analyzing Nobel Laureates data with MongoDB

JSON format
Installing and using MongoDB

Tracking physician payments with real-world data

Installing kdb+, R, and RStudio

Installing kdb+
Installing R
Installing RStudio

The CMS Open Payments Portal

Downloading the CMS Open Payments data
Creating the Q application

Loading the data
The backend code

Creating the frontend web portal

R Shiny platform for developers

Putting it all together - The CMS Open Payments application
Applications

Summary

Spark for Big Data Analytics

The advent of Spark

Limitations of Hadoop
Overcoming the limitations of Hadoop
Theoretical concepts in Spark

Resilient distributed datasets
Directed acyclic graphs
SparkContext
Spark DataFrames
Actions and transformations
Spark deployment options
Spark APIs

Core components in Spark

Spark Core
Spark SQL
Spark Streaming
GraphX
MLlib

The architecture of Spark
Spark solutions

Spark practicals

Signing up for Databricks Community Edition

Spark exercise - hands-on with Spark (Databricks)
Summary

An Introduction to Machine Learning Concepts

What is machine learning?

The evolution of machine learning

Factors that led to the success of machine learning
Machine learning, statistics, and AI
Categories of machine learning

Supervised and unsupervised machine learning

Supervised machine learning

Vehicle mileage, number recognition, and other examples

Unsupervised machine learning

Subdividing supervised machine learning
Common terminologies in machine learning
The core concepts in machine learning

Data management steps in machine learning

Pre-processing and feature selection techniques

Centering and scaling

The near-zero variance function
Removing correlated variables
Other common data transformations
Data sampling
Data imputation
The importance of variables

The train, test splits, and cross-validation concepts

Splitting the data into train and test sets
The cross-validation parameter

Creating the model

Leveraging multicore processing in the model
Summary

Machine Learning Deep Dive

The bias, variance, and regularization properties
The gradient descent and VC Dimension theories
Popular machine learning algorithms

Regression models
Association rules

Confidence
Support
Lift

Decision trees
The Random forest extension
Boosting algorithms
Support vector machines
The K-Means machine learning technique
The neural networks related algorithms

Tutorial - association rules mining with CMS data

Downloading the data
Writing the R code for Apriori
Shiny (R Code)
Using custom CSS and fonts for the application
Running the application

Summary

Enterprise Data Science

Enterprise data science overview
A roadmap to enterprise analytics success
Data science solutions in the enterprise

Enterprise data warehouse and data mining
Traditional data warehouse systems

Oracle Exadata, Exalytics, and TimesTen
HP Vertica
Teradata
IBM data warehouse systems (formerly Netezza appliances)
PostgreSQL
Greenplum
SAP HANA

Enterprise and open source NoSQL Databases

Kdb+
MongoDB
Cassandra
Neo4j

Cloud databases

Amazon Redshift, Redshift Spectrum, and Athena databases
Google BigQuery and other cloud services
Azure Cosmos DB

GPU databases

Brytlyt
MapD

Other common databases

Enterprise data science – machine learning and AI

The R programming language
Python
OpenCV, Caffe, and others
Spark
Deep learning
H2O and Driverless AI
DataRobot
Command-line tools
Apache MADlib
Machine learning as a service

Enterprise infrastructure solutions

Cloud computing
Virtualization
Containers – Docker, Kubernetes, and Mesos
On-premises hardware
Enterprise Big Data

Tutorial – using RStudio in the cloud
Summary

Closing Thoughts on Big Data

Corporate big data and data science strategy
Ethical considerations
Silicon Valley and data science
The human factor

Characteristics of successful projects

Summary

External Data Science Resources

Big data resources
NoSQL products
Languages and tools
Creating dashboards
Notebooks
Visualization libraries
Courses on R
Courses on machine learning
Machine learning and deep learning links
Web-based machine learning services
Movies
Machine learning books from Packt
Books for leisure reading

Other Books You May Enjoy

Leave a review - let other readers know what you think
