Imperial Library

Index
Title Page
Copyright and Credits
Practical Big Data Analytics
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Too Big or Not Too Big
What is big data?
A brief history of data
Dawn of the information age
Dr. Alan Turing and modern computing
The advent of the stored-program computer
From magnetic devices to SSDs
Why we are talking about big data now if data has always existed
Definition of big data
Building blocks of big data analytics
Types of Big Data
Structured
Unstructured
Semi-structured
Sources of big data
The 4Vs of big data
When do you know you have a big data problem and where do you start your search for the big data solution?
Summary
Big Data Mining for the Masses
What is big data mining?
Big data mining in the enterprise
Building the case for a Big Data strategy
Implementation life cycle
Stakeholders of the solution
Implementing the solution
Technical elements of the big data platform
Selection of the hardware stack
Selection of the software stack
Summary
The Analytics Toolkit
Components of the Analytics Toolkit
System recommendations
Installing on a laptop or workstation
Installing on the cloud
Installing Hadoop
Installing Oracle VirtualBox
Installing CDH in other environments
Installing Packt Data Science Box
Installing Spark
Installing R
Steps for downloading and installing Microsoft R Open
Installing RStudio
Installing Python
Summary
Big Data With Hadoop
The fundamentals of Hadoop
The fundamental premise of Hadoop
The core modules of Hadoop
Hadoop Distributed File System - HDFS
Data storage process in HDFS
Hadoop MapReduce
An intuitive introduction to MapReduce
A technical understanding of MapReduce
Block size and number of mappers and reducers
Hadoop YARN
Job scheduling in YARN
Other topics in Hadoop
Encryption
User authentication
Hadoop data storage formats
New features expected in Hadoop 3
The Hadoop ecosystem
Hands-on with CDH
WordCount using Hadoop MapReduce
Analyzing oil import prices with Hive
Joining tables in Hive
Summary
Big Data Mining with NoSQL
Why NoSQL?
The ACID, BASE, and CAP properties
ACID and SQL
The BASE property of NoSQL
The CAP theorem
The need for NoSQL technologies
Google Bigtable
Amazon Dynamo
NoSQL databases
In-memory databases
Columnar databases
Document-oriented databases
Key-value databases
Graph databases
Other NoSQL types and summary of other types of databases
Analyzing Nobel Laureates data with MongoDB
JSON format
Installing and using MongoDB
Tracking physician payments with real-world data
Installing kdb+, R, and RStudio
Installing kdb+
Installing R
Installing RStudio
The CMS Open Payments Portal
Downloading the CMS Open Payments data
Creating the Q application
Loading the data
The backend code
Creating the frontend web portal
R Shiny platform for developers
Putting it all together - The CMS Open Payments application
Applications
Summary
Spark for Big Data Analytics
The advent of Spark
Limitations of Hadoop
Overcoming the limitations of Hadoop
Theoretical concepts in Spark
Resilient distributed datasets
Directed acyclic graphs
SparkContext
Spark DataFrames
Actions and transformations
Spark deployment options
Spark APIs
Core components in Spark
Spark Core
Spark SQL
Spark Streaming
GraphX
MLlib
The architecture of Spark
Spark solutions
Spark practicals
Signing up for Databricks Community Edition
Spark exercise - hands-on with Spark (Databricks)
Summary
An Introduction to Machine Learning Concepts
What is machine learning?
The evolution of machine learning
Factors that led to the success of machine learning
Machine learning, statistics, and AI
Categories of machine learning
Supervised and unsupervised machine learning
Supervised machine learning
Vehicle Mileage, Number Recognition and other examples
Unsupervised machine learning
Subdividing supervised machine learning
Common terminologies in machine learning
The core concepts in machine learning
Data management steps in machine learning
Pre-processing and feature selection techniques
Centering and scaling
The near-zero variance function
Removing correlated variables
Other common data transformations
Data sampling
Data imputation
The importance of variables
The train, test splits, and cross-validation concepts
Splitting the data into train and test sets
The cross-validation parameter
Creating the model
Leveraging multicore processing in the model
Summary
Machine Learning Deep Dive
The bias, variance, and regularization properties
The gradient descent and VC Dimension theories
Popular machine learning algorithms
Regression models
Association rules
Confidence
Support
Lift
Decision trees
The Random forest extension
Boosting algorithms
Support vector machines
The K-Means machine learning technique
The neural networks related algorithms
Tutorial - associative rules mining with CMS data
Downloading the data
Writing the R code for Apriori
Shiny (R Code)
Using custom CSS and fonts for the application
Running the application
Summary
Enterprise Data Science
Enterprise data science overview
A roadmap to enterprise analytics success
Data science solutions in the enterprise
Enterprise data warehouse and data mining
Traditional data warehouse systems
Oracle Exadata, Exalytics, and TimesTen
HP Vertica
Teradata
IBM data warehouse systems (formerly Netezza appliances)
PostgreSQL
Greenplum
SAP Hana
Enterprise and open source NoSQL Databases
Kdb+
MongoDB
Cassandra
Neo4j
Cloud databases
Amazon Redshift, Redshift Spectrum, and Athena databases
Google BigQuery and other cloud services
Azure CosmosDB
GPU databases
Brytlyt
MapD
Other common databases
Enterprise data science – machine learning and AI
The R programming language
Python
OpenCV, Caffe, and others
Spark
Deep learning
H2O and Driverless AI
Datarobot
Command-line tools
Apache MADlib
Machine learning as a service
Enterprise infrastructure solutions
Cloud computing
Virtualization
Containers – Docker, Kubernetes, and Mesos
On-premises hardware
Enterprise Big Data
Tutorial – using RStudio in the cloud
Summary
Closing Thoughts on Big Data
Corporate big data and data science strategy
Ethical considerations
Silicon Valley and data science
The human factor
Characteristics of successful projects
Summary
External Data Science Resources
Big data resources
NoSQL products
Languages and tools
Creating dashboards
Notebooks
Visualization libraries
Courses on R
Courses on machine learning
Machine learning and deep learning links
Web-based machine learning services
Movies
Machine learning books from Packt
Books for leisure reading
Other Books You May Enjoy
Leave a review - let other readers know what you think