Imperial Library
Index
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Module 1
Getting Started with Data Science
Problems solved using data science
Understanding the data science problem-solving approach
Using Java to support data science
Acquiring data for an application
The importance and process of cleaning data
Visualizing data to enhance understanding
The use of statistical methods in data science
Machine learning applied to data science
Using neural networks in data science
Deep learning approaches
Performing text analysis
Visual and audio analysis
Improving application performance using parallel techniques
Assembling the pieces
Summary
Data Acquisition
Understanding the data formats used in data science applications
Overview of CSV data
Overview of spreadsheets
Overview of databases
Overview of PDF files
Overview of JSON
Overview of XML
Overview of streaming data
Overview of audio/video/images in Java
Data acquisition techniques
Using the HttpURLConnection class
Web crawlers in Java
Creating your own web crawler
Using the crawler4j web crawler
Web scraping in Java
Using API calls to access common social media sites
Using OAuth to authenticate users
Handling Twitter
Handling Wikipedia
Handling Flickr
Handling YouTube
Searching by keyword
Summary
Data Cleaning
Handling data formats
Handling CSV data
Handling spreadsheets
Handling Excel spreadsheets
Handling PDF files
Handling JSON
Using JSON streaming API
Using the JSON tree API
The nitty gritty of cleaning text
Using Java tokenizers to extract words
Java core tokenizers
Third-party tokenizers and libraries
Transforming data into a usable form
Simple text cleaning
Removing stop words
Finding words in text
Finding and replacing text
Data imputation
Subsetting data
Sorting text
Data validation
Validating data types
Validating dates
Validating e-mail addresses
Validating ZIP codes
Validating names
Cleaning images
Changing the contrast of an image
Smoothing an image
Brightening an image
Resizing an image
Converting images to different formats
Summary
Data Visualization
Understanding plots and graphs
Visual analysis goals
Creating index charts
Creating bar charts
Using country as the category
Using decade as the category
Creating stacked graphs
Creating pie charts
Creating scatter charts
Creating histograms
Creating donut charts
Creating bubble charts
Summary
Statistical Data Analysis Techniques
Working with mean, mode, and median
Calculating the mean
Using simple Java techniques to find mean
Using Java 8 techniques to find mean
Using Google Guava to find mean
Using Apache Commons to find mean
Calculating the median
Using simple Java techniques to find median
Using Apache Commons to find the median
Calculating the mode
Using ArrayLists to find multiple modes
Using a HashMap to find multiple modes
Using Apache Commons to find multiple modes
Standard deviation
Sample size determination
Hypothesis testing
Regression analysis
Using simple linear regression
Using multiple regression
Summary
Machine Learning
Supervised learning techniques
Decision trees
Decision tree types
Decision tree libraries
Using a decision tree with a book dataset
Testing the book decision tree
Support vector machines
Using an SVM for camping data
Testing individual instances
Bayesian networks
Using a Bayesian network
Unsupervised machine learning
Association rule learning
Using association rule learning to find buying relationships
Reinforcement learning
Summary
Neural Networks
Training a neural network
Getting started with neural network architectures
Understanding static neural networks
A basic Java example
Understanding dynamic neural networks
Multilayer perceptron networks
Building the model
Evaluating the model
Predicting other values
Saving and retrieving the model
Learning vector quantization
Self-Organizing Maps
Using a SOM
Displaying the SOM results
Additional network architectures and algorithms
The k-Nearest Neighbors algorithm
Instantaneously trained networks
Spiking neural networks
Cascading neural networks
Holographic associative memory
Backpropagation and neural networks
Summary
Deep Learning
Deeplearning4j architecture
Acquiring and manipulating data
Reading in a CSV file
Configuring and building a model
Using hyperparameters in ND4J
Instantiating the network model
Training a model
Testing a model
Deep learning and regression analysis
Preparing the data
Setting up the class
Reading and preparing the data
Building the model
Evaluating the model
Restricted Boltzmann Machines
Reconstruction in an RBM
Configuring an RBM
Deep autoencoders
Building an autoencoder in DL4J
Configuring the network
Building and training the network
Saving and retrieving a network
Specialized autoencoders
Convolutional networks
Building the model
Evaluating the model
Recurrent Neural Networks
Summary
Text Analysis
Implementing named entity recognition
Using OpenNLP to perform NER
Identifying location entities
Classifying text
Word2Vec and Doc2Vec
Classifying text by labels
Classifying text by similarity
Understanding tagging and POS
Using OpenNLP to identify POS
Understanding POS tags
Extracting relationships from sentences
Using OpenNLP to extract relationships
Sentiment analysis
Downloading and extracting the Word2Vec model
Building our model and classifying text
Summary
Visual and Audio Analysis
Text-to-speech
Using FreeTTS
Getting information about voices
Gathering voice information
Understanding speech recognition
Using CMUSphinx to convert speech to text
Obtaining more detail about the words
Extracting text from an image
Using Tess4J to extract text
Identifying faces
Using OpenCV to detect faces
Classifying visual data
Creating a Neuroph Studio project for classifying visual images
Training the model
Summary
Mathematical and Parallel Techniques for Data Analysis
Implementing basic matrix operations
Using GPUs with DeepLearning4j
Using map-reduce
Using Apache's Hadoop to perform map-reduce
Writing the map method
Writing the reduce method
Creating and executing a new Hadoop job
Various mathematical libraries
Using the jblas API
Using the Apache Commons math API
Using the ND4J API
Using OpenCL
Using Aparapi
Creating an Aparapi application
Using Aparapi for matrix multiplication
Using Java 8 streams
Understanding Java 8 lambda expressions and streams
Using Java 8 to perform matrix multiplication
Using Java 8 to perform map-reduce
Summary
Bringing It All Together
Defining the purpose and scope of our application
Understanding the application's architecture
Data acquisition using Twitter
Understanding the TweetHandler class
Extracting data for a sentiment analysis model
Building the sentiment model
Processing the JSON input
Cleaning data to improve our results
Removing stop words
Performing sentiment analysis
Analysing the results
Other optional enhancements
Summary
Module 2
Data Science Using Java
Data science
Machine learning
Supervised learning
Unsupervised learning
Clustering
Dimensionality reduction
Natural Language Processing
Data science process models
CRISP-DM
A running example
Data science in Java
Data science libraries
Data processing libraries
Math and stats libraries
Machine learning and data mining libraries
Text processing
Summary
Data Processing Toolbox
Standard Java library
Collections
Input/Output
Reading input data
Writing output data
Streaming API
Extensions to the standard library
Apache Commons
Commons Lang
Commons IO
Commons Collections
Other commons modules
Google Guava
AOL Cyclops React
Accessing data
Text data and CSV
Web and HTML
JSON
Databases
DataFrames
Search engine - preparing data
Summary
Exploratory Data Analysis
Exploratory data analysis in Java
Search engine datasets
Apache Commons Math
Joinery
Interactive Exploratory Data Analysis in Java
JVM languages
Interactive Java
Joinery shell
Summary
Supervised Learning - Classification and Regression
Classification
Binary classification models
Smile
JSAT
LIBSVM and LIBLINEAR
Encog
Evaluation
Accuracy
Precision, recall, and F1
ROC and AU ROC (AUC)
Result validation
K-fold cross-validation
Training, validation, and testing
Case study - page prediction
Regression
Machine learning libraries for regression
Smile
JSAT
Other libraries
Evaluation
MSE
MAE
Case study - hardware performance
Summary
Unsupervised Learning - Clustering and Dimensionality Reduction
Dimensionality reduction
Unsupervised dimensionality reduction
Principal Component Analysis
Truncated SVD
Truncated SVD for categorical and sparse data
Random projection
Cluster analysis
Hierarchical methods
K-means
Choosing K in K-Means
DBSCAN
Clustering for supervised learning
Clusters as features
Clustering as dimensionality reduction
Supervised learning via clustering
Evaluation
Manual evaluation
Supervised evaluation
Unsupervised Evaluation
Summary
Working with Text - Natural Language Processing and Information Retrieval
Natural Language Processing and information retrieval
Vector Space Model - Bag of Words and TF-IDF
Vector space model implementation
Indexing and Apache Lucene
Natural Language Processing tools
Stanford CoreNLP
Customizing Apache Lucene
Machine learning for texts
Unsupervised learning for texts
Latent Semantic Analysis
Text clustering
Word embeddings
Supervised learning for texts
Text classification
Learning to rank for information retrieval
Reranking with Lucene
Summary
Extreme Gradient Boosting
Gradient Boosting Machines and XGBoost
Installing XGBoost
XGBoost in practice
XGBoost for classification
Parameter tuning
Text features
Feature importance
XGBoost for regression
XGBoost for learning to rank
Summary
Deep Learning with DeepLearning4J
Neural Networks and DeepLearning4J
ND4J - N-dimensional arrays for Java
Neural networks in DeepLearning4J
Convolutional Neural Networks
Deep learning for cats versus dogs
Reading the data
Creating the model
Monitoring the performance
Data augmentation
Running DeepLearning4J on GPU
Summary
Scaling Data Science
Apache Hadoop
Hadoop MapReduce
Common Crawl
Apache Spark
Link prediction
Reading the DBLP graph
Extracting features from the graph
Node features
Negative sampling
Edge features
Link Prediction with MLlib and XGBoost
Link suggestion
Summary
Deploying Data Science Models
Microservices
Spring Boot
Search engine service
Online evaluation
A/B testing
Multi-armed bandits
Summary
Bibliography