Index
Preface
What this learning path covers What you need for this learning path Who this learning path is for Conventions Reader feedback Customer support
Downloading the example code Downloading the color images of this book Errata Piracy Questions
Module 1 Getting Started with Data Science
Problems solved using data science Understanding the data science problem-solving approach
Using Java to support data science
Acquiring data for an application The importance and process of cleaning data Visualizing data to enhance understanding The use of statistical methods in data science Machine learning applied to data science Using neural networks in data science Deep learning approaches Performing text analysis Visual and audio analysis Improving application performance using parallel techniques Assembling the pieces Summary
Data Acquisition
Understanding the data formats used in data science applications
Overview of CSV data Overview of spreadsheets Overview of databases Overview of PDF files Overview of JSON Overview of XML Overview of streaming data Overview of audio/video/images in Java
Data acquisition techniques
Using the HttpUrlConnection class Web crawlers in Java
Creating your own web crawler Using the crawler4j web crawler
Web scraping in Java Using API calls to access common social media sites
Using OAuth to authenticate users Handling Twitter Handling Wikipedia Handling Flickr Handling YouTube
Searching by keyword
Summary
Data Cleaning
Handling data formats
Handling CSV data Handling spreadsheets
Handling Excel spreadsheets
Handling PDF files Handling JSON
Using JSON streaming API Using the JSON tree API
The nitty gritty of cleaning text
Using Java tokenizers to extract words
Java core tokenizers Third-party tokenizers and libraries
Transforming data into a usable form
Simple text cleaning Removing stop words
Finding words in text
Finding and replacing text
Data imputation Subsetting data Sorting text Data validation
Validating data types Validating dates Validating e-mail addresses Validating ZIP codes Validating names
Cleaning images
Changing the contrast of an image Smoothing an image Brightening an image Resizing an image Converting images to different formats
Summary
Data Visualization
Understanding plots and graphs
Visual analysis goals
Creating index charts Creating bar charts
Using country as the category Using decade as the category
Creating stacked graphs Creating pie charts Creating scatter charts Creating histograms Creating donut charts Creating bubble charts Summary
Statistical Data Analysis Techniques
Working with mean, mode, and median
Calculating the mean
Using simple Java techniques to find mean Using Java 8 techniques to find mean Using Google Guava to find mean Using Apache Commons to find mean
Calculating the median
Using simple Java techniques to find median Using Apache Commons to find the median
Calculating the mode
Using ArrayLists to find multiple modes Using a HashMap to find multiple modes Using Apache Commons to find multiple modes
Standard deviation Sample size determination Hypothesis testing Regression analysis
Using simple linear regression Using multiple regression
Summary
Machine Learning
Supervised learning techniques
Decision trees
Decision tree types Decision tree libraries Using a decision tree with a book dataset Testing the book decision tree
Support vector machines
Using an SVM for camping data Testing individual instances
Bayesian networks
Using a Bayesian network
Unsupervised machine learning
Association rule learning
Using association rule learning to find buying relationships
Reinforcement learning Summary
Neural Networks
Training a neural network
Getting started with neural network architectures
Understanding static neural networks
A basic Java example
Understanding dynamic neural networks
Multilayer perceptron networks
Building the model Evaluating the model Predicting other values Saving and retrieving the model
Learning vector quantization Self-Organizing Maps
Using a SOM Displaying the SOM results
Additional network architectures and algorithms
The k-Nearest Neighbors algorithm Instantaneously trained networks Spiking neural networks Cascading neural networks Holographic associative memory Backpropagation and neural networks
Summary
Deep Learning
Deeplearning4j architecture
Acquiring and manipulating data
Reading in a CSV file
Configuring and building a model
Using hyperparameters in ND4J Instantiating the network model
Training a model Testing a model
Deep learning and regression analysis
Preparing the data Setting up the class Reading and preparing the data Building the model Evaluating the model
Restricted Boltzmann Machines
Reconstruction in an RBM Configuring an RBM
Deep autoencoders
Building an autoencoder in DL4J
Configuring the network Building and training the network Saving and retrieving a network Specialized autoencoders
Convolutional networks
Building the model Evaluating the model
Recurrent Neural Networks Summary
Text Analysis
Implementing named entity recognition
Using OpenNLP to perform NER Identifying location entities
Classifying text
Word2Vec and Doc2Vec Classifying text by labels Classifying text by similarity
Understanding tagging and POS
Using OpenNLP to identify POS Understanding POS tags
Extracting relationships from sentences
Using OpenNLP to extract relationships
Sentiment analysis
Downloading and extracting the Word2Vec model Building our model and classifying text
Summary
Visual and Audio Analysis
Text-to-speech
Using FreeTTS Getting information about voices Gathering voice information
Understanding speech recognition
Using CMUSphinx to convert speech to text Obtaining more detail about the words
Extracting text from an image
Using Tess4j to extract text
Identifying faces
Using OpenCV to detect faces
Classifying visual data
Creating a Neuroph Studio project for classifying visual images Training the model
Summary
Mathematical and Parallel Techniques for Data Analysis
Implementing basic matrix operations
Using GPUs with DeepLearning4j
Using map-reduce
Using Apache's Hadoop to perform map-reduce Writing the map method Writing the reduce method Creating and executing a new Hadoop job
Various mathematical libraries
Using the jblas API Using the Apache Commons math API Using the ND4J API
Using OpenCL Using Aparapi
Creating an Aparapi application Using Aparapi for matrix multiplication
Using Java 8 streams
Understanding Java 8 lambda expressions and streams Using Java 8 to perform matrix multiplication Using Java 8 to perform map-reduce
Summary
Bringing It All Together
Defining the purpose and scope of our application Understanding the application's architecture Data acquisition using Twitter Understanding the TweetHandler class
Extracting data for a sentiment analysis model Building the sentiment model Processing the JSON input Cleaning data to improve our results Removing stop words Performing sentiment analysis Analysing the results
Other optional enhancements Summary
Module 2 Data Science Using Java
Data science
Machine learning
Supervised learning Unsupervised learning
Clustering Dimensionality reduction
Natural Language Processing
Data science process models
CRISP-DM A running example
Data science in Java
Data science libraries
Data processing libraries Math and stats libraries Machine learning and data mining libraries Text processing
Summary
Data Processing Toolbox
Standard Java library
Collections Input/Output
Reading input data Writing output data
Streaming API
Extensions to the standard library
Apache Commons
Commons Lang Commons IO Commons Collections Other commons modules
Google Guava AOL Cyclops React
Accessing data
Text data and CSV Web and HTML JSON Databases DataFrames
Search engine - preparing data Summary
Exploratory Data Analysis
Exploratory data analysis in Java
Search engine datasets Apache Commons Math Joinery
Interactive Exploratory Data Analysis in Java
JVM languages
Interactive Java
Joinery shell
Summary
Supervised Learning - Classification and Regression
Classification
Binary classification models
Smile JSAT LIBSVM and LIBLINEAR Encog
Evaluation
Accuracy Precision, recall, and F1 ROC and AU ROC (AUC) Result validation K-fold cross-validation Training, validation, and testing
Case study - page prediction Regression
Machine learning libraries for regression
Smile JSAT Other libraries
Evaluation
MSE MAE
Case study - hardware performance Summary
Unsupervised Learning - Clustering and Dimensionality Reduction
Dimensionality reduction
Unsupervised dimensionality reduction Principal Component Analysis Truncated SVD Truncated SVD for categorical and sparse data
Random projection
Cluster analysis
Hierarchical methods K-means
Choosing K in K-Means DBSCAN
Clustering for supervised learning
Clusters as features Clustering as dimensionality reduction Supervised learning via clustering
Evaluation
Manual evaluation Supervised evaluation Unsupervised Evaluation
Summary
Working with Text - Natural Language Processing and Information Retrieval
Natural Language Processing and information retrieval
Vector Space Model - Bag of Words and TF-IDF
Vector space model implementation
Indexing and Apache Lucene Natural Language Processing tools
Stanford CoreNLP
Customizing Apache Lucene
Machine learning for texts
Unsupervised learning for texts
Latent Semantic Analysis Text clustering Word embeddings
Supervised learning for texts Text classification Learning to rank for information retrieval
Reranking with Lucene
Summary
Extreme Gradient Boosting
Gradient Boosting Machines and XGBoost
Installing XGBoost
XGBoost in practice
XGBoost for classification
Parameter tuning Text features Feature importance
XGBoost for regression XGBoost for learning to rank
Summary
Deep Learning with DeepLearning4J
Neural Networks and DeepLearning4J
ND4J - N-dimensional arrays for Java Neural networks in DeepLearning4J Convolutional Neural Networks
Deep learning for cats versus dogs
Reading the data Creating the model Monitoring the performance Data augmentation Running DeepLearning4J on GPU
Summary
Scaling Data Science
Apache Hadoop
Hadoop MapReduce Common Crawl
Apache Spark Link prediction
Reading the DBLP graph Extracting features from the graph Node features Negative sampling Edge features Link Prediction with MLlib and XGBoost Link suggestion
Summary
Deploying Data Science Models
Microservices
Spring Boot Search engine service
Online evaluation
A/B testing Multi-armed bandits
Summary
Bibliography