Index
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Gathering and Organizing Data
Handling data - Gopher style
Best practices for gathering and organizing data with Go
CSV files
Reading in CSV data from a file
Handling unexpected fields
Handling unexpected types
Manipulating CSV data with data frames
JSON
Parsing JSON
JSON output
SQL-like databases
Connecting to an SQL database
Querying the database
Modifying the database
Caching
Caching data in memory
Caching data locally on disk
Data versioning
Pachyderm jargon
Deploying/installing Pachyderm
Creating data repositories for data versioning
Putting data into data repositories
Getting data out of versioned data repositories
References
Summary
Matrices, Probability, and Statistics
Matrices and vectors
Vectors
Vector operations
Matrices
Matrix operations
Statistics
Distributions
Statistical measures
Measures of central tendency
Measures of spread or dispersion
Visualizing distributions
Histograms
Box plots
Probability
Random variables
Probability measures
Independent and conditional probability
Hypothesis testing
Test statistics
Calculating p-values
References
Summary
Evaluation and Validation
Evaluation
Continuous metrics
Categorical metrics
Individual evaluation metrics for categorical variables
Confusion matrices, AUC, and ROC
Validation
Training and test sets
Holdout set
Cross validation
References
Summary
Regression
Understanding regression model jargon
Linear regression
Overview of linear regression
Linear regression assumptions and pitfalls
Linear regression example
Profiling the data
Choosing our independent variable
Creating our training and test sets
Training our model
Evaluating the trained model
Multiple linear regression
Nonlinear and other types of regression
References
Summary
Classification
Understanding classification model jargon
Logistic regression
Overview of logistic regression
Logistic regression assumptions and pitfalls
Logistic regression example
Cleaning and profiling the data
Creating our training and test sets
Training and testing the logistic regression model
k-nearest neighbors
Overview of kNN
kNN assumptions and pitfalls
kNN example
Decision trees and random forests
Overview of decision trees and random forests
Decision tree and random forest assumptions and pitfalls
Decision tree example
Random forest example
Naive Bayes
Overview of naive Bayes and its big assumption
Naive Bayes example
References
Summary
Clustering
Understanding clustering model jargon
Measuring distance or similarity
Evaluating clustering techniques
Internal clustering evaluation
External clustering evaluation
k-means clustering
Overview of k-means clustering
k-means assumptions and pitfalls
k-means clustering example
Profiling the data
Generating clusters with k-means
Evaluating the generated clusters
Other clustering techniques
References
Summary
Time Series and Anomaly Detection
Representing time series data in Go
Understanding time series jargon
Statistics related to time series
Autocorrelation
Partial autocorrelation
Auto-regressive models for forecasting
Auto-regressive model overview
Auto-regressive model assumptions and pitfalls
Auto-regressive model example
Transforming to a stationary series
Analyzing the ACF and choosing an AR order
Fitting and evaluating an AR(2) model
Auto-regressive moving averages and other time series models
Anomaly detection
References
Summary
Neural Networks and Deep Learning
Understanding neural net jargon
Building a simple neural network
Nodes in the network
Network architecture
Why do we expect this architecture to work?
Training our neural network
Utilizing the simple neural network
Training the neural network on real data
Evaluating the neural network
Introducing deep learning
What is a deep learning model?
Deep learning with Go
Setting up TensorFlow for use with Go
Retrieving and calling a pretrained TensorFlow model
Object detection using TensorFlow from Go
References
Summary
Deploying and Distributing Analyses and Models
Running models reliably on remote machines
A brief introduction to Docker and Docker jargon
Docker-izing a machine learning application
Docker-izing the model training and export
Docker-izing model predictions
Testing the Docker images locally
Running the Docker images on remote machines
Building a scalable and reproducible machine learning pipeline
Setting up a Pachyderm and Kubernetes cluster
Building a Pachyderm machine learning pipeline
Creating and filling the input repositories
Creating and running the processing stages
Updating pipelines and examining provenance
Scaling pipeline stages
References
Summary
Algorithms/Techniques Related to Machine Learning
Gradient descent
Entropy, information gain, and related methods
Backpropagation