
Index
Scala:Applied Machine Learning
Table of Contents Scala:Applied Machine Learning Credits Preface
What this learning path covers What you need for this learning path
Module 1
Installing the JDK Installing and using SBT
Module 2 Module 3
Who this learning path is for Reader feedback Customer support
Downloading the example code Errata Piracy Questions
I. Module 1
1. Scala and Data Science
Data science Programming in data science Why Scala?
Static typing and type inference Scala encourages immutability Scala and functional programs Null pointer uncertainty Easier parallelism Interoperability with Java
When not to use Scala Summary References
2. Manipulating Data with Breeze
Code examples Installing Breeze Getting help on Breeze Basic Breeze data types
Vectors Dense and sparse vectors and the vector trait Matrices Building vectors and matrices Advanced indexing and slicing Mutating vectors and matrices Matrix multiplication, transposition, and the orientation of vectors Data preprocessing and feature engineering Breeze – function optimization Numerical derivatives Regularization
An example – logistic regression Towards re-usable code Alternatives to Breeze Summary References
3. Plotting with breeze-viz
Diving into Breeze Customizing plots Customizing the line type More advanced scatter plots Multi-plot example – scatterplot matrix plots Managing without documentation Breeze-viz reference Data visualization beyond breeze-viz Summary
4. Parallel Collections and Futures
Parallel collections
Limitations of parallel collections Error handling Setting the parallelism level An example – cross-validation with parallel collections
Futures
Future composition – using a future's result Blocking until completion Controlling parallel execution with execution contexts Futures example – stock price fetcher
Summary References
5. Scala and SQL through JDBC
Interacting with JDBC First steps with JDBC
Connecting to a database server Creating tables Inserting data Reading data
JDBC summary Functional wrappers for JDBC Safer JDBC connections with the loan pattern Enriching JDBC statements with the "pimp my library" pattern Wrapping result sets in a stream Looser coupling with type classes
Type classes Coding against type classes When to use type classes Benefits of type classes
Creating a data access layer Summary References
6. Slick – A Functional Interface for SQL
FEC data
Importing Slick Defining the schema Connecting to the database Creating tables Inserting data Querying data
Invokers Operations on columns Aggregations with "Group by" Accessing database metadata Slick versus JDBC Summary References
7. Web APIs
A whirlwind tour of JSON Querying web APIs JSON in Scala – an exercise in pattern matching
JSON4S types Extracting fields using XPath
Extraction using case classes Concurrency and exception handling with futures Authentication – adding HTTP headers
HTTP – a whirlwind overview Adding headers to HTTP requests in Scala
Summary References
8. Scala and MongoDB
MongoDB Connecting to MongoDB with Casbah
Connecting with authentication
Inserting documents Extracting objects from the database Complex queries Casbah query DSL Custom type serialization Beyond Casbah Summary References
9. Concurrency with Akka
GitHub follower graph Actors as people Hello world with Akka Case classes as messages Actor construction Anatomy of an actor Follower network crawler Fetcher actors Routing Message passing between actors Queue control and the pull pattern Accessing the sender of a message Stateful actors Follower network crawler Fault tolerance Custom supervisor strategies Life-cycle hooks What we have not talked about Summary References
10. Distributed Batch Processing with Spark
Installing Spark Acquiring the example data Resilient distributed datasets
RDDs are immutable RDDs are lazy RDDs know their lineage RDDs are resilient RDDs are distributed Transformations and actions on RDDs Persisting RDDs Key-value RDDs Double RDDs
Building and running standalone programs
Running Spark applications locally Reducing logging output and Spark configuration Running Spark applications on EC2
Spam filtering Lifting the hood Data shuffling and partitions Summary Reference
11. Spark SQL and DataFrames
DataFrames – a whirlwind introduction Aggregation operations Joining DataFrames together Custom functions on DataFrames DataFrame immutability and persistence SQL statements on DataFrames Complex data types – arrays, maps, and structs
Structs Arrays Maps
Interacting with data sources
JSON files Parquet files
Standalone programs Summary References
12. Distributed Machine Learning with MLlib
Introducing MLlib – Spam classification Pipeline components
Transformers Estimators
Evaluation Regularization in logistic regression Cross-validation and model selection Beyond logistic regression Summary References
13. Web APIs with Play
Client-server applications Introduction to web frameworks Model-View-Controller architecture Single page applications Building an application The Play framework Dynamic routing Actions
Composing the response Understanding and parsing the request
Interacting with JSON Querying external APIs and consuming JSON
Calling external web services Parsing JSON Asynchronous actions
Creating APIs with Play: a summary Rest APIs: best practice Summary References
14. Visualization with D3 and the Play Framework
GitHub user data Do I need a backend? JavaScript dependencies through web-jars Towards a web application: HTML templates Modular JavaScript through RequireJS Bootstrapping the applications Client-side program architecture
Designing the model The event bus AJAX calls through JQuery Response views
Drawing plots with NVD3 Summary References
A. Pattern Matching and Extractors
Pattern matching in for comprehensions Pattern matching internals Extracting sequences Summary Reference
II. Module 2
1. Getting Started
Mathematical notation for the curious Why machine learning?
Classification Prediction Optimization Regression
Why Scala?
Abstraction
Higher-kind projection Covariant functors for vectors Contravariant functors for co-vectors Monads
Scalability Configurability Maintainability Computation on demand
Model categorization Taxonomy of machine learning algorithms
Unsupervised learning
Clustering Dimension reduction
Supervised learning
Generative models Discriminative models
Semi-supervised learning Reinforcement learning
Don't reinvent the wheel! Tools and frameworks
Java Scala Apache Commons Math
Description Licensing Installation
JFreeChart
Description Licensing Installation
Other libraries and frameworks
Source code
Context versus view bounds Presentation Primitives and implicits
Primitive types Type conversions
Immutability Performance of Scala iterators
Let's kick the tires
An overview of computational workflows Writing a simple workflow
Step 1 – scoping the problem Step 2 – loading data Step 3 – preprocessing the data
Immutable normalization
Step 4 – discovering patterns
Analyzing data Plotting data
Step 5 – implementing the classifier
Selecting an optimizer Training the model Classifying observations
Step 6 – evaluating the model
Summary
2. Hello World!
Modeling
A model by any other name Model versus design Selecting features Extracting features
Defining a methodology Monadic data transformation
Error handling Explicit models Implicit models
A workflow computational model
Supporting mathematical abstractions
Step 1 – variable declaration Step 2 – model definition Step 3 – instantiation
Composing mixins to build a workflow
Understanding the problem Defining modules Instantiating the workflow
Modularization
Profiling data
Immutable statistics Z-Score and Gauss
Assessing a model
Validation
Key quality metrics F-score for binomial classification F-score for multinomial classification
Cross-validation
One-fold cross validation K-fold cross validation
Bias-variance decomposition Overfitting
Summary
3. Data Preprocessing
Time series in Scala
Types and operations The magnet pattern
The transpose operator The differential operator
Lazy views
Moving averages
The simple moving average The weighted moving average The exponential moving average
Fourier analysis
Discrete Fourier transform DFT-based filtering Detection of market cycles
The discrete Kalman filter
The state space estimation
The transition equation The measurement equation
The recursive algorithm
Prediction Correction Kalman smoothing Fixed lag smoothing Experimentation Benefits and drawbacks
Alternative preprocessing techniques Summary
4. Unsupervised Learning
Clustering
K-means clustering
Measuring similarity Defining the algorithm Step 1 – cluster configuration
Defining clusters Initializing clusters
Step 2 – cluster assignment Step 3 – reconstruction/error minimization
Creating K-means components Tail recursive implementation Iterative implementation
Step 4 – classification The curse of dimensionality Setting up the evaluation Evaluating the results Tuning the number of clusters Validation
The expectation-maximization algorithm
Gaussian mixture models Overview of EM Implementation Classification Testing The online EM algorithm
Dimension reduction
Principal components analysis
Algorithm Implementation Test case Evaluation
Non-linear models
Kernel PCA Manifolds
Performance considerations
K-means EM PCA
Summary
5. Naïve Bayes Classifiers
Probabilistic graphical models Naïve Bayes classifiers
Introducing the multinomial Naïve Bayes
Formalism The frequentist perspective The predictive model The zero-frequency problem
Implementation
Design Training
Class likelihood Binomial model The multinomial model Classifier components
Classification F1 validation Feature extraction Testing
The Multivariate Bernoulli classification
Model Implementation
Naïve Bayes and text mining
Basics of information retrieval Implementation
Analyzing documents Extracting the frequency of relative terms Generating the features
Testing
Retrieving the textual information Evaluating the text mining classifier
Pros and cons Summary
6. Regression and Regularization
Linear regression
One-variate linear regression
Implementation Test case
Ordinary least squares regression
Design Implementation Test case 1 – trending Test case 2 – feature selection
Regularization
Ln roughness penalty Ridge regression
Design Implementation Test case
Numerical optimization Logistic regression
Logistic function Binomial classification Design The training workflow
Step 1 – configuring the optimizer Step 2 – computing the Jacobian matrix Step 3 – managing the convergence of the optimizer Step 4 – defining the least squares problem Step 5 – minimizing the sum of square errors Test
Classification
Summary
7. Sequential Data Models
Markov decision processes
The Markov property The first order discrete Markov chain
The hidden Markov model
Notations The lambda model Design Evaluation – CF-1
Alpha – the forward pass Beta – the backward pass
Training – CF-2
The Baum-Welch estimator (EM)
Decoding – CF-3
The Viterbi algorithm
Putting it all together Test case 1 – training Test case 2 – evaluation HMM as a filtering technique
Conditional random fields
Introduction to CRF Linear chain CRF
Regularized CRFs and text analytics
The feature functions model Design Implementation
Configuring the CRF classifier Training the CRF model Applying the CRF model
Tests
The training convergence profile Impact of the size of the training set Impact of the L2 regularization factor
Comparing CRF and HMM Performance consideration Summary
8. Kernel Models and Support Vector Machines
Kernel functions
An overview Common discriminative kernels Kernel monadic composition
Support vector machines
The linear SVM
The separable case – the hard margin The nonseparable case – the soft margin
The nonlinear SVM
Max-margin classification The kernel trick
Support vector classifiers – SVC
The binary SVC
LIBSVM Design Configuration parameters
The SVM formulation The SVM kernel function The SVM execution
Interface to LIBSVM Training Classification C-penalty and margin Kernel evaluation Applications in risk analysis
Anomaly detection with one-class SVC Support vector regression
An overview SVR versus linear regression
Performance considerations Summary
9. Artificial Neural Networks
Feed-forward neural networks
The biological background Mathematical background
The multilayer perceptron
The activation function The network topology Design Configuration Network components
The network topology Input and hidden layers The output layer Synapses Connections The initialization weights
The model Problem types (modes) Online training versus batch training The training epoch
Step 1 – input forward propagation
The computational flow Error functions Operating modes Softmax
Step 2 – error backpropagation
Weights' adjustment The error propagation The computational model
Step 3 – exit condition Putting it all together
Training and classification
Regularization The model generation The Fast Fisher-Yates shuffle Prediction Model fitness
Evaluation
The execution profile Impact of the learning rate The impact of the momentum factor The impact of the number of hidden layers Test case
Implementation Evaluation of models Impact of the hidden layers' architecture
Convolution neural networks
Local receptive fields Sharing of weights Convolution layers Subsampling layers Putting it all together
Benefits and limitations Summary
10. Genetic Algorithms
Evolution
The origin NP problems Evolutionary computing
Genetic algorithms and machine learning Genetic algorithm components
Encoding
Value encoding Predicate encoding Solution encoding The encoding scheme
Flat encoding Hierarchical encoding
Genetic operators
Selection Crossover Mutation
The fitness score
Implementation
Software design Key components
Population Chromosomes Genes
Selection Controlling the population growth The GA configuration Crossover
Population Chromosomes Genes
Mutation
Population Chromosomes Genes
Reproduction Solver
GA for trading strategies
Definition of trading strategies
Trading operators The cost function Trading signals Trading strategies Trading signal encoding
A test case
Creating trading strategies Configuring the optimizer Finding the best trading strategy Tests
The weighted score The unweighted score
Advantages and risks of genetic algorithms Summary
11. Reinforcement Learning
Reinforcement learning
The problem A solution – Q-learning
Terminology Concepts Value of a policy The Bellman optimality equations Temporal difference for model-free learning Action-value iterative update
Implementation
Software design The states and actions The search space The policy and action-value The Q-learning components The Q-learning training Tail recursion to the rescue The validation The prediction
Option trading using Q-learning
The OptionProperty class The OptionModel class Quantization
Putting it all together Evaluation Pros and cons of reinforcement learning
Learning classifier systems
Introduction to LCS Why LCS? Terminology Extended learning classifier systems XCS components
Application to portfolio management The XCS core data XCS rules Covering An implementation example
Benefits and limitations of learning classifier systems
Summary
12. Scalable Frameworks
An overview Scala
Object creation Streams Parallel collections
Processing a parallel collection The benchmark framework Performance evaluation
Scalability with Actors
The Actor model Partitioning Beyond actors – reactive programming
Akka
Master-workers
Exchange of messages Worker actors The workflow controller The master actor Master with routing Distributed discrete Fourier transform Limitations
Futures
The Actor life cycle Blocking on futures Handling future callbacks Putting it all together
Apache Spark
Why Spark? Design principles
In-memory persistency Laziness Transforms and actions Shared variables
Experimenting with Spark
Deploying Spark Using Spark shell MLlib RDD generation K-means using Spark
Performance evaluation
Tuning parameters Tests Performance considerations
Pros and cons 0xdata Sparkling Water
Summary
A. Basic Concepts
Scala programming
List of libraries and tools Code snippets format Best practices
Encapsulation Class constructor template Companion objects versus case classes Enumerations versus case classes Overloading Design template for immutable classifiers
Utility classes
Data extraction Data sources Extraction of documents DMatrix class Counter Monitor
Mathematics
Linear algebra
QR decomposition LU factorization LDL decomposition Cholesky factorization Singular Value Decomposition Eigenvalue decomposition Algebraic and numerical libraries
First order predicate logic Jacobian and Hessian matrices Summary of optimization techniques
Gradient descent methods
Steepest descent Conjugate gradient Stochastic gradient descent
Quasi-Newton algorithms
BFGS L-BFGS
Nonlinear least squares minimization
Gauss-Newton Levenberg-Marquardt
Lagrange multipliers
Overview of dynamic programming
Finances 101
Fundamental analysis Technical analysis
Terminology Trading data Trading signals and strategy Price patterns
Options trading Financial data sources
Suggested online courses References
III. Module 3
1. Exploratory Data Analysis
Getting started with Scala Distinct values of a categorical field Summarization of a numeric field
Grepping across multiple fields
Basic, stratified, and consistent sampling Working with Scala and Spark Notebooks Basic correlations Summary
2. Data Pipelines and Modeling
Influence diagrams Sequential trials and dealing with risk Exploration and exploitation Unknown unknowns Basic components of a data-driven system
Data ingest Data transformation layer Data analytics and machine learning UI component Actions engine Correlation engine Monitoring
Optimization and interactivity
Feedback loops
Summary
3. Working with Spark and MLlib
Setting up Spark Understanding Spark architecture
Task scheduling Spark components MQTT, ZeroMQ, Flume, and Kafka HDFS, Cassandra, S3, and Tachyon Mesos, YARN, and Standalone
Applications
Word count Streaming word count Spark SQL and DataFrame
ML libraries
SparkR Graph algorithms – GraphX and GraphFrames
Spark performance tuning Running Hadoop HDFS Summary
4. Supervised and Unsupervised Learning
Records and supervised learning
Iris dataset Labeled point SVMWithSGD Logistic regression Decision tree Bagging and boosting – ensemble learning methods
Unsupervised learning Problem dimensionality Summary
5. Regression and Classification
What regression stands for? Continuous space and metrics Linear regression Logistic regression Regularization Multivariate regression Heteroscedasticity Regression trees Classification metrics Multiclass problems Perceptron Generalization error and overfitting Summary
6. Working with Unstructured Data
Nested data Other serialization formats Hive and Impala Sessionization Working with traits Working with pattern matching Other uses of unstructured data Probabilistic structures Projections Summary
7. Working with Graph Algorithms
A quick introduction to graphs SBT Graph for Scala
Adding nodes and edges Graph constraints JSON
GraphX
Who is getting e-mails? Connected components Triangle counting Strongly connected components PageRank SVD++
Summary
8. Integrating Scala with R and Python
Integrating with R
Setting up R and SparkR
Linux Mac OS Windows Running SparkR via scripts Running Spark via R's command line
DataFrames Linear models Generalized linear model Reading JSON files in SparkR Writing Parquet files in SparkR Invoking Scala from R
Using Rserve
Integrating with Python
Setting up Python PySpark Calling Python from Java/Scala
Using sys.process._ Spark pipe Jython and JSR 223
Summary
9. NLP in Scala
Text analysis pipeline
Simple text analysis
MLlib algorithms in Spark
TF-IDF LDA
Segmentation, annotation, and chunking POS tagging Using word2vec to find word relationships
A Porter Stemmer implementation of the code
Summary
10. Advanced Model Monitoring
System monitoring Process monitoring Model monitoring
Performance over time Criteria for model retiring A/B testing
Summary
A. Bibliography Index
