Hands-On Machine Learning for Algorithmic Trading by Stefan Jansen

Index

Title Page
Copyright and Credits

Hands-On Machine Learning for Algorithmic Trading

Humble Bundle
About Packt

Why subscribe?
Packt.com

Contributors

About the author
About the reviewers
Packt is searching for authors like you

Preface

Who this book is for
What this book covers
To get the most out of this book

Download the example code files
Download the color images
Conventions used

Get in touch

Reviews

Machine Learning for Trading

How to read this book

What to expect
Who should read this book
How the book is organized

Part 1 – the framework – from data to strategy design
Part 2 – ML fundamentals
Part 3 – natural language processing
Part 4 – deep and reinforcement learning

What you need to succeed

Data sources
GitHub repository
Python libraries

The rise of ML in the investment industry

From electronic to high-frequency trading
Factor investing and smart beta funds
Algorithmic pioneers outperform humans at scale

ML-driven funds attract $1 trillion AUM
The emergence of quantamental funds
Investments in strategic capabilities

ML and alternative data

Crowdsourcing of trading algorithms

Design and execution of a trading strategy

Sourcing and managing data
Alpha factor research and evaluation
Portfolio optimization and risk management
Strategy backtesting

ML and algorithmic trading strategies

Use Cases of ML for Trading

Data mining for feature extraction
Supervised learning for alpha factor creation and aggregation
Asset allocation
Testing trade ideas
Reinforcement learning

Summary

Market and Fundamental Data

How to work with market data

Market microstructure

Marketplaces
Types of orders

Working with order book data

The FIX protocol
Nasdaq TotalView-ITCH Order Book data

Parsing binary ITCH messages
Reconstructing trades and the order book

Regularizing tick data

Tick bars
Time bars
Volume bars
Dollar bars

API access to market data

Remote data access using pandas

Reading HTML tables
pandas-datareader for market data
The Investors Exchange

Quantopian
Zipline
Quandl
Other market-data providers

How to work with fundamental data

Financial statement data

Automated processing – XBRL
Building a fundamental data time series

Extracting the financial statements and notes dataset
Retrieving all quarterly Apple filings
Building a price/earnings time series

Other fundamental data sources

pandas_datareader – macro and industry data

Efficient data storage with pandas
Summary

Alternative Data for Finance

The alternative data revolution

Sources of alternative data

Individuals
Business processes
Sensors

Satellites
Geolocation data

Evaluating alternative datasets

Evaluation criteria

Quality of the signal content

Asset classes
Investment style
Risk premiums
Alpha content and quality

Quality of the data

Legal and reputational risks
Exclusivity
Time horizon
Frequency
Reliability

Technical aspects

Latency
Format

The market for alternative data

Data providers and use cases

Social sentiment data

Dataminr
StockTwits
RavenPack

Satellite data
Geolocation data
Email receipt data

Working with alternative data

Scraping OpenTable data

Extracting data from HTML using requests and BeautifulSoup
Introducing Selenium – using browser automation
Building a dataset of restaurant bookings
One step further – Scrapy and Splash

Earnings call transcripts

Parsing HTML using regular expressions

Summary

Alpha Factor Research

Engineering alpha factors

Important factor categories

Momentum and sentiment factors

Rationale Key metrics

Value factors

Rationale Key metrics

Volatility and size factors

Rationale Key metrics

Quality factors

Rationale Key metrics

How to transform data into factors

Useful pandas and NumPy methods

Loading the data
Resampling from daily to monthly frequency
Computing momentum factors
Using lagged returns and different holding periods
Compute factor betas

Built-in Quantopian factors
TA-Lib

Seeking signals – how to use zipline

The architecture – event-driven trading simulation
A single alpha factor from market data
Combining factors from diverse data sources

Separating signal and noise – how to use alphalens

Creating forward returns and factor quantiles
Predictive performance by factor quantiles
The information coefficient
Factor turnover

Alpha factor resources

Alternative algorithmic trading libraries

Summary

Strategy Evaluation

How to build and test a portfolio with zipline

Scheduled trading and portfolio rebalancing

How to measure performance with pyfolio

The Sharpe ratio
The fundamental law of active management
In- and out-of-sample performance with pyfolio

Getting pyfolio input from alphalens
Getting pyfolio input from a zipline backtest
Walk-forward testing out-of-sample returns
Summary performance statistics
Drawdown periods and factor exposure
Modeling event risk

How to avoid the pitfalls of backtesting

Data challenges

Look-ahead bias
Survivorship bias
Outlier control
Unrepresentative period

Implementation issues

Mark-to-market performance
Trading costs
Timing of trades

Data-snooping and backtest-overfitting

The minimum backtest length and the deflated SR
Optimal stopping for backtests

How to manage portfolio risk and return

Mean-variance optimization

How it works
The efficient frontier in Python
Challenges and shortcomings

Alternatives to mean-variance optimization

The 1/n portfolio
The minimum-variance portfolio
Global Portfolio Optimization – The Black-Litterman approach
How to size your bets – the Kelly rule

The optimal size of a bet
Optimal investment – single asset
Optimal investment – multiple assets

Risk parity
Risk factor investment
Hierarchical risk parity

Summary

The Machine Learning Process

Learning from data

Supervised learning
Unsupervised learning

Applications
Cluster algorithms
Dimensionality reduction

Reinforcement learning

The machine learning workflow

Basic walkthrough – k-nearest neighbors
Frame the problem – goals and metrics

Prediction versus inference

Causal inference

Regression problems
Classification problems

Receiver operating characteristics and the area under the curve
Precision-recall curves

Collecting and preparing the data
Explore, extract, and engineer features

Using information theory to evaluate features

Selecting an ML algorithm
Design and tune the model

The bias-variance trade-off
Underfitting versus overfitting
Managing the trade-off
Learning curves

How to use cross-validation for model selection

How to implement cross-validation in Python

Basic train-test split

Cross-validation

Using a hold-out test set
KFold iterator
Leave-one-out CV
Leave-P-Out CV
ShuffleSplit

Parameter tuning with scikit-learn

Validation curves with yellowbrick
Learning curves
Parameter tuning using GridSearchCV and pipeline

Challenges with cross-validation in finance

Time series cross-validation with sklearn
Purging, embargoing, and combinatorial CV

Summary

Linear Models

Linear regression for inference and prediction
The multiple linear regression model

How to formulate the model
How to train the model

Least squares
Maximum likelihood estimation
Gradient descent

The Gauss-Markov theorem
How to conduct statistical inference
How to diagnose and remedy problems

Goodness of fit
Heteroskedasticity
Serial correlation
Multicollinearity

How to run linear regression in practice

OLS with statsmodels
Stochastic gradient descent with sklearn

How to build a linear factor model

From the CAPM to the Fama-French five-factor model
Obtaining the risk factors
Fama-MacBeth regression

Shrinkage methods: regularization for linear regression

How to hedge against overfitting
How ridge regression works
How lasso regression works

How to use linear regression to predict returns

Prepare the data

Universe creation and time horizon
Target return computation
Alpha factor selection and transformation
Data cleaning – missing data
Data exploration
Dummy encoding of categorical variables
Creating forward returns

Linear OLS regression using statsmodels

Diagnostic statistics

Linear OLS regression using sklearn

Custom time series cross-validation
Select features and target
Cross-validating the model
Test results – information coefficient and RMSE

Ridge regression using sklearn

Tuning the regularization parameters using cross-validation
Cross-validation results and ridge coefficient paths
Top 10 coefficients

Lasso regression using sklearn

Cross-validated information coefficient and Lasso Path

Linear classification

The logistic regression model

Objective function
The logistic function
Maximum likelihood estimation

How to conduct inference with statsmodels
How to use logistic regression for prediction

How to predict price movements using sklearn

Summary

Time Series Models

Analytical tools for diagnostics and feature extraction

How to decompose time series patterns
How to compute rolling window statistics

Moving averages and exponential smoothing

How to measure autocorrelation
How to diagnose and achieve stationarity

Time series transformations
How to diagnose and address unit roots
Unit root tests

How to apply time series transformations

Univariate time series models

How to build autoregressive models

How to identify the number of lags
How to diagnose model fit

How to build moving average models

How to identify the number of lags
The relationship between AR and MA models

How to build ARIMA models and extensions

How to identify the number of AR and MA terms
Adding features – ARMAX
Adding seasonal differencing – SARIMAX

How to forecast macro fundamentals
How to use time series models to forecast volatility

The autoregressive conditional heteroskedasticity (ARCH) model
Generalizing ARCH – the GARCH model

Selecting the lag order

How to build a volatility-forecasting model

Multivariate time series models

Systems of equations
The vector autoregressive (VAR) model
How to use the VAR model for macro fundamentals forecasts
Cointegration – time series with a common trend

Testing for cointegration

How to use cointegration for a pairs-trading strategy

Summary

Bayesian Machine Learning

How Bayesian machine learning works

How to update assumptions from empirical evidence
Exact inference: Maximum a Posteriori estimation

How to select priors
How to keep inference simple – conjugate priors
How to dynamically estimate the probabilities of asset price moves

Approximate inference: stochastic versus deterministic approaches

Sampling-based stochastic inference
Markov chain Monte Carlo sampling

Gibbs sampling
Metropolis-Hastings sampling
Hamiltonian Monte Carlo – going NUTS

Variational Inference

Automatic Differentiation Variational Inference (ADVI)

Probabilistic programming with PyMC3

Bayesian machine learning with Theano
The PyMC3 workflow

Model definition – Bayesian logistic regression

Visualization and plate notation
The Generalized Linear Models module
MAP inference

Approximate inference – MCMC

Credible intervals

Approximate inference – variational Bayes
Model diagnostics

Convergence
Posterior Predictive Checks

Prediction

Practical applications

Bayesian Sharpe ratio and performance comparison

Model definition
Performance comparison

Bayesian time series models
Stochastic volatility models

Summary

Decision Trees and Random Forests

Decision trees

How trees learn and apply decision rules
How to use decision trees in practice

How to prepare the data
How to code a custom cross-validation class
How to build a regression tree
How to build a classification tree

How to optimize for node purity
How to train a classification tree

How to visualize a decision tree
How to evaluate decision tree predictions
Feature importance

Overfitting and regularization

How to regularize a decision tree
Decision tree pruning

How to tune the hyperparameters

GridSearchCV for decision trees
How to inspect the tree structure
Learning curves

Strengths and weaknesses of decision trees

Random forests

Ensemble models
How bagging lowers model variance

Bagged decision trees

How to build a random forest
How to train and tune a random forest

Feature importance for random forests
Out-of-bag testing

Pros and cons of random forests

Summary

Gradient Boosting Machines

Adaptive boosting

The AdaBoost algorithm
AdaBoost with sklearn

Gradient boosting machines

How to train and tune GBM models

Ensemble size and early stopping
Shrinkage and learning rate
Subsampling and stochastic gradient boosting

How to use gradient boosting with sklearn

How to tune parameters with GridSearchCV
Parameter impact on test scores
How to test on the holdout set

Fast scalable GBM implementations

How algorithmic innovations drive performance

Second-order loss function approximation
Simplified split-finding algorithms
Depth-wise versus leaf-wise growth
GPU-based training
DART – dropout for trees
Treatment of categorical features
Additional features and optimizations

How to use XGBoost, LightGBM, and CatBoost

How to create binary data formats
How to tune hyperparameters

Objectives and loss functions
Learning parameters
Regularization
Randomized grid search

How to evaluate the results

Cross-validation results across models

How to interpret GBM results

Feature importance
Partial dependence plots
SHapley Additive exPlanations

How to summarize SHAP values by feature
How to use force plots to explain a prediction
How to analyze feature interaction

Summary

Unsupervised Learning

Dimensionality reduction

Linear and non-linear algorithms
The curse of dimensionality
Linear dimensionality reduction

Principal Component Analysis

Visualizing PCA in 2D
The assumptions made by PCA
How the PCA algorithm works
PCA based on the covariance matrix
PCA using Singular Value Decomposition
PCA with sklearn

Independent Component Analysis

ICA assumptions
The ICA algorithm
ICA with sklearn

PCA for algorithmic trading

Data-driven risk factors
Eigen portfolios

Manifold learning

t-SNE
UMAP

Clustering

k-Means clustering

Evaluating cluster quality
Hierarchical clustering

Visualization – dendrograms

Density-based clustering

DBSCAN
Hierarchical DBSCAN

Gaussian mixture models

The expectation-maximization algorithm

Hierarchical risk parity

Summary

Working with Text Data

How to extract features from text data

Challenges of NLP
The NLP workflow

Parsing and tokenizing text data
Linguistic annotation
Semantic annotation
Labeling

Use cases

From text to tokens – the NLP pipeline

NLP pipeline with spaCy and textacy

Parsing, tokenizing, and annotating a sentence
Batch-processing documents
Sentence boundary detection
Named entity recognition
N-grams
spaCy's streaming API
Multi-language NLP

NLP with TextBlob

Stemming
Sentiment polarity and subjectivity

From tokens to numbers – the document-term matrix

The BoW model

Measuring the similarity of documents

Document-term matrix with sklearn

Using CountVectorizer

Visualizing vocabulary distribution
Finding the most similar documents

TfidfTransformer and TfidfVectorizer

The effect of smoothing
How to summarize news articles using TfidfVectorizer

Text Preprocessing – review

Text classification and sentiment analysis

The Naive Bayes classifier

Bayes' theorem refresher
The conditional independence assumption

News article classification

Training and evaluating a multinomial Naive Bayes classifier

Sentiment analysis

Twitter data

Multinomial Naive Bayes
Comparison with TextBlob sentiment scores

Business reviews – the Yelp dataset challenge

Benchmark accuracy
Multinomial Naive Bayes model
One-versus-all logistic regression
Combining text and numerical features
Multinomial logistic regression
Gradient-boosting machine

Summary

Topic Modeling

Learning latent topics: goals and approaches

From linear algebra to hierarchical probabilistic models

Latent semantic indexing

How to implement LSI using sklearn
Pros and cons

Probabilistic latent semantic analysis

How to implement pLSA using sklearn

Latent Dirichlet allocation

How LDA works

The Dirichlet distribution
The generative model
Reverse-engineering the process

How to evaluate LDA topics

Perplexity
Topic coherence

How to implement LDA using sklearn
How to visualize LDA results using pyLDAvis
How to implement LDA using gensim
Topic modeling for earnings calls

Data preprocessing
Model training and evaluation
Running experiments

Topic modeling for Yelp business reviews

Summary

Word Embeddings

How word embeddings encode semantics

How neural language models learn usage in context
The Word2vec model – learn embeddings at scale

Model objective – simplifying the softmax
Automatic phrase detection

How to evaluate embeddings – vector arithmetic and analogies
How to use pre-trained word vectors

GloVe – global vectors for word representation

How to train your own word vector embeddings
The Skip-Gram architecture in Keras

Noise-contrastive estimation
The model components
Visualizing embeddings using TensorBoard

Word vectors from SEC filings using gensim

Preprocessing

Automatic phrase detection

Model training

Model evaluation
Performance impact of parameter settings

Sentiment analysis with Doc2vec

Training Doc2vec on Yelp sentiment data

Create input data

Bonus – Word2vec for translation
Summary

Next Steps

Key takeaways and lessons learned

Data is the single most important ingredient

Quality control
Data integration

Domain expertise helps unlock value in data

Feature engineering and alpha factor research

ML is a toolkit for solving problems with data
Model diagnostics help speed up optimization

Making do without a free lunch
Managing the bias-variance trade-off
Define targeted model objectives
The optimization verification test

Beware of backtest overfitting
How to gain insights from black-box models

ML for trading in practice

Data management technologies

Database systems
Big Data technologies – Hadoop and Spark

ML tools
Online trading platforms

Quantopian
QuantConnect
QuantRocket

Conclusion

Other Books You May Enjoy

Leave a review - let other readers know what you think
