Cover
Half Title
Title Page
Copyright Page
Dedication
Table of Contents
Preface
1 Introduction
1.1 Rise of Big Data and Dimensionality
1.1.1 Biological sciences
1.1.2 Health sciences
1.1.3 Computer and information sciences
1.1.4 Economics and finance
1.1.5 Business and program evaluation
1.1.6 Earth sciences and astronomy
1.2 Impact of Big Data
1.3 Impact of Dimensionality
1.3.1 Computation
1.3.2 Noise accumulation
1.3.3 Spurious correlation
1.3.4 Statistical theory
1.4 Aim of High-dimensional Statistical Learning
1.5 What Big Data Can Do
1.6 Scope of the Book
2 Multiple and Nonparametric Regression
2.1 Introduction
2.2 Multiple Linear Regression
2.2.1 The Gauss-Markov theorem
2.2.2 Statistical tests
2.3 Weighted Least-Squares
2.4 Box-Cox Transformation
2.5 Model Building and Basis Expansions
2.5.1 Polynomial regression
2.5.2 Spline regression
2.5.3 Multiple covariates
2.6 Ridge Regression
2.6.1 Bias-variance tradeoff
2.6.2 ℓ2 penalized least squares
2.6.3 Bayesian interpretation
2.6.4 Ridge regression solution path
2.6.5 Kernel ridge regression
2.7 Regression in Reproducing Kernel Hilbert Space
2.8 Leave-one-out and Generalized Cross-validation
2.9 Exercises
3 Introduction to Penalized Least-Squares
3.1 Classical Variable Selection Criteria
3.1.1 Subset selection
3.1.2 Relation with penalized regression
3.1.3 Selection of regularization parameters
3.2 Folded-concave Penalized Least Squares
3.2.1 Orthonormal designs
3.2.2 Penalty functions
3.2.3 Thresholding by SCAD and MCP
3.2.4 Risk properties
3.2.5 Characterization of folded-concave PLS
3.3 Lasso and L1 Regularization
3.3.1 Nonnegative garrote
3.3.2 Lasso
3.3.3 Adaptive Lasso
3.3.4 Elastic Net
3.3.5 Dantzig selector
3.3.6 SLOPE and sorted penalties
3.3.7 Concentration inequalities and uniform convergence
3.3.8 A brief history of model selection
3.4 Bayesian Variable Selection
3.4.1 Bayesian view of the PLS
3.4.2 A Bayesian framework for selection
3.5 Numerical Algorithms
3.5.1 Quadratic programs
3.5.2 Least angle regression*
3.5.3 Local quadratic approximations
3.5.4 Local linear algorithm
3.5.5 Penalized linear unbiased selection*
3.5.6 Cyclic coordinate descent algorithms
3.5.7 Iterative shrinkage-thresholding algorithms
3.5.8 Projected proximal gradient method
3.5.9 ADMM
3.5.10 Iterative local adaptive majorization and minimization
3.5.11 Other methods and timeline
3.6 Regularization Parameters for PLS
3.6.1 Degrees of freedom
3.6.2 Extension of information criteria
3.6.3 Application to PLS estimators
3.7 Residual Variance and Refitted Cross-validation
3.7.1 Residual variance of Lasso
3.7.2 Refitted cross-validation
3.8 Extensions to Nonparametric Modeling
3.8.1 Structured nonparametric models
3.8.2 Group penalty
3.9 Applications
3.10 Bibliographical Notes
3.11 Exercises
4 Penalized Least Squares: Properties
4.1 Performance Benchmarks
4.1.1 Performance measures
4.1.2 Impact of model uncertainty
4.1.2.1 Bayes lower bounds for orthogonal design
4.1.2.2 Minimax lower bounds for general design
4.1.3 Performance goals, sparsity and sub-Gaussian noise
4.2 Penalized L0 Selection
4.3 Lasso and Dantzig Selector
4.3.1 Selection consistency
4.3.2 Prediction and coefficient estimation errors
4.3.3 Model size and least squares after selection
4.3.4 Properties of the Dantzig selector
4.3.5 Regularity conditions on the design matrix
4.4 Properties of Concave PLS
4.4.1 Properties of penalty functions
4.4.2 Local and oracle solutions
4.4.3 Properties of local solutions
4.4.4 Global and approximate global solutions
4.5 Smaller and Sorted Penalties
4.5.1 Sorted concave penalties and their local approximation
4.5.2 Approximate PLS with smaller and sorted penalties
4.5.3 Properties of LLA and LCA
4.6 Bibliographical Notes
4.7 Exercises
5 Generalized Linear Models and Penalized Likelihood
5.1 Generalized Linear Models
5.1.1 Exponential family
5.1.2 Elements of generalized linear models
5.1.3 Maximum likelihood
5.1.4 Computing MLE: Iteratively reweighted least squares
5.1.5 Deviance and analysis of deviance
5.1.6 Residuals
5.2 Examples
5.2.1 Bernoulli and binomial models
5.2.2 Models for count responses
5.2.3 Models for nonnegative continuous responses
5.2.4 Normal error models
5.3 Sparsest Solution in High Confidence Set
5.3.1 A general setup
5.3.2 Examples
5.3.3 Properties
5.4 Variable Selection via Penalized Likelihood
5.5 Algorithms
5.5.1 Local quadratic approximation
5.5.2 Local linear approximation
5.5.3 Coordinate descent
5.5.4 Iterative local adaptive majorization and minimization
5.6 Tuning Parameter Selection
5.7 An Application
5.8 Sampling Properties in Low-dimension
5.8.1 Notation and regularity conditions
5.8.2 The oracle property
5.8.3 Sampling properties with diverging dimensions
5.8.4 Asymptotic properties of GIC selectors
5.9 Properties under Ultrahigh Dimensions
5.9.1 The Lasso penalized estimator and its risk property
5.9.2 Strong oracle property
5.9.3 Numeric studies
5.10 Risk Properties
5.11 Bibliographical Notes
5.12 Exercises
6 Penalized M-estimators
6.1 Penalized Quantile Regression
6.1.1 Quantile regression
6.1.2 Variable selection in quantile regression
6.1.3 A fast algorithm for penalized quantile regression
6.2 Penalized Composite Quantile Regression
6.3 Variable Selection in Robust Regression
6.3.1 Robust regression
6.3.2 Variable selection in Huber regression
6.4 Rank Regression and Its Variable Selection
6.4.1 Rank regression
6.4.2 Penalized weighted rank regression
6.5 Variable Selection for Survival Data
6.5.1 Partial likelihood
6.5.2 Variable selection via penalized partial likelihood and its properties
6.6 Theory of Folded-concave Penalized M-estimator
6.6.1 Conditions on penalty and restricted strong convexity
6.6.2 Statistical accuracy of penalized M-estimator with folded concave penalties
6.6.3 Computational accuracy
6.7 Bibliographical Notes
6.8 Exercises
7 High Dimensional Inference
7.1 Inference in Linear Regression
7.1.1 Debias of regularized regression estimators
7.1.2 Choices of weights
7.1.3 Inference for the noise level
7.2 Inference in Generalized Linear Models
7.2.1 Desparsified Lasso
7.2.2 Decorrelated score estimator
7.2.3 Test of linear hypotheses
7.2.4 Numerical comparison
7.2.5 An application
7.3 Asymptotic Efficiency*
7.3.1 Statistical efficiency and Fisher information
7.3.2 Linear regression with random design
7.3.3 Partial linear regression
7.4 Gaussian Graphical Models
7.4.1 Inference via penalized least squares
7.4.2 Sample size in regression and graphical models
7.5 General Solutions*
7.5.1 Local semi-LD decomposition
7.5.2 Data swap
7.5.3 Gradient approximation
7.6 Bibliographical Notes
7.7 Exercises
8 Feature Screening
8.1 Correlation Screening
8.1.1 Sure screening property
8.1.2 Connection to multiple comparison
8.1.3 Iterative SIS
8.2 Generalized and Rank Correlation Screening
8.3 Feature Screening for Parametric Models
8.3.1 Generalized linear models
8.3.2 A unified strategy for parametric feature screening
8.3.3 Conditional sure independence screening
8.4 Nonparametric Screening
8.4.1 Additive models
8.4.2 Varying coefficient models
8.4.3 Heterogeneous nonparametric models
8.5 Model-free Feature Screening
8.5.1 Sure independent ranking screening procedure
8.5.2 Feature screening via distance correlation
8.5.3 Feature screening for high-dimensional categorical data
8.6 Screening and Selection
8.6.1 Feature screening via forward regression
8.6.2 Sparse maximum likelihood estimate
8.6.3 Feature screening via partial correlation
8.7 Refitted Cross-Validation
8.7.1 RCV algorithm
8.7.2 RCV in linear models
8.7.3 RCV in nonparametric regression
8.8 An Illustration
8.9 Bibliographical Notes
8.10 Exercises
9 Covariance Regularization and Graphical Models
9.1 Basic Facts about Matrices
9.2 Sparse Covariance Matrix Estimation
9.2.1 Covariance regularization by thresholding and banding
9.2.2 Asymptotic properties
9.2.3 Nearest positive definite matrices
9.3 Robust Covariance Inputs
9.4 Sparse Precision Matrix and Graphical Models
9.4.1 Gaussian graphical models
9.4.2 Penalized likelihood and M-estimation
9.4.3 Penalized least-squares
9.4.4 CLIME and its adaptive version
9.5 Latent Gaussian Graphical Models
9.6 Technical Proofs
9.6.1 Proof of Theorem 9.1
9.6.2 Proof of Theorem 9.3
9.6.3 Proof of Theorem 9.4
9.6.4 Proof of Theorem 9.6
9.7 Bibliographical Notes
9.8 Exercises
10 Covariance Learning and Factor Models
10.1 Principal Component Analysis
10.1.1 Introduction to PCA
10.1.2 Power method
10.2 Factor Models and Structured Covariance Learning
10.2.1 Factor model and high-dimensional PCA
10.2.2 Extracting latent factors and POET
10.2.3 Methods for selecting number of factors
10.3 Covariance and Precision Learning with Known Factors
10.3.1 Factor model with observable factors
10.3.2 Robust initial estimation of covariance matrix
10.4 Augmented Factor Models and Projected PCA
10.5 Asymptotic Properties
10.5.1 Properties for estimating loading matrix
10.5.2 Properties for estimating covariance matrices
10.5.3 Properties for estimating realized latent factors
10.5.4 Properties for estimating idiosyncratic components
10.6 Technical Proofs
10.6.1 Proof of Theorem 10.1
10.6.2 Proof of Theorem 10.2
10.6.3 Proof of Theorem 10.3
10.6.4 Proof of Theorem 10.4
10.7 Bibliographical Notes
10.8 Exercises
11 Applications of Factor Models and PCA
11.1 Factor-adjusted Regularized Model Selection
11.1.1 Importance of factor adjustments
11.1.2 FarmSelect
11.1.3 Application to forecasting bond risk premia
11.1.4 Application to a neuroblastoma data
11.1.5 Asymptotic theory for FarmSelect
11.2 Factor-adjusted Robust Multiple Testing
11.2.1 False discovery rate control
11.2.2 Multiple testing under dependence measurements
11.2.3 Power of factor adjustments
11.2.4 FarmTest
11.2.5 Application to neuroblastoma data
11.3 Factor Augmented Regression Methods
11.3.1 Principal component regression
11.3.2 Augmented principal component regression
11.3.3 Application to forecast bond risk premia
11.4 Applications to Statistical Machine Learning
11.4.1 Community detection
11.4.2 Topic model
11.4.3 Matrix completion
11.4.4 Item ranking
11.4.5 Gaussian mixture models
11.5 Bibliographical Notes
11.6 Exercises
12 Supervised Learning
12.1 Model-based Classifiers
12.1.1 Linear and quadratic discriminant analysis
12.1.2 Logistic regression
12.2 Kernel Density Classifiers and Naive Bayes
12.3 Nearest Neighbor Classifiers
12.4 Classification Trees and Ensemble Classifiers
12.4.1 Classification trees
12.4.2 Bagging
12.4.3 Random forests
12.4.4 Boosting
12.5 Support Vector Machines
12.5.1 The standard support vector machine
12.5.2 Generalizations of SVMs
12.6 Sparse Classifiers via Penalized Empirical Loss
12.6.1 The importance of sparsity under high-dimensionality
12.6.2 Sparse support vector machines
12.6.3 Sparse large margin classifiers
12.7 Sparse Discriminant Analysis
12.7.1 Nearest shrunken centroids classifier
12.7.2 Features annealed independent rule
12.7.3 Selection bias of sparse independence rules
12.7.4 Regularized optimal affine discriminant
12.7.5 Linear programming discriminant
12.7.6 Direct sparse discriminant analysis
12.7.7 Solution path equivalence between ROAD and DSDA
12.8 Feature Augmentation and Sparse Additive Classifiers
12.8.1 Feature augmentation
12.8.2 Penalized additive logistic regression
12.8.3 Semiparametric sparse discriminant analysis
12.9 Bibliographical Notes
12.10 Exercises
13 Unsupervised Learning
13.1 Cluster Analysis
13.1.1 K-means clustering
13.1.2 Hierarchical clustering
13.1.3 Model-based clustering
13.1.4 Spectral clustering
13.2 Data-driven Choices of the Number of Clusters
13.3 Variable Selection in Clustering
13.3.1 Sparse clustering
13.3.2 Sparse model-based clustering
13.3.3 Sparse mixture of experts model
13.4 An Introduction to High Dimensional PCA
13.4.1 Inconsistency of the regular PCA
13.4.2 Consistency under sparse eigenvector model
13.5 Sparse Principal Component Analysis
13.5.1 Sparse PCA
13.5.2 An iterative SVD thresholding approach
13.5.3 A penalized matrix decomposition approach
13.5.4 A semidefinite programming approach
13.5.5 A generalized power method
13.6 Bibliographical Notes
13.7 Exercises
14 An Introduction to Deep Learning
14.1 Rise of Deep Learning
14.2 Feed-forward Neural Networks
14.2.1 Model setup
14.2.2 Back-propagation in computational graphs
14.3 Popular Models
14.3.1 Convolutional neural networks
14.3.2 Recurrent neural networks
14.3.2.1 Vanilla RNNs
14.3.2.2 GRUs and LSTM
14.3.2.3 Multilayer RNNs
14.3.3 Modules
14.4 Deep Unsupervised Learning
14.4.1 Autoencoders
14.4.2 Generative adversarial networks
14.4.2.1 Sampling view of GANs
14.4.2.2 Minimum distance view of GANs
14.5 Training Deep Neural Nets
14.5.1 Stochastic gradient descent
14.5.1.1 Mini-batch SGD
14.5.1.2 Momentum-based SGD
14.5.1.3 SGD with adaptive learning rates
14.5.2 Easing numerical instability
14.5.2.1 ReLU activation function
14.5.2.2 Skip connections
14.5.2.3 Batch normalization
14.5.3 Regularization techniques
14.5.3.1 Weight decay
14.5.3.2 Dropout
14.5.3.3 Data augmentation
14.6 Example: Image Classification
14.7 Additional Examples Using TensorFlow and R
14.8 Bibliographical Notes
References
Author Index
Index