Agresti, A. (2003). Categorical Data Analysis. Wiley Series in Probability and Statistics. Wiley.
Allaire, J. (2018). tfruns: Training Run Tools for ’TensorFlow’. R package version 1.4.
Allaire, J. and Chollet, F. (2019). keras: R Interface to ’Keras’. R package version 2.2.4.1.9001.
Banfield, J. D. and Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, pages 803–821.
Bengio, Y., Yao, L., Alain, G., and Vincent, P. (2013). Generalized denoising auto-encoders as generative models. In Advances in Neural Information Processing Systems, pages 899–907.
Berge, L., Bouveyron, C., and Girard, S. (2018). HDclassif: High Dimensional Supervised Classification and Clustering. R package version 2.1.0.
Bergstra, J. and Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305.
Beygelzimer, A., Kakade, S., and Langford, J. (2006). Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning, pages 97–104. ACM.
Beygelzimer, A., Kakadet, S., Langford, J., Arya, S., Mount, D., and Li, S. (2019). FNN: Fast Nearest Neighbor Search Algorithms and Applications. R package version 1.1.3.
Biecek, P. (2019). DALEX: Descriptive mAchine Learning EXplanations. R package version 0.4.
Bourlard, H. and Kamp, Y. (1988). Auto-association by multilayer perceptrons and singular value decomposition. Biological Cybernetics, 59(4–5):291–294.
Bouveyron, C. and Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis, 71:52–78.
Bouveyron, C., Girard, S., and Schmid, C. (2007). High-dimensional data clustering. Computational Statistics & Data Analysis, 52(1):502–519.
Box, G. E. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), pages 211–252.
Breiman, L. (1984). Classification and Regression Trees. Routledge.
Breiman, L. (1996a). Bagging predictors. Machine Learning, 24(2):123–140.
Breiman, L. (1996b). Stacked regressions. Machine Learning, 24(1):49–64.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Breiman, L. et al. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231.
Breiman, L. and Ihaka, R. (1984). Nonlinear discriminant analysis via scaling and ACE. Department of Statistics, University of California.
Bruce, P. and Bruce, A. (2017). Practical Statistics for Data Scientists: 50 Essential Concepts. O’Reilly Media, Inc.
Carroll, R. J. and Ruppert, D. (1981). On prediction and the power transformation family. Biometrika, 68(3):609–615.
Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition, 28(5):781–793.
Charrad, M., Ghazzali, N., Boiteau, V., and Niknafs, A. (2015). NbClust: Determining the Best Number of Clusters in a Data Set. R package version 3.0.
Chawla, N. V., Bowyer, K. W., Hall, L. O., and Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357.
Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, pages 785–794, New York, NY, USA. ACM.
Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., Chen, K., Mitchell, R., Cano, I., Zhou, T., Li, M., Xie, J., Lin, M., Geng, Y., and Li, Y. (2018). xgboost: Extreme Gradient Boosting. R package version 0.71.2.
Chollet, F. and Allaire, J. J. (2018). Deep Learning with R. Manning Publications Company.
Cireşan, D., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. arXiv preprint arXiv:1202.2745.
Cunningham, P. and Delany, S. J. (2007). k-nearest neighbour classifiers. Multiple Classifier Systems, 34(8):1–17.
Dasgupta, A. and Raftery, A. E. (1998). Detecting features in spatial point processes with clutter via model-based clustering. Journal of the American Statistical Association, 93(441):294–302.
Davison, A. C., Hinkley, D. V., et al. (1997). Bootstrap Methods and their Application, volume 1. Cambridge University Press.
De Cock, D. (2011). Ames, Iowa: Alternative to the Boston housing data as an end of semester regression project. Journal of Statistics Education, 19(3).
De Maesschalck, R., Jouan-Rimbaud, D., and Massart, D. L. (2000). The Mahalanobis distance. Chemometrics and Intelligent Laboratory Systems, 50(1):1–18.
Deane-Mayer, Z. A. and Knowles, J. E. (2016). caretEnsemble: Ensembles of Caret Models. R package version 2.0.0.
Díaz-Uriarte, R. and De Andres, S. A. (2006). Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1):3.
Dietterich, T. G. (2000a). Ensemble methods in machine learning. In International Workshop on Multiple Classifier Systems, pages 1–15. Springer.
Dietterich, T. G. (2000b). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2):139–157.
Doersch, C. (2016). Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908.
Dorogush, A. V., Ershov, V., and Gulin, A. (2018). CatBoost: Gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363.
Doshi-Velez, F. and Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382):316–331.
Efron, B. and Hastie, T. (2016). Computer Age Statistical Inference, volume 5. Cambridge University Press.
Efron, B. and Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, pages 54–75.
Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: the 632+ bootstrap method. Journal of the American Statistical Association, 92(438):548–560.
Erichson, N. B., Zheng, P., and Aravkin, S. (2018). sparsepca: Sparse Principal Component Analysis (SPCA). R package version 0.1.2.
Faraway, J. J. (2016a). Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models, volume 124. CRC Press.
Faraway, J. J. (2016b). Linear Models with R. Chapman and Hall/CRC.
Fisher, A., Rudin, C., and Dominici, F. (2018). Model class reliance: Variable importance measures for any machine learning model class, from the "Rashomon" perspective. arXiv preprint arXiv:1801.01489.
Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2):179–188.
Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53(284):789–798.
Fraley, C. and Raftery, A. E. (1998). How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8):578–588.
Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. Journal of the American Statistical Association, 97(458):611–631.
Fraley, C., Raftery, A. E., Murphy, T. B., and Scrucca, L. (2012). mclust Version 4 for R: Normal Mixture Modeling for Model-based Clustering, Classification, and Density Estimation. Technical report, University of Washington.
Fraley, C., Raftery, A. E., and Scrucca, L. (2019). mclust: Gaussian Mixture Modelling for Model-Based Clustering, Classification, and Density Estimation. R package version 5.4.3.
Freund, Y. and Schapire, R. E. (1999). Adaptive game playing using multiplicative weights. Games and Economic Behavior, 29(1–2):79–103.
Friedman, J., Hastie, T., and Tibshirani, R. (2001). The Elements of Statistical Learning, volume 1. Springer Series in Statistics. Springer, New York, NY.
Friedman, J., Hastie, T., Tibshirani, R., Simon, N., Narasimhan, B., and Qian, J. (2018). glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. R package version 2.0-16.
Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, pages 1–67.
Friedman, J. H. (2001). Greedy function approximation: a gradient boosting machine. Annals of Statistics, pages 1189–1232.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4):367–378.
Friedman, J. H., Popescu, B. E., et al. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916–954.
Milborrow, S. (2019). earth: Multivariate Adaptive Regression Splines. Derived from mda:mars by Trevor Hastie and Rob Tibshirani; uses Alan Miller's Fortran utilities with Thomas Lumley's leaps wrapper. R package version 5.1.1.
Geladi, P. and Kowalski, B. R. (1986). Partial least-squares regression: a tutorial. Analytica Chimica Acta, 185:1–17.
Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly Media, Inc.
Geurts, P., Ernst, D., and Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1):3–42.
Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1):44–65.
Goldstein, B. A., Polley, E. C., and Briggs, F. B. (2011). Random forests for genetic association studies. Statistical Applications in Genetics and Molecular Biology, 10(1).
Golub, G. H., Heath, M., and Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21(2):215–223.
Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning, volume 1. MIT Press Cambridge.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, pages 857–871.
Granitto, P. M., Furlanello, C., Biasioli, F., and Gasperi, F. (2006). Recursive feature elimination with random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent Laboratory Systems, 83(2):83–90.
Greenwell, B. (2018). pdp: Partial Dependence Plots. R package version 0.7.0.
Greenwell, B., Boehmke, B., Cunningham, J., and Developers, G. (2018a). gbm: Generalized Boosted Regression Models. R package version 2.1.4.
Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J. (2018b). A simple and effective model-based variable importance measure. arXiv preprint arXiv:1805.04755.
Greenwell, B. M., McCarthy, A. J., Boehmke, B. C., and Liu, D. (2018c). Residuals and diagnostics for binary and ordinal regression models: An introduction to the sure package. The R Journal, 10(1):1–14.
Greenwell, Brandon M. and Boehmke, Bradley C. (2019). Quantifying the strength of potential interaction effects. https://koalaverse.github.io/vip/articles/vip-interaction.html.
Guo, C. and Berkhahn, F. (2016). Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737.
Hair, J. F. (2006). Multivariate Data Analysis. Pearson Education India.
Hall, Patrick (2018). Awesome machine learning interpretability: A curated, but probably biased and incomplete, list of awesome machine learning interpretability resources. https://github.com/jphall663/awesome-machine-learning-interpretability.
Han, J., Pei, J., and Kamber, M. (2011). Data Mining: Concepts and Techniques. Elsevier.
Harrell, F. E. (2015). Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer Series in Statistics. Springer International Publishing.
Harrison Jr, D. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1):81–102.
Hartigan, J. A. and Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108.
Hastie, T. (2016). svmpath: The SVM Path Algorithm. R package version 0.955.
Hastie, T., Tibshirani, R., and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Taylor & Francis.
Hawkins, D. M., Basak, S. C., and Mills, D. (2003). Assessing model fit by cross-validation. Journal of Chemical Information and Computer Sciences, 43(2):579–586.
Hinton, G. E. and Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580.
Hinton, G. E. and Zemel, R. S. (1994). Autoencoders, minimum description length and Helmholtz free energy. In Advances in Neural Information Processing Systems, pages 3–10.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.
Hothorn, T., Hornik, K., and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3):651–674.
Hothorn, T. and Zeileis, A. (2015). partykit: A modular toolkit for recursive partytioning in R. The Journal of Machine Learning Research, 16(1):3905–3909.
Hunt, T. (2018). ModelMetrics: Rapid Calculation of Model Metrics. R package version 1.2.2.
Hyndman, R. J. and Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
Irizarry, R. A. (2018). dslabs: Data Science Labs. R package version 0.5.2.
Janitza, S., Binder, H., and Boulesteix, A.-L. (2016). Pitfalls of hypothesis tests and model selection on bootstrap samples: causes and consequences in biometrical applications. Biometrical Journal, 58(3):447–473.
Jiang, S., Pang, G., Wu, M., and Kuang, L. (2012). An improved k-nearest-neighbor algorithm for text categorization. Expert Systems with Applications, 39(1):1503–1509.
Karatzoglou, A., Smola, A., and Hornik, K. (2018). kernlab: Kernel-Based Machine Learning Lab. R package version 0.9-27.
Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A. (2004). kernlab – an S4 package for kernel methods in R. Journal of Statistical Software, 11(9):1–20.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, pages 119–127.
Kaufman, L. and Rousseeuw, P. J. (2009). Finding Groups in Data: an Introduction to Cluster Analysis, volume 344. John Wiley & Sons.
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., Ye, Q., and Liu, T.-Y. (2017). LightGBM: A highly efficient gradient boosting decision tree. In Advances in Neural Information Processing Systems, pages 3146–3154.
Ketchen, D. J. and Shook, C. L. (1996). The application of cluster analysis in strategic management research: an analysis and critique. Strategic Management Journal, 17(6):441–458.
Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics & Data Analysis, 53(11):3735–3745.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kuhn, M. (2014). Futility analysis in the cross-validation of machine learning models. arXiv preprint arXiv:1405.6974.
Kuhn, M. (2017a). AmesHousing: The Ames Iowa Housing Data. R package version 0.0.3.
Kuhn, M. (2017b). The R formula method: the bad parts. R Views.
Kuhn, M. (2018). Applied machine learning workshop. https://github.com/tidymodels/aml-training.
Kuhn, M. (2019). Applied machine learning. RStudio Conference.
Kuhn, M. and Johnson, K. (2013). Applied Predictive Modeling, volume 26. Springer.
Kuhn, M. and Johnson, K. (2018). AppliedPredictiveModeling: Functions and Data Sets for ’Applied Predictive Modeling’. R package version 1.1-7.
Kuhn, M. and Johnson, K. (2019). Feature Engineering and Selection: A Practical Approach for Predictive Models. Chapman & Hall/CRC.
Kuhn, M. and Wickham, H. (2019). rsample: General Resampling Infrastructure. R package version 0.0.4.
Kursa, M. B., Rudnicki, W. R., et al. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36(11):1–13.
Kutner, M. H., Nachtsheim, C. J., Neter, J., and Li, W. (2005). Applied Linear Statistical Models. McGraw Hill, 5th edition.
LeCun, Y. (1987). Modèles connexionnistes de l’apprentissage (connectionist learning models). PhD thesis, Université P. et M. Curie (Paris 6).
LeCun, Y., Boser, B. E., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. E., and Jackel, L. D. (1990). Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems, pages 396–404.
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324.
LeDell, E., Sapp, S., and van der Laan, M. (2014). subsemble: An Ensemble Method for Combining Subset-Specific Algorithm Fits. R package.
Lee, H., Ekanadham, C., and Ng, A. Y. (2008). Sparse deep belief net model for visual area v2. In Advances in Neural Information Processing Systems, pages 873–880.
Lee, S. X. and McLachlan, G. J. (2013). Model-based clustering and classification with non-normal mixture distributions. Statistical Methods & Applications, 22(4):427–454.
Liaw, A. and Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3):18–22.
Little, R. J. and Rubin, D. B. (2014). Statistical Analysis with Missing Data, volume 333. John Wiley & Sons.
Liu, D. and Zhang, H. (2018). Residuals and diagnostics for ordinal regression models: A surrogate approach. Journal of the American Statistical Association, 113(522):845–854.
Loh, W.-Y. and Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis. Journal of the American Statistical Association, 83(403):715–725.
Lundberg, S. and Lee, S.-I. (2016). An unexpected unity among methods for interpreting model predictions. arXiv preprint arXiv:1611.07478.
Lundberg, S. M. and Lee, S.-I. (2017). A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774.
Luu, K., Blum, M., and Privé, F. (2019). pcadapt: Fast Principal Component Analysis for Outlier Detection. R package version 4.1.0.
Ma, Y., Derksen, H., Hong, W., and Wright, J. (2007). Segmentation of multivariate mixed data via lossy data coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9):1546–1562.
Makhzani, A. and Frey, B. (2014). A winner-take-all method for training sparse convolutional autoencoders. In NIPS Deep Learning Workshop. Citeseer.
Makhzani, A. and Frey, B. J. (2015). Winner-take-all autoencoders. In Advances in Neural Information Processing Systems, pages 2791–2799.
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B. (2015). Adversarial autoencoders. arXiv preprint arXiv:1511.05644.
Maldonado, S. and Weber, R. (2009). A wrapper method for feature selection using support vector machines. Information Sciences, 179(13):2208–2217.
Masci, J., Meier, U., Cireşan, D., and Schmidhuber, J. (2011). Stacked convolutional auto-encoders for hierarchical feature extraction. In International Conference on Artificial Neural Networks, pages 52–59. Springer.
Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60(309):234–256.
McCord, M. and Chuah, M. (2011). Spam detection on Twitter using traditional classifiers. In International Conference on Autonomic and Trusted Computing, pages 175–186. Springer.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., and Leisch, F. (2019). e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien. R package version 1.7-1.
Micci-Barreca, D. (2001). A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explorations Newsletter, 3(1):27–32.
Molinaro, A. M., Simon, R., and Pfeiffer, R. M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15):3301–3307.
Molnar, C. (2019). iml: Interpretable Machine Learning. R package version 0.9.0.
Molnar, C. (2018). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. E-book at https://christophm.github.io/interpretable-ml-book/.
Pearson, K. (1901). LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572.
Pedersen, T. L. and Benesty, M. (2018). lime: Local Interpretable Model-Agnostic Explanations. R package version 0.4.1.
Platt, J. C. (1999). Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61–74. MIT Press.
Polley, E., LeDell, E., Kennedy, C., and van der Laan, M. (2019). SuperLearner: Super Learner Prediction. R package version 2.0-25.
Poultney, C., Chopra, S., Cun, Y. L., et al. (2007). Efficient learning of sparse representations with an energy-based model. In Advances in Neural Information Processing Systems, pages 1137–1144.
Probst, P., Bischl, B., and Boulesteix, A.-L. (2018). Tunability: Importance of hyperparameters of machine learning algorithms. arXiv preprint arXiv:1802.09596.
Probst, P., Wright, M. N., and Boulesteix, A.-L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, page e1301.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1):81–106.
Quinlan, J. R. et al. (1996). Bagging, boosting, and C4.5. In AAAI/IAAI, Vol. 1, pages 725–730.
Rashmi, K. V. and Gilad-Bachrach, R. (2015). DART: Dropouts meet multiple additive regression trees. In AISTATS, pages 489–497.
Ribeiro, M. T., Singh, S., and Guestrin, C. (2016). "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM.
Rifai, S., Vincent, P., Muller, X., Glorot, X., and Bengio, Y. (2011). Contractive auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th International Conference on International Conference on Machine Learning, pages 833–840. Omnipress.
Ripley, B. D. (2007). Pattern Recognition and Neural Networks. Cambridge University Press.
Robinson, J. T. (1981). The K-D-B-tree: A search structure for large multidimensional dynamic indexes. In Proceedings of the 1981 ACM SIGMOD International Conference on Management of Data, pages 10–18. ACM.
Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65.
Ruder, S. (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.
Saeys, Y., Inza, I., and Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19):2507–2517.
Sakurada, M. and Yairi, T. (2014). Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, page 4. ACM.
Sapp, S., van der Laan, M. J., and Canny, J. (2014). Subsemble: an ensemble method for combining subset-specific algorithm fits. Journal of Applied Statistics, 41(6):1247–1259.
Sarle, Warren S. (n.d.). comp.ai.neural-nets FAQ. [Online; accessed 16-April-2019].
Segal, M. R. (2004). Machine learning benchmarks and random forest regression. UCSF: Center for Bioinformatics and Molecular Biostatistics.
Shah, A. D., Bartlett, J. W., Carpenter, J., Nicholas, O., and Hemingway, H. (2014). Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. American Journal of Epidemiology, 179(6):764–774.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014a). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. (2014b). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1):1929–1958.
Staniak, M. and Biecek, P. (2018). Explanations of model predictions with live and breakDown packages. arXiv preprint arXiv:1804.01955.
Stekhoven, D. J. (2015). missForest: Nonparametric missing value imputation using random forest. Astrophysics Source Code Library.
Stone, C. J., Hansen, M. H., Kooperberg, C., Truong, Y. K., et al. (1997). Polynomial splines and their tensor products in extended linear modeling: 1994 wald memorial lecture. The Annals of Statistics, 25(4):1371–1470.
Strobl, C., Boulesteix, A.-L., Zeileis, A., and Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8(1):25.
Štrumbelj, E. and Kononenko, I. (2014). Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 41(3):647–665.
Surowiecki, J. (2005). The Wisdom of Crowds. Anchor.
Therneau, T. M., Atkinson, E. J., et al. (1997). An introduction to recursive partitioning using the RPART routines. Technical report, Mayo Foundation. http://www.mayo.edu/hsr/techrpt/61.pdf.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288.
Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2):411–423.
Tierney, N. (2019). visdat: Preliminary Visualisation of Data. R package version 0.5.3.
Udell, M., Horn, C., Zadeh, R., Boyd, S., et al. (2016). Generalized low rank models. Foundations and Trends® in Machine Learning, 9(1):1–118.
Van der Laan, M. J., Polley, E. C., and Hubbard, A. E. (2007). Super learner. Statistical Applications in Genetics and Molecular Biology, 6(1).
Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation, 23(7):1661–1674.
Vincent, P., Larochelle, H., Bengio, Y., and Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM.
West, B. T., Welch, K. B., and Galecki, A. T. (2014). Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman and Hall/CRC.
Wickham, H. (2014). Advanced R. Chapman and Hall/CRC.
Wickham, H. et al. (2014). Tidy data. Journal of Statistical Software, 59(10):1–23.
Wickham, H. and Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, Inc.
Wikipedia contributors (n.d.a). Autoencoder. [Online; accessed 25-May-2019].
Wikipedia contributors (n.d.b). MNIST database. [Online; accessed 15-April-2019].
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5:241–259.
Wolpert, D. H. (1996). The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390.
Wright, M. and Ziegler, A. (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software, Articles, 77(1):1–17.
Zhang, W., Zhao, D., and Wang, X. (2013). Agglomerative clustering via maximum incremental path integral. Pattern Recognition, 46(11):3056–3065.
Zhao, D. and Tang, X. (2009). Cyclizing clusters via zeta function of a graph. In Advances in Neural Information Processing Systems, pages 1953–1960.
Zheng, A. and Casari, A. (2018). Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists. O’Reilly Media, Inc.
Zhou, C. and Paffenroth, R. C. (2017). Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 665–674. ACM.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2):301–320.
Zumel, N. and Mount, J. (2016). vtreat: a data.frame processor for predictive modeling. arXiv preprint arXiv:1611.09477.