Aitken, A. C. (1926). A series formula for the roots of algebraic and transcendental equations. Proceedings of the Royal Society of Edinburgh 45, 14–22.
Anderson, E. (1935). The irises of the Gaspé Peninsula. Bulletin of the American Iris Society 59, 2–5.
Andrews, J. L. and P. D. McNicholas (2013). vscc: Variable Selection for Clustering and Classification. R package version 0.2.
Andrews, J. L. and P. D. McNicholas (2014). Variable selection for clustering and classification. Journal of Classification 31 (2), 136–153.
Anton, H. and C. Rorres (1994). Elementary Linear Algebra (7th ed.). New York: John Wiley & Sons.
Bezanson, J., A. Edelman, S. Karpinski, and V. B. Shah (2017). Julia: A fresh approach to numerical computing. SIAM Review 59 (1), 65–98.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
Böhning, D., E. Dietz, R. Schaub, P. Schlattmann, and B. Lindsay (1994). The distribution of the likelihood ratio for mixtures of densities from the one-parameter exponential family. Annals of the Institute of Statistical Mathematics 46, 373–388.
Box, D. and A. Hejlsberg (2007). LINQ: .NET language-integrated query. msdn.microsoft.com/en-us/library/bb308959.aspx
Breiman, L. (1996). Bagging predictors. Machine Learning 24 (2), 123–140.
Breiman, L. (2001a). Random forests. Machine Learning 45 (1), 5–32.
Breiman, L. (2001b). Statistical modeling: The two cultures. Statistical Science 16 (3), 199–231.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone (1984). Classification and Regression Trees. Boca Raton: Chapman & Hall/CRC Press.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review 78 (1), 1–3.
Briggs, D. E., C. A. Boulton, P. A. Brookes, and R. Stevens (2004). Brewing: Science and Practice. Boca Raton: CRC Press.
Browne, R. P. and P. D. McNicholas (2014). mixture: Mixture Models for Clustering and Classification. R package version 1.1.
Carr, D. B., R. J. Littlefield, W. Nicholson, and J. Littlefield (1987). Scatterplot matrix techniques for large n. Journal of the American Statistical Association 82 (398), 424–436.
Chen, T. and C. Guestrin (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, pp. 785–794. ACM.
Cleveland, W. S. (2001). Data science: An action plan for expanding the technical areas of the field of statistics. International Statistical Review/Revue Internationale de Statistique 69 (1), 21–26.
Davenport, T. H. and D. J. Patil (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review. Sourced from hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
Davison, A. C. and D. V. Hinkley (1997). Bootstrap Methods and their Application. New York: Cambridge University Press.
Dempster, A. P., N. M. Laird, and D. B. Rubin (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39 (1), 1–38.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics 7 (1), 1–26.
Efron, B. (2002). The bootstrap in modern statistics. In Statistics in the 21st Century, pp. 326–332. Boca Raton: Chapman & Hall/CRC Press.
Efron, B. and T. Hastie (2016). Computer Age Statistical Inference. Cambridge: Cambridge University Press.
Efron, B. and R. J. Tibshirani (1993). An Introduction to the Bootstrap. Boca Raton: Chapman & Hall/CRC Press.
Eldén, L. (2007). Matrix Methods in Data Mining and Pattern Recognition. Philadelphia: SIAM.
Everitt, B. S., S. Landau, M. Leese, and D. Stahl (2011). Cluster Analysis (5th ed.). Chichester: John Wiley & Sons.
Fernández-Delgado, M., E. Cernadas, S. Barro, and D. Amorim (2014). Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research 15 (1), 3133–3181.
Friedl, J. E. F. (2006). Mastering Regular Expressions: Understand Your Data and Be More Productive (3rd ed.). Sebastopol, California: O’Reilly Media, Inc.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics 29(5), 1189–1232.
Fujikoshi, Y., V. V. Ulyanov, and R. Shimizu (2010). Multivariate Statistics: High-Dimensional and Large-Sample Approximations. Hoboken: John Wiley & Sons Inc.
Gelman, A., C. Pasarica, and R. Dodhia (2002). Let’s practice what we preach: Turning tables into graphs. The American Statistician 56(2), 121–130.
Geurts, P., D. Ernst, and L. Wehenkel (2006). Extremely randomized trees. Machine Learning 63(1), 3–42.
Ghahramani, Z. and G. E. Hinton (1997). The EM algorithm for factor analyzers. Technical Report CRG-TR-96-1, University of Toronto, Canada.
Gneiting, T. and A. E. Raftery (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477), 359–378.
Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys 23(1), 5–48.
Graybill, F. A. (1983). Matrices with Applications in Statistics (2nd ed.). Belmont, California: Wadsworth.
Hardin, J., R. Hoerl, N. J. Horton, D. Nolan, B. Baumer, O. Hall-Holt, P. Murrell, R. Peng, P. Roback, D. T. Lang, and M. D. Ward (2015). Data science in statistics curricula: Preparing students to “think with data”. The American Statistician 69(4), 343–353.
Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning (2nd ed.). New York: Springer.
Hayashi, C. (1998). What is data science? Fundamental concepts and a heuristic example. In C. Hayashi, K. Yajima, H. H. Bock, N. Ohsumi, Y. Tanaka, and Y. Baba (Eds.), Data Science, Classification, and Related Methods. Studies in Classification, Data Analysis, and Knowledge Organization. Tokyo: Springer.
Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms (2nd ed.). Philadelphia: SIAM.
Hintze, J. L. and R. D. Nelson (1998). Violin plots: A box plot-density trace synergism. The American Statistician 52(2), 181–184.
Hunter, D. R. and K. Lange (2000). Rejoinder to discussion of “Optimization transfer using surrogate objective functions”. Journal of Computational and Graphical Statistics 9, 52–59.
Hunter, D. R. and K. Lange (2004). A tutorial on MM algorithms. The American Statistician 58(1), 30–37.
Kuhn, M. (2017). caret: Classification and Regression Training. R package version 6.0-78.
Kuhn, M. and K. Johnson (2013). Applied Predictive Modeling. New York: Springer-Verlag.
Lawley, D. N. and A. E. Maxwell (1962). Factor analysis as a statistical method. Journal of the Royal Statistical Society: Series D 12(3), 209–229.
Lindsay, B. G. (1995). Mixture models: Theory, geometry and applications. In NSF-CBMS Regional Conference Series in Probability and Statistics, Volume 5. Hayward, California: Institute of Mathematical Statistics.
Lopes, H. F. and M. West (2004). Bayesian model assessment in factor analysis. Statistica Sinica 14, 41–67.
Lütkepohl, H. (1996). Handbook of Matrices. Chichester: John Wiley & Sons.
McCullagh, P. and J. A. Nelder (1989). Generalized Linear Models. Boca Raton: Chapman & Hall/CRC Press.
McLachlan, G. J. and T. Krishnan (2008). The EM Algorithm and Extensions (2nd ed.). New York: Wiley.
McLachlan, G. J. and D. Peel (2000). Mixtures of factor analyzers. In Proceedings of the Seventh International Conference on Machine Learning, pp. 599–606. San Francisco: Morgan Kaufmann.
McNicholas, P. D. (2010). Model-based classification using latent Gaussian mixture models. Journal of Statistical Planning and Inference 140 (5), 1175–1181.
McNicholas, P. D. (2016a). Mixture Model-Based Classification. Boca Raton: Chapman & Hall/CRC Press.
McNicholas, P. D. (2016b). Model-based clustering. Journal of Classification 33(3), 331–373.
McNicholas, P. D., A. ElSherbiny, A. F. McDaid, and T. B. Murphy (2018). pgmm: Parsimonious Gaussian Mixture Models. R package version 1.2.2.
McNicholas, P. D. and T. B. Murphy (2008). Parsimonious Gaussian mixture models. Statistics and Computing 18(3), 285–296.
McNicholas, P. D. and T. B. Murphy (2010). Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics 26(21), 2705–2712.
McNicholas, P. D., T. B. Murphy, A. F. McDaid, and D. Frost (2010). Serial and parallel implementations of model-based clustering via parsimonious Gaussian mixture models. Computational Statistics and Data Analysis 54 (3), 711–723.
Meng, X.-L. and D. B. Rubin (1993). Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80, 267–278.
Meng, X.-L. and D. van Dyk (1997). The EM algorithm — an old folk song sung to a fast new tune (with discussion). Journal of the Royal Statistical Society: Series B 59(3), 511–567.
Oliver, G. and T. Colicchio (2011). The Oxford Companion to Beer. Oxford University Press.
Press, G. (2013). A very short history of data science. Sourced from www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/
Puts, M., P. Daas, and T. de Waal (2015). Finding errors in big data. Significance 12(3), 26–29.
R Core Team (2018). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.
Ridgeway, G. (2017). gbm: Generalized Boosted Regression Models. With contributions from others. R package version 2.1.3.
Ruppert, D., M. P. Wand, and R. J. Carroll (2003). Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.
Schutt, R. (2013). Doing Data Science. Sebastopol, California: O’Reilly Media, Inc.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6(2), 461–464.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of Psychology 15, 72–101.
Spearman, C. (1927). The Abilities of Man: Their Nature and Measurement. London: Macmillan and Co., Limited.
Streuli, H. (1973). Der heutige Stand der Kaffeechemie. In Association Scientifique Internationale du Café, 6th International Colloquium on Coffee Chemistry, Bogotá, Colombia, pp. 61–72.
Tipping, M. E. and C. M. Bishop (1999a). Mixtures of probabilistic principal component analysers. Neural Computation 11 (2), 443–482.
Tipping, M. E. and C. M. Bishop (1999b). Probabilistic principal component analysis. Journal of the Royal Statistical Society: Series B 61, 611–622.
Tukey, J. W. (1962). The future of data analysis. The Annals of Mathematical Statistics 33(1), 1–67.
Tukey, J. W. (1977). Exploratory Data Analysis. Reading, Massachusetts: Addison-Wesley.
van Rossum, G. (1995). Python reference manual. Centrum voor Wiskunde en Informatica (CWI) Report CS-R9525. Amsterdam, The Netherlands: CWI.
Venables, W. N. and B. D. Ripley (2002). Modern Applied Statistics with S (4th ed.). New York: Springer.
White, T. (2015). Hadoop: The Definitive Guide (4th ed.). Sebastopol, California: O’Reilly Media, Inc.
Wickham, H. (2009). ggplot2: Elegant Graphics for Data Analysis. New York: Springer.
Wickham, H. (2011). The split-apply-combine strategy for data analysis. Journal of Statistical Software 40(1), 1–29.
Wickham, H. (2016). plyr: Tools for Splitting, Applying and Combining Data. R package version 1.8.4.
Wickham, H., R. François, L. Henry, and K. Müller (2017). dplyr: A Grammar of Data Manipulation. R package version 0.7.4.
Wilkinson, L. (2005). The Grammar of Graphics (2nd ed.). New York: Springer-Verlag.
Woodbury, M. A. (1950). Inverting modified matrices. Statistical Research Group, Memorandum Report 42. Princeton, New Jersey: Princeton University.
Wright, M. N. and A. Ziegler (2017). ranger: A fast implementation of random forests for high dimensional data in C++ and R. Journal of Statistical Software 77(1), 1–17.
Zuras, D., M. Cowlishaw, A. Aiken, M. Applegate, D. Bailey, S. Bass, D. Bhandarkar, M. Bhat, D. Bindel, S. Boldo, et al. (2008). IEEE standard for floating-point arithmetic. IEEE Std 754-2008, 1–70.