This book discusses and illustrates Open Source Software (OSS) for statistical analysis of Big Data. Chapter 1 gives an overview of OSS in the public domain and discusses key characteristics of Big Data. Chapter 2 then introduces OSS for Big Data, and Chapter 3 surveys popular Open Source Statistical Software (OSSS). The two most popular OSSS, R and Python, are discussed in depth with applications. Chapter 4 applies cluster analysis in R to Big Data applications, and Chapter 5 applies a generalized linear model in R to automobile fatality rate prediction. Chapter 6 discusses Python and its statistical applications further, and Chapter 7 applies a machine learning algorithm in Python. Figure 1 shows the relationship among the book's chapters, and Figure 2 lists the key figures and tables within each chapter.
Figure 1. Book chapter relationship

Figure 2. Key tables within each chapter
This book is best used as a reference that introduces basic Open Source Software for statistical analysis of Big Data. It introduces the concepts through working demonstrations of up-to-date software platforms, descriptions of functionality, and practical applications. Once a reader has a preliminary understanding of the capacity and capabilities of each software package, further investigation will be required, such as more in-depth study of coding, functionality, and other software characteristics, before the software can be used to solve practical problems in multiple disciplines using Big Data.