Java Data Analysis
Table of Contents
Java Data Analysis
Credits
About the Author
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why subscribe?
Customer Feedback
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to Data Analysis
Origins of data analysis
The scientific method
Actuarial science
Calculated by steam
A spectacular example
Herman Hollerith
ENIAC
VisiCalc
Data, information, and knowledge
Why Java?
Java Integrated Development Environments
Summary
2. Data Preprocessing
Data types
Variables
Data points and datasets
Null values
Relational database tables
Key fields
Key-value pairs
Hash tables
File formats
Microsoft Excel data
XML and JSON data
Generating test datasets
Metadata
Data cleaning
Data scaling
Data filtering
Sorting
Merging
Hashing
Summary
3. Data Visualization
Tables and graphs
Scatter plots
Line graphs
Bar charts
Histograms
Time series
Java implementation
Moving average
Data ranking
Frequency distributions
The normal distribution
A thought experiment
The exponential distribution
Java example
Summary
4. Statistics
Descriptive statistics
Random sampling
Random variables
Probability distributions
Cumulative distributions
The binomial distribution
Multivariate distributions
Conditional probability
The independence of probabilistic events
Contingency tables
Bayes' theorem
Covariance and correlation
The standard normal distribution
The central limit theorem
Confidence intervals
Hypothesis testing
Summary
5. Relational Databases
The relation data model
Relational databases
Foreign keys
Relational database design
Creating a database
SQL commands
Inserting data into the database
Database queries
SQL data types
JDBC
Using a JDBC PreparedStatement
Batch processing
Database views
Subqueries
Table indexes
Summary
6. Regression Analysis
Linear regression
Linear regression in Excel
Computing the regression coefficients
Variation statistics
Java implementation of linear regression
Anscombe's quartet
Polynomial regression
Multiple linear regression
The Apache Commons implementation
Curve fitting
Summary
7. Classification Analysis
Decision trees
What does entropy have to do with it?
The ID3 algorithm
Java Implementation of the ID3 algorithm
The Weka platform
The ARFF filetype for data
Java implementation with Weka
Bayesian classifiers
Java implementation with Weka
Support vector machine algorithms
Logistic regression
K-Nearest Neighbors
Fuzzy classification algorithms
Summary
8. Cluster Analysis
Measuring distances
The curse of dimensionality
Hierarchical clustering
Weka implementation
K-means clustering
K-medoids clustering
Affinity propagation clustering
Summary
9. Recommender Systems
Utility matrices
Similarity measures
Cosine similarity
A simple recommender system
Amazon's item-to-item collaborative filtering recommender
Implementing user ratings
Large sparse matrices
Using random access files
The Netflix prize
Summary
10. NoSQL Databases
The Map data structure
SQL versus NoSQL
The Mongo database system
The Library database
Java development with MongoDB
The MongoDB extension for geospatial databases
Indexing in MongoDB
Why NoSQL and why MongoDB?
Other NoSQL database systems
Summary
11. Big Data Analysis with Java
Scaling, data striping, and sharding
Google's PageRank algorithm
Google's MapReduce framework
Some examples of MapReduce applications
The WordCount example
Scalability
Matrix multiplication with MapReduce
MapReduce in MongoDB
Apache Hadoop
Hadoop MapReduce
Summary
A. Java Tools
The command line
Java
NetBeans
MySQL
MySQL Workbench
Accessing the MySQL database from NetBeans
The Apache Commons Math Library
The javax JSON Library
The Weka libraries
MongoDB
Index