Index
A
actuarial science /
Actuarial science
affinity propagation clustering /
Affinity propagation clustering
aggregate function /
Subqueries
alternative hypothesis /
Hypothesis testing
Amazon Standard Identification Number (ASIN) /
The Map data structure
American College Board Scholastic Aptitude Test in mathematics (AP math test) /
The standard normal distribution
Anscombe's quartet /
Anscombe's quartet
Apache Commons
download link /
The Apache Commons Math Library
Apache Commons implementation /
The Apache Commons implementation
Apache Commons Math Library
about /
The Apache Commons Math Library
using /
The Apache Commons Math Library
Apache Hadoop
about /
Apache Hadoop
Apache Software Foundation
about /
The Apache Commons Math Library
references, for projects /
The Apache Commons Math Library
ARFF filetype /
The ARFF filetype for data
associative array /
The Map data structure
Attribute-Relation File Format (ARFF)
about /
The ARFF filetype for data
reference /
The ARFF filetype for data
attribute-value pairs /
Key-value pairs
attributes /
Data points and datasets
B
B-tree /
Table indexes
backward elimination
about /
K-Nearest Neighbors
bar chart
about /
Bar charts
generating /
Bar charts
batch mode /
Inserting data into the database
batch processing /
Batch processing
Bayes' theorem /
Bayes' theorem
Bayesian classifiers
about /
Bayesian classifiers
Java implementation, with Weka /
Java implementation with Weka
support vector machine algorithms /
Support vector machine algorithms
Binary Search algorithm
about /
Decision trees
binomial coefficient /
The binomial distribution
binomial distribution /
The binomial distribution
bins
about /
Histograms
BSON object /
Java development with MongoDB
bulk write /
The Mongo database system
C
95% confidence coefficient /
Confidence intervals
calculated by steam /
Calculated by steam
Canberra metric /
Measuring distances
Cartesian plane /
Measuring distances
Cartesian product /
The relation data model
Cassandra /
Other NoSQL database systems
central limit theorem /
The central limit theorem
Chebyshev metric /
Measuring distances
chessboard metric /
Measuring distances
column data model /
Other NoSQL database systems
column store /
The Mongo database system
comma separated values (CSV)
about /
File formats
Commons Math API
reference /
Descriptive statistics
complexity analysis /
Hierarchical clustering
compound indexes /
Indexing in MongoDB
conditional probability /
Conditional probability
confidence intervals /
Confidence intervals
content-based recommendation /
Utility matrices
contingency tables /
Contingency tables
continuous distribution /
The standard normal distribution
correlation coefficient /
Covariance and correlation
cosine similarity /
Cosine similarity
covariance /
Covariance and correlation
crosstab tables /
Contingency tables
cumulative distribution function (CDF) /
The exponential distribution
,
Cumulative distributions
curse of dimensionality /
The curse of dimensionality
curve fitting /
Curve fitting
D
data /
Data, information, and knowledge
inserting, into database /
Inserting data into the database
data analysis
origins /
Origins of data analysis
database
creating /
Creating a database
data, inserting into /
Inserting data into the database
database driver /
JDBC
database query /
Database queries
database schema
creating, in MySQL /
MySQL Workbench
database views /
Database views
data cleaning /
Data cleaning
Data Definition Language (DDL) /
SQL commands
data filtering /
Data filtering
Data Manipulation Language (DML) /
SQL commands
data normalization /
Data scaling
data points /
Data points and datasets
data ranking /
Data ranking
data scaling /
Data scaling
data scrubbing /
Data cleaning
datasets /
Data points and datasets
data striping /
Scaling, data striping, and sharding
data types /
Data types
decile
about /
Descriptive statistics
decision tree
about /
Decision trees
density function /
The normal distribution
descriptive statistic
about /
Descriptive statistics
Integrated Development Environment (IDE) /
Java Integrated Development Environments
dictionary /
The Map data structure
discrete distribution /
The standard normal distribution
distances
measuring /
Measuring distances
divide and conquer /
Google's MapReduce framework
document data model /
Other NoSQL database systems
document store /
The Mongo database system
domain /
The relation data model
dynamic schemas /
Java development with MongoDB
E
Electronic Numerical Integrator and Computer (ENIAC) /
ENIAC
entropy
defining /
What does entropy have to do with it?
event
about /
Random sampling
Excel
linear regression, performing in /
Linear regression in Excel
exemplars /
Affinity propagation clustering
explained variation /
Variation statistics
exponential distribution /
The exponential distribution
Extensible Markup Language (XML) /
XML and JSON data
F
false negative /
Bayes' theorem
false positive /
Bayes' theorem
fields /
Data points and datasets
,
The relation data model
file formats
about /
File formats
foreign key /
Foreign keys
frequency distribution /
Frequency distributions
fuzzy classification algorithms
about /
Fuzzy classification algorithms
G
Generalized Markup Language (GML) /
XML and JSON data
GeoJSON object types /
The MongoDB extension for geospatial databases
geospatial databases
reference /
The MongoDB extension for geospatial databases
graph data model /
Other NoSQL database systems
graphs /
Tables and graphs
H
Hadoop Common /
Apache Hadoop
Hadoop Distributed File System (HDFS) /
Apache Hadoop
Hadoop MapReduce /
Apache Hadoop
about /
Hadoop MapReduce
WordCount program /
Hadoop MapReduce
reference, for example /
Hadoop MapReduce
Hadoop YARN /
Apache Hadoop
Hamming distance /
Similarity measures
Hamming similarity /
Similarity measures
hash /
Hashing
hash codes /
Hashing
hash function
properties /
Hashing
hashing /
Hashing
hash table /
Hash tables
,
Large sparse matrices
HBase /
Other NoSQL database systems
Herman Hollerith /
Herman Hollerith
hierarchical clustering
about /
Hierarchical clustering
Weka implementation /
Weka implementation
K-means clustering /
K-means clustering
k-medoids clustering /
K-medoids clustering
affinity propagation clustering /
Affinity propagation clustering
histogram
about /
Histograms
horizontal scaling /
Scaling, data striping, and sharding
hypothesis testing /
Hypothesis testing
I
ID3 algorithm
about /
The ID3 algorithm
Java implementation /
Java Implementation of the ID3 algorithm
Java implementation, with Weka /
Java implementation with Weka
indexing
in MongoDB /
Indexing in MongoDB
information /
Data, information, and knowledge
instance /
The relation data model
International Business Machines Corporation (IBM) /
Herman Hollerith
International Standard Book Numbers /
The Map data structure
item-based recommendation /
Utility matrices
item-to-item collaborative filtering recommender /
Amazon's item-to-item collaborative filtering recommender
Iterative Dichotomizer 3
about /
The ID3 algorithm
J
Java
about /
Why Java?
features /
Why Java?
Java Database Connectivity (JDBC) /
JDBC
Java DB /
Creating a database
Java development
with MongoDB /
Java development with MongoDB
Java implementation
example /
Java implementation
of linear regression /
Java implementation of linear regression
of ID3 algorithm /
Java Implementation of the ID3 algorithm
Java Integrated Development Environments /
Java Integrated Development Environments
JavaScript Object Notation (JSON) /
XML and JSON data
javax.json library
about /
The javax JSON Library
JDBC PreparedStatement
using /
Using a JDBC PreparedStatement
joint probability function /
Multivariate distributions
JSON (JavaScript Object Notation) /
The Mongo database system
JSON data /
XML and JSON data
JSON event types
identifying /
XML and JSON data
JSON files
parsing /
XML and JSON data
K
K-Means++ algorithm /
K-means clustering
K-means clustering /
K-means clustering
k-medoids clustering /
K-medoids clustering
K-Nearest Neighbor (KNN) /
K-means clustering
K-Nearest Neighbors
about /
K-Nearest Neighbors
key-value data model /
Other NoSQL database systems
key-value pairs (KVP) /
Key-value pairs
key field /
Key fields
key values /
Key fields
knowledge /
Data, information, and knowledge
kurtosis
about /
Descriptive statistics
L
large sparse matrices /
Large sparse matrices
least-squares parabola
about /
Polynomial regression
level of significance /
Hypothesis testing
lexicographic order /
Large sparse matrices
Library database /
The Library database
linear regression
about /
Linear regression
in Excel /
Linear regression in Excel
Java implementation /
Java implementation of linear regression
Anscombe's quartet /
Anscombe's quartet
line graph
about /
Line graphs
generating /
Line graphs
logarithmic time /
Table indexes
logistic function
about /
Logistic regression
logistic regression
about /
Logistic regression
example /
Logistic regression
K-Nearest Neighbors /
K-Nearest Neighbors
logit function /
Logistic regression
LU decomposition
about /
Polynomial regression
M
Manhattan metric /
Measuring distances
map /
Large sparse matrices
Map data structure /
The Map data structure
MapReduce
matrix multiplication, implementing /
Matrix multiplication with MapReduce
in MongoDB /
MapReduce in MongoDB
MapReduce applications
examples /
Some examples of MapReduce applications
MapReduce framework /
Google's MapReduce framework
marginal probabilities /
Multivariate distributions
Markov chain /
Google's PageRank algorithm
matrix multiplication
with MapReduce /
Matrix multiplication with MapReduce
maximum
about /
Descriptive statistics
mean average
about /
Descriptive statistics
median
about /
Descriptive statistics
merging /
Merging
message-passing /
Affinity propagation clustering
metadata /
Metadata
method of least squares
about /
Polynomial regression
metric /
Measuring distances
metric space /
Measuring distances
Microsoft Excel
moving average, computing /
Moving average
Microsoft Excel data /
Microsoft Excel data
minimal spanning tree (MST) /
Some examples of MapReduce applications
minimum
about /
Descriptive statistics
Minkowski metric /
Measuring distances
mode
about /
Descriptive statistics
mongo-java-driver JAR files
download link /
Java development with MongoDB
Mongo database system /
The Mongo database system
MongoDB
download link /
The Mongo database system
references /
The Mongo database system
indexing /
Indexing in MongoDB
need for /
Why NoSQL and why MongoDB?
about /
MongoDB
MongoDB extension
for geospatial databases /
The MongoDB extension for geospatial databases
MongoDB installation file
download link /
MongoDB
MongoDB Manual
reference /
The Mongo database system
moving average /
Moving average
computing, in Microsoft Excel /
Moving average
MovingAverage class
test program /
Moving average
moving average series /
Moving average
multilinear functions /
Multiple linear regression
multiple linear regression /
Multiple linear regression
multivariate distributions /
Multivariate distributions
multivariate probability distribution function /
Multivariate distributions
MySQL
database schema, creating /
MySQL Workbench
MySQL database
accessing, from NetBeans /
Accessing the MySQL database from NetBeans
N
naive Bayes classification algorithm
about /
Bayesian classifiers
Neo4j /
Other NoSQL database systems
NetBeans
MySQL database, accessing from /
Accessing the MySQL database from NetBeans
Netflix prize /
The Netflix prize
normal distribution
about /
The normal distribution
example /
A thought experiment
normal equations /
Computing the regression coefficients
NoSQL
versus SQL /
SQL versus NoSQL
need for /
Why NoSQL and why MongoDB?
NoSQL database systems
reference /
Other NoSQL database systems
about /
Other NoSQL database systems
null hypothesis /
Hypothesis testing
null values /
Null values
O
offset /
Using random access files
ordinary least squares (OLS)
about /
The Apache Commons implementation
outlier /
Anscombe's quartet
P
PageRank algorithm /
Google's PageRank algorithm
parser /
XML and JSON data
parsing /
XML and JSON data
partitioning around medoids (PAM) /
K-medoids clustering
percentile
about /
Descriptive statistics
POI open source API library
download link /
Microsoft Excel data
polynomial regression
about /
Polynomial regression
population
about /
Random sampling
primary key /
The relation data model
probabilistic events
independence /
The independence of probabilistic events
probabilities
facts /
Random sampling
probability density function (PDF) /
A thought experiment
probability distribution function (PDF) /
Probability distributions
probability function
about /
Random sampling
probability set function
about /
Random sampling
Pythagorean theorem /
Measuring distances
Q
quality control department (QCD) /
Confidence intervals
quartiles
about /
Descriptive statistics
R
random access files
using /
Using random access files
random experiment
about /
Random sampling
random sample
about /
Random sampling
random sampling
about /
Random sampling
random variable /
Random variables
range
about /
Descriptive statistics
red-black tree data structure /
Large sparse matrices
Redis /
Other NoSQL database systems
regression
about /
Linear regression
regression coefficients
computing /
Computing the regression coefficients
relation /
The relation data model
relational database (Rdb) /
Java development with MongoDB
relational database (RDB) /
The relation data model
,
Relational databases
relational database design
about /
Relational database design
database, creating /
Creating a database
SQL commands /
SQL commands
data, inserting into database /
Inserting data into the database
database queries /
Database queries
SQL data types /
SQL data types
Java Database Connectivity (JDBC) /
JDBC
JDBC PreparedStatement, using /
Using a JDBC PreparedStatement
batch processing /
Batch processing
database views /
Database views
subqueries /
Subqueries
table indexes /
Table indexes
relational databases (Rdbs) /
The Mongo database system
relational database system (RDBMS) /
Creating a database
relational database systems (Rdbs) /
Scaling, data striping, and sharding
relational database tables /
Relational database tables
relation data model /
The relation data model
residual /
Linear regression in Excel
rows /
The relation data model
running average /
Moving average
S
sample /
Frequency distributions
sample correlation coefficient /
Linear regression in Excel
sample space
about /
Random sampling
sample variance /
Descriptive statistics
scalability /
Scalability
scaling /
Scaling, data striping, and sharding
scatter plot
about /
Scatter plots
generating /
Scatter plots
schema /
The relation data model
,
Relational databases
scientific method /
The scientific method
sharding /
Scaling, data striping, and sharding
shards /
Scaling, data striping, and sharding
show collections command
collections /
The Mongo database system
sigmoid curve
about /
Logistic regression
similarity measure /
Utility matrices
,
Similarity measures
simple average
about /
Descriptive statistics
simple recommender system /
A simple recommender system
skewness
about /
Descriptive statistics
SN 1572 /
The scientific method
sorting /
Sorting
sparse matrix /
Large sparse matrices
,
Google's PageRank algorithm
sparse matrix format /
Matrix multiplication with MapReduce
spectacular example /
A spectacular example
SQL
versus NoSQL /
SQL versus NoSQL
SQL (Structured Query Language) /
Creating a database
SQL commands /
SQL commands
SQL data types /
SQL data types
SQL script /
SQL commands
Standard Generalized Markup Language (SGML) /
XML and JSON data
standard normal distribution /
The standard normal distribution
statement object /
Using a JDBC PreparedStatement
statistics
descriptive statistics /
Descriptive statistics
subquery /
Subqueries
support vector machine (SVM) /
Support vector machine algorithms
support vector machine algorithms /
Support vector machine algorithms
T
table indexes /
Table indexes
tables /
Tables and graphs
taxicab metric /
Measuring distances
test datasets
generating /
Generating test datasets
time series
about /
Time series
simulating /
Java example
TimeSeries class
test program /
Java implementation
total variation /
Variation statistics
transition matrix /
Google's PageRank algorithm
triangle inequality /
Measuring distances
tuples /
The relation data model
two-tailed test /
Hypothesis testing
Type I error /
Bayes' theorem
,
Hypothesis testing
Type II error /
Bayes' theorem
type signature /
Data points and datasets
U
unexplained variation /
Variation statistics
Universal Product Codes (UPCs) /
The Map data structure
user ratings
implementing /
Implementing user ratings
utility matrix /
Utility matrices
V
variables /
Variables
variation statistics /
Variation statistics
vehicle identification number (VIN) /
The Map data structure
vertical scaling /
Scaling, data striping, and sharding
virtual table /
Database views
VisiCalc /
VisiCalc
W
weighted mean
about /
Descriptive statistics
Weka
about /
The Weka platform
download link /
The Weka libraries
Weka implementation /
Weka implementation
Weka libraries
about /
The Weka libraries
Weka Workbench
reference /
The Weka platform
WordCount example /
The WordCount example
WordCount problem /
Some examples of MapReduce applications
X
XML data /
XML and JSON data