Clojure Data Analysis Cookbook Second Edition
Table of Contents
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Importing Data for Analysis
Introduction
Creating a new project
Getting ready
How to do it...
How it works...
Reading CSV data into Incanter datasets
Getting ready
How to do it…
How it works…
There's more…
Reading JSON data into Incanter datasets
Getting ready
How to do it…
How it works…
Reading data from Excel with Incanter
Getting ready
How to do it…
How it works…
Reading data from JDBC databases
Getting ready
How to do it…
How it works…
See also
Reading XML data into Incanter datasets
Getting ready
How to do it…
How it works…
There's more…
Navigating structures with zippers
Processing in a pipeline
Comparing XML and JSON
Scraping data from tables in web pages
Getting ready
How to do it…
How it works…
See also
Scraping textual data from web pages
Getting ready
How to do it…
How it works…
Reading RDF data
Getting ready
How to do it…
How it works…
See also
Querying RDF data with SPARQL
Getting ready
How to do it…
How it works…
There's more…
Aggregating data from different formats
Getting ready
How to do it…
Creating the triple store
Scraping exchange rates
Loading currency data and tying it all together
How it works…
See also
2. Cleaning and Validating Data
Introduction
Cleaning data with regular expressions
Getting ready
How to do it…
How it works…
There's more...
See also
Maintaining consistency with synonym maps
Getting ready
How to do it…
How it works…
See also
Identifying and removing duplicate data
Getting ready
How to do it…
How it works…
There's more…
Regularizing numbers
Getting ready
How to do it…
How it works…
Calculating relative values
Getting ready
How to do it…
How it works…
Parsing dates and times
Getting ready
How to do it…
There's more…
Lazily processing very large data sets
Getting ready
How to do it…
How it works…
Sampling from very large data sets
Getting ready
How to do it…
Sampling by percentage
Sampling exactly
How it works…
Fixing spelling errors
Getting ready
How to do it…
How it works…
There's more…
Parsing custom data formats
Getting ready
How to do it…
How it works…
Validating data with Valip
Getting ready
How to do it…
How it works…
3. Managing Complexity with Concurrent Programming
Introduction
Managing program complexity with STM
Getting ready
How to do it…
How it works…
See also
Managing program complexity with agents
Getting ready
How to do it…
How it works…
See also
Getting better performance with commute
Getting ready
How to do it…
How it works…
Combining agents and STM
Getting ready
How to do it…
How it works…
Maintaining consistency with ensure
Getting ready
How to do it…
How it works…
Introducing safe side effects into the STM
Getting ready
How to do it…
Maintaining data consistency with validators
Getting ready
How to do it…
How it works…
See also
Monitoring processing with watchers
Getting ready
How to do it…
How it works…
Debugging concurrent programs with watchers
Getting ready
How to do it…
There's more...
Recovering from errors in agents
How to do it…
Failing on errors
Continuing on errors
Using a custom error handler
There's more...
Managing large inputs with sized queues
How to do it…
How it works...
4. Improving Performance with Parallel Programming
Introduction
Parallelizing processing with pmap
How to do it…
How it works…
There's more…
See also
Parallelizing processing with Incanter
Getting ready
How to do it…
How it works…
Partitioning Monte Carlo simulations for better pmap performance
Getting ready
How to do it…
How it works…
Estimating with Monte Carlo simulations
Chunking data for pmap
Finding the optimal partition size with simulated annealing
Getting ready
How to do it…
How it works…
There's more…
Combining function calls with reducers
Getting ready
How to do it…
What happened here?
There's more...
See also
Parallelizing with reducers
Getting ready
How to do it…
How it works…
See also
Generating online summary statistics for data streams with reducers
Getting ready
How to do it…
Using type hints
Getting ready
How to do it…
How it works…
See also
Benchmarking with Criterium
Getting ready
How to do it…
How it works…
See also
5. Distributed Data Processing with Cascalog
Introduction
Initializing Cascalog and Hadoop for distributed processing
Getting ready
How to do it…
How it works…
See also
Querying data with Cascalog
Getting ready
How to do it…
How it works…
There's more…
Distributing data with Apache HDFS
Getting ready
How to do it…
How it works…
Parsing CSV files with Cascalog
Getting ready
How to do it…
How it works…
There's more…
Executing complex queries with Cascalog
Getting ready
How to do it…
Aggregating data with Cascalog
Getting ready
How to do it…
There's more…
Defining new Cascalog operators
Getting ready
How to do it…
Creating map operators
Creating map concatenation operators
Creating filter operators
Creating buffer operators
Creating aggregate operators
Creating parallel aggregate operators
Composing Cascalog queries
Getting ready
How to do it…
How it works…
Transforming data with Cascalog
Getting ready
How to do it…
How it works…
6. Working with Incanter Datasets
Introduction
Loading Incanter's sample datasets
Getting ready
How to do it…
How it works…
There's more...
Loading Clojure data structures into datasets
Getting ready
How to do it…
How it works…
See also…
Viewing datasets interactively with view
Getting ready
How to do it…
How it works…
See also…
Converting datasets to matrices
Getting ready
How to do it…
How it works…
There's more…
See also…
Using infix formulas in Incanter
Getting ready
How to do it…
How it works…
Selecting columns with $
Getting ready
How to do it…
How it works…
There's more…
See also…
Selecting rows with $
Getting ready
How to do it…
How it works…
Filtering datasets with $where
Getting ready
How to do it…
How it works…
There's more…
Grouping data with $group-by
Getting ready
How to do it…
How it works…
Saving datasets to CSV and JSON
Getting ready
How to do it…
Saving data as CSV
Saving data as JSON
How it works…
See also…
Projecting from multiple datasets with $join
Getting ready
How to do it…
How it works…
7. Statistical Data Analysis with Incanter
Introduction
Generating summary statistics with $rollup
Getting ready
How to do it…
How it works…
Working with changes in values
Getting ready
How to do it…
How it works…
Scaling variables to simplify variable relationships
Getting ready
How to do it…
How it works…
Working with time series data with Incanter Zoo
Getting ready
How to do it…
There's more...
Smoothing variables to decrease variation
Getting ready
How to do it…
How it works…
Validating sample statistics with bootstrapping
Getting ready
How to do it…
How it works…
There's more…
Modeling linear relationships
Getting ready
How to do it…
How it works…
Modeling non-linear relationships
Getting ready
How to do it…
How it works...
Modeling multinomial Bayesian distributions
Getting ready
How to do it…
How it works…
There's more...
Finding data errors with Benford's law
Getting ready
How to do it…
How it works…
There's more…
8. Working with Mathematica and R
Introduction
Setting up Mathematica to talk to Clojuratica for Mac OS X and Linux
Getting ready
How to do it…
How it works…
There's more…
Setting up Mathematica to talk to Clojuratica for Windows
Getting ready
How to do it...
How it works...
Calling Mathematica functions from Clojuratica
Getting ready
How to do it…
How it works…
Sending matrixes to Mathematica from Clojuratica
Getting ready
How to do it…
How it works…
Evaluating Mathematica scripts from Clojuratica
Getting ready
How to do it…
How it works…
Creating functions from Mathematica
Getting ready
How to do it…
How it works…
Setting up R to talk to Clojure
Getting ready
How to do it…
Setting up R
Setting up Clojure
How it works…
Calling R functions from Clojure
Getting ready
How to do it…
How it works…
There's more…
Passing vectors into R
Getting ready
How to do it…
How it works…
Evaluating R files from Clojure
Getting ready
How to do it…
How it works…
There's more…
Plotting in R from Clojure
Getting ready
How to do it…
How it works…
There's more…
9. Clustering, Classifying, and Working with Weka
Introduction
Loading CSV and ARFF files into Weka
Getting ready
How to do it…
How it works…
There's more…
See also…
Filtering, renaming, and deleting columns in Weka datasets
Getting ready
How to do it…
Renaming columns
Removing columns
Hiding columns
How it works…
Discovering groups of data using K-Means clustering
Getting ready
How to do it…
How it works…
Clustering with K-Means
Analyzing the results
Building macros
See also…
Finding hierarchical clusters in Weka
Getting ready
How to do it…
How it works…
There's more…
Clustering with SOMs in Incanter
Getting ready
How to do it…
How it works…
There's more…
Classifying data with decision trees
Getting ready
How to do it…
How it works…
There's more…
Classifying data with the Naive Bayesian classifier
Getting ready
How to do it…
How it works…
There's more…
Classifying data with support vector machines
Getting ready
How to do it…
There's more…
Finding associations in data with the Apriori algorithm
Getting ready
How to do it…
How it works…
There's more…
10. Working with Unstructured and Textual Data
Introduction
Tokenizing text
Getting ready
How to do it…
How it works…
Finding sentences
Getting ready
How to do it…
How it works…
Focusing on content words with stoplists
Getting ready
How to do it…
Getting document frequencies
Getting ready
How to do it…
Scaling document frequencies by document size
Getting ready
How to do it…
How it works…
Scaling document frequencies with TF-IDF
Getting ready
How to do it…
How it works…
Finding people, places, and things with Named Entity Recognition
Getting ready
How to do it…
How it works…
Mapping documents to a sparse vector space representation
Getting ready
How to do it…
Performing topic modeling with MALLET
Getting ready
How to do it…
How it works…
See also…
Performing naïve Bayesian classification with MALLET
Getting ready
How to do it…
How it works…
There's more…
See also…
11. Graphing in Incanter
Introduction
Creating scatter plots with Incanter
Getting ready
How to do it...
How it works...
There's more...
See also
Graphing non-numeric data in bar charts
Getting ready
How to do it...
How it works...
Creating histograms with Incanter
Getting ready
How to do it...
How it works...
Creating function plots with Incanter
Getting ready
How to do it...
How it works...
See also
Adding equations to Incanter charts
Getting ready
How to do it...
There's more...
Adding lines to scatter charts
Getting ready
How to do it...
How it works...
See also
Customizing charts with JFreeChart
Getting ready
How to do it...
How it works...
See also
Customizing chart colors and styles
Getting ready
How to do it...
Saving Incanter graphs to PNG
Getting ready
How to do it...
How it works...
Using PCA to graph multi-dimensional data
Getting ready
How to do it...
How it works...
There's more...
Creating dynamic charts with Incanter
Getting ready
How to do it...
How it works...
12. Creating Charts for the Web
Introduction
Serving data with Ring and Compojure
Getting ready
How to do it…
Configuring and setting up the web application
Serving data
Defining routes and handlers
Running the server
How it works…
There's more…
Creating HTML with Hiccup
Getting ready
How to do it…
How it works…
There's more…
Setting up to use ClojureScript
Getting ready
How to do it…
How it works…
There's more…
Creating scatter plots with NVD3
Getting ready
How to do it…
How it works…
There's more…
Creating bar charts with NVD3
Getting ready
How to do it…
How it works…
Creating histograms with NVD3
Getting ready
How to do it…
How it works…
Creating time series charts with D3
Getting ready
How to do it…
How it works…
There's more…
Visualizing graphs with force-directed layouts
Getting ready
How to do it…
How it works…
There's more…
Creating interactive visualizations with D3
Getting ready
How to do it…
How it works…
There's more…
Index