Imperial Library

Index
Practical Data Analysis - Second Edition
Practical Data Analysis - Second Edition Credits About the Authors About the Reviewers www.PacktPub.com
eBooks, discount offers, and more
Why subscribe? Free access for Packt account holders
Preface
What this book covers What you need for this book Who this book is for Conventions Reader feedback Customer support
Downloading the example code Downloading the color images of this book Errata Piracy Questions
1. Getting Started
Computer science Artificial intelligence Machine learning Statistics Mathematics Knowledge domain Data, information, and knowledge
Inter-relationship between data, information, and knowledge The nature of data
The data analysis process
The problem Data preparation Data exploration Predictive modeling Visualization of results
Quantitative versus qualitative data analysis Importance of data visualization What about big data? Quantified self
Sensors and cameras Social network analysis
Tools and toys for this book
Why Python? Why mlpy? Why D3.js? Why MongoDB?
Summary
2. Preprocessing Data
Data sources
Open data Text files Excel files SQL databases NoSQL databases Multimedia Web scraping
Data scrubbing
Statistical methods Text parsing Data transformation
Data formats
Parsing a CSV file with the CSV module
Parsing CSV file using NumPy
JSON
Parsing JSON file using the JSON module
XML
Parsing XML in Python using the XML module
YAML
Data reduction methods
Filtering and sampling Binned algorithm Dimensionality reduction
Getting started with OpenRefine
Text facet Clustering Text filters Numeric facets Transforming data Exporting data Operation history
Summary
3. Getting to Grips with Visualization
What is visualization? Working with web-based visualization Exploring scientific visualization Visualization in art The visualization life cycle Visualizing different types of data
HTML DOM CSS JavaScript SVG
Getting started with D3.js
Bar chart Pie chart Scatter plots Single line chart Multiple line chart
Interaction and animation Data from social networks An overview of visual analytics Summary
4. Text Classification
Learning and classification Bayesian classification
Naïve Bayes
E-mail subject line tester The data The algorithm Classifier accuracy Summary
5. Similarity-Based Image Retrieval
Image similarity search Dynamic time warping Processing the image dataset Implementing DTW Analyzing the results Summary
6. Simulation of Stock Prices
Financial time series Random Walk simulation Monte Carlo methods Generating random numbers Implementation in D3js Quantitative analyst Summary
7. Predicting Gold Prices
Working with time series data
Components of a time series
Smoothing time series Linear regression The data - historical gold prices Nonlinear regressions
Kernel Ridge Regressions Smoothing the gold prices time series Predicting in the smoothed time series Contrasting the predicted value
Summary
8. Working with Support Vector Machines
Understanding the multivariate dataset Dimensionality reduction
Linear Discriminant Analysis (LDA) Principal Component Analysis (PCA)
Getting started with SVM
Kernel functions The double spiral problem SVM implemented on mlpy
Summary
9. Modeling Infectious Diseases with Cellular Automata
Introduction to epidemiology
The epidemiology triangle
The epidemic models
The SIR model Solving the ordinary differential equation for the SIR model with SciPy The SIRS model
Modeling with Cellular Automaton
Cell, state, grid, neighborhood Global stochastic contact model
Simulation of the SIRS model in CA with D3.js Summary
10. Working with Social Graphs
Structure of a graph
Undirected graph Directed graph
Social networks analysis Acquiring the Facebook graph Working with graphs using Gephi Statistical analysis
Male to female ratio
Degree distribution
Histogram of a graph Centrality
Transforming GDF to JSON Graph visualization with D3.js Summary
11. Working with Twitter Data
The anatomy of Twitter data
Tweet Followers Trending topics
Using OAuth to access Twitter API Getting started with Twython
Simple search using Twython Working with timelines Working with followers Working with places and trends Working with user data Streaming API
Summary
12. Data Processing and Aggregation with MongoDB
Getting started with MongoDB
Database Collection Document Mongo shell Insert/Update/Delete Queries
Data preparation
Data transformation with OpenRefine Inserting documents with PyMongo
Group Aggregation framework
Pipelines Expressions
Summary
13. Working with MapReduce
An overview of MapReduce Programming model Using MapReduce with MongoDB
Map function Reduce function Using mongo shell Using Jupyter Using PyMongo
Filtering the input collection Grouping and aggregation Counting the most common words in tweets Summary
14. Online Data Analysis with Jupyter and Wakari
Getting started with Wakari
Creating an account in Wakari
Getting started with IPython notebook
Data visualization
Introduction to image processing with PIL
Opening an image Working with an image histogram Filtering Operations Transformations
Getting started with pandas
Working with Time Series Working with multivariate datasets with DataFrame Grouping, Aggregation, and Correlation
Sharing your Notebook
The data
Summary
15. Understanding Data Processing using Apache Spark
Platform for data processing
The Cloudera platform Installing Cloudera VM
An introduction to the distributed file system
First steps with Hadoop Distributed File System - HDFS File management with HUE - web interface
An introduction to Apache Spark
The Spark ecosystem The Spark programming model An introductory working example of Apache Startup
Summary