Imperial Library

Index
Python: Data Analytics and Visualization
Table of Contents
Python: Data Analytics and Visualization
Credits
Preface
What this learning path covers
What you need for this learning path
Who this learning path is for
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Module 1
1. Introducing Data Analysis and Libraries
Data analysis and processing
An overview of the libraries in data analysis
Python libraries in data analysis
NumPy
Pandas
Matplotlib
PyMongo
The scikit-learn library
Summary
2. NumPy Arrays and Vectorized Computation
NumPy arrays
Data types
Array creation
Indexing and slicing
Fancy indexing
Numerical operations on arrays
Array functions
Data processing using arrays
Loading and saving data
Saving an array
Loading an array
Linear algebra with NumPy
NumPy random numbers
Summary
3. Data Analysis with Pandas
An overview of the Pandas package
The Pandas data structure
Series
The DataFrame
The essential basic functionality
Reindexing and altering labels
Head and tail
Binary operations
Functional statistics
Function application
Sorting
Indexing and selecting data
Computational tools
Working with missing data
Advanced uses of Pandas for data analysis
Hierarchical indexing
The Panel data
Summary
4. Data Visualization
The matplotlib API primer
Line properties
Figures and subplots
Exploring plot types
Scatter plots
Bar plots
Contour plots
Histogram plots
Legends and annotations
Plotting functions with Pandas
Additional Python data visualization tools
Bokeh
MayaVi
Summary
5. Time Series
Time series primer
Working with date and time objects
Resampling time series
Downsampling time series data
Upsampling time series data
Time zone handling
Timedeltas
Time series plotting
Summary
6. Interacting with Databases
Interacting with data in text format
Reading data from text format
Writing data to text format
Interacting with data in binary format
HDF5
Interacting with data in MongoDB
Interacting with data in Redis
The simple value
List
Set
Ordered set
Summary
7. Data Analysis Application Examples
Data munging
Cleaning data
Filtering
Merging data
Reshaping data
Data aggregation
Grouping data
Summary
8. Machine Learning Models with scikit-learn
An overview of machine learning models
The scikit-learn modules for different models
Data representation in scikit-learn
Supervised learning – classification and regression
Unsupervised learning – clustering and dimensionality reduction
Measuring prediction performance
Summary
2. Module 2
1. Getting Started with Predictive Modelling
Introducing predictive modelling
Scope of predictive modelling
Ensemble of statistical algorithms
Statistical tools
Historical data
Mathematical function
Business context
Knowledge matrix for predictive modelling
Task matrix for predictive modelling
Applications and examples of predictive modelling
LinkedIn's "People also viewed" feature
What it does?
How is it done?
Correct targeting of online ads
How is it done?
Santa Cruz predictive policing
How is it done?
Determining the activity of a smartphone user using accelerometer data
How is it done?
Sport and fantasy leagues
How was it done?
Python and its packages – download and installation
Anaconda
Standalone Python
Installing a Python package
Installing pip
Installing Python packages with pip
Python and its packages for predictive modelling
IDEs for Python
Summary
2. Data Cleaning
Reading the data – variations and examples
Data frames
Delimiters
Various methods of importing data in Python
Case 1 – reading a dataset using the read_csv method
The read_csv method
Use cases of the read_csv method
Passing the directory address and filename as variables
Reading a .txt dataset with a comma delimiter
Specifying the column names of a dataset from a list
Case 2 – reading a dataset using the open method of Python
Reading a dataset line by line
Changing the delimiter of a dataset
Case 3 – reading data from a URL
Case 4 – miscellaneous cases
Reading from an .xls or .xlsx file
Writing to a CSV or Excel file
Basics – summary, dimensions, and structure
Handling missing values
Checking for missing values
What constitutes missing data?
How missing values are generated and propagated
Treating missing values
Deletion
Imputation
Creating dummy variables
Visualizing a dataset by basic plotting
Scatter plots
Histograms
Boxplots
Summary
3. Data Wrangling
Subsetting a dataset
Selecting columns
Selecting rows
Selecting a combination of rows and columns
Creating new columns
Generating random numbers and their usage
Various methods for generating random numbers
Seeding a random number
Generating random numbers following probability distributions
Probability density function
Cumulative density function
Uniform distribution
Normal distribution
Using the Monte-Carlo simulation to find the value of pi
Geometry and mathematics behind the calculation of pi
Generating a dummy data frame
Grouping the data – aggregation, filtering, and transformation
Aggregation
Filtering
Transformation
Miscellaneous operations
Random sampling – splitting a dataset in training and testing datasets
Method 1 – using the Customer Churn Model
Method 2 – using sklearn
Method 3 – using the shuffle function
Concatenating and appending data
Merging/joining datasets
Inner Join
Left Join
Right Join
An example of the Inner Join
An example of the Left Join
An example of the Right Join
Summary of Joins in terms of their length
Summary
4. Statistical Concepts for Predictive Modelling
Random sampling and the central limit theorem
Hypothesis testing
Null versus alternate hypothesis
Z-statistic and t-statistic
Confidence intervals, significance levels, and p-values
Different kinds of hypothesis test
A step-by-step guide to do a hypothesis test
An example of a hypothesis test
Chi-square tests
Correlation
Summary
5. Linear Regression with Python
Understanding the maths behind linear regression
Linear regression using simulated data
Fitting a linear regression model and checking its efficacy
Finding the optimum value of variable coefficients
Making sense of result parameters
p-values
F-statistics
Residual Standard Error
Implementing linear regression with Python
Linear regression using the statsmodel library
Multiple linear regression
Multi-collinearity
Variance Inflation Factor
Model validation
Training and testing data split
Summary of models
Linear regression with scikit-learn
Feature selection with scikit-learn
Handling other issues in linear regression
Handling categorical variables
Transforming a variable to fit non-linear relations
Handling outliers
Other considerations and assumptions for linear regression
Summary
6. Logistic Regression with Python
Linear regression versus logistic regression
Understanding the math behind logistic regression
Contingency tables
Conditional probability
Odds ratio
Moving on to logistic regression from linear regression
Estimation using the Maximum Likelihood Method
Likelihood function:
Log likelihood function:
Building the logistic regression model from scratch
Making sense of logistic regression parameters
Wald test
Likelihood Ratio Test statistic
Chi-square test
Implementing logistic regression with Python
Processing the data
Data exploration
Data visualization
Creating dummy variables for categorical variables
Feature selection
Implementing the model
Model validation and evaluation
Cross validation
Model validation
The ROC curve
Confusion matrix
Summary
7. Clustering with Python
Introduction to clustering – what, why, and how?
What is clustering?
How is clustering used?
Why do we do clustering?
Mathematics behind clustering
Distances between two observations
Euclidean distance
Manhattan distance
Minkowski distance
The distance matrix
Normalizing the distances
Linkage methods
Single linkage
Complete linkage
Average linkage
Centroid linkage
Ward's method
Hierarchical clustering
K-means clustering
Implementing clustering using Python
Importing and exploring the dataset
Normalizing the values in the dataset
Hierarchical clustering using scikit-learn
K-Means clustering using scikit-learn
Interpreting the cluster
Fine-tuning the clustering
The elbow method
Silhouette Coefficient
Summary
8. Trees and Random Forests with Python
Introducing decision trees
A decision tree
Understanding the mathematics behind decision trees
Homogeneity
Entropy
Information gain
ID3 algorithm to create a decision tree
Gini index
Reduction in Variance
Pruning a tree
Handling a continuous numerical variable
Handling a missing value of an attribute
Implementing a decision tree with scikit-learn
Visualizing the tree
Cross-validating and pruning the decision tree
Understanding and implementing regression trees
Regression tree algorithm
Implementing a regression tree using Python
Understanding and implementing random forests
The random forest algorithm
Implementing a random forest using Python
Why do random forests work?
Important parameters for random forests
Summary
9. Best Practices for Predictive Modelling
Best practices for coding
Commenting the codes
Defining functions for substantial individual tasks
Example 1
Example 2
Example 3
Avoid hard-coding of variables as much as possible
Version control
Using standard libraries, methods, and formulas
Best practices for data handling
Best practices for algorithms
Best practices for statistics
Best practices for business contexts
Summary
A. A List of Links
3. Module 3
1. A Conceptual Framework for Data Visualization
Data, information, knowledge, and insight
Data
Information
Knowledge
Data analysis and insight
The transformation of data
Transforming data into information
Data collection
Data preprocessing
Data processing
Organizing data
Getting datasets
Transforming information into knowledge
Transforming knowledge into insight
Data visualization history
Visualization before computers
Minard's Russian campaign (1812)
The Cholera epidemics in London (1831-1855)
Statistical graphics (1850-1915)
Later developments in data visualization
How does visualization help decision-making?
Where does visualization fit in?
Data visualization today
What is a good visualization?
Visualization plots
Bar graphs and pie charts
Bar graphs
Pie charts
Box plots
Scatter plots and bubble charts
Scatter plots
Bubble charts
KDE plots
Summary
2. Data Analysis and Visualization
Why does visualization require planning?
The Ebola example
A sports example
Visually representing the results
Creating interesting stories with data
Why are stories so important?
Reader-driven narratives
Gapminder
The State of the Union address
Mortality rate in the USA
A few other example narratives
Author-driven narratives
Perception and presentation methods
The Gestalt principles of perception
Some best practices for visualization
Comparison and ranking
Correlation
Distribution
Location-specific or geodata
Part-to-whole relationships
Trends over time
Visualization tools in Python
Development tools
Canopy from Enthought
Anaconda from Continuum Analytics
Interactive visualization
Event listeners
Layouts
Circular layout
Radial layout
Balloon layout
Summary
3. Getting Started with the Python IDE
The IDE tools in Python
Python 3.x versus Python 2.7
Types of interactive tools
IPython
Plotly
Types of Python IDE
PyCharm
PyDev
Interactive Editor for Python (IEP)
Canopy from Enthought
Anaconda from Continuum Analytics
An overview of Spyder
An overview of conda
Visualization plots with Anaconda
The surface-3D plot
The square map plot
Interactive visualization packages
Bokeh
VisPy
Summary
4. Numerical Computing and Interactive Plotting
NumPy, SciPy, and MKL functions
NumPy
NumPy universal functions
Shape and reshape manipulation
An example of interpolation
Vectorizing functions
Summary of NumPy linear algebra
SciPy
An example of linear equations
The vectorized numerical derivative
MKL functions
The performance of Python
Scalar selection
Slicing
Slice using flat
Array indexing
Numerical indexing
Logical indexing
Other data structures
Stacks
Tuples
Sets
Queues
Dictionaries
Dictionaries for matrix representation
Sparse matrices
Visualizing sparseness
Dictionaries for memoization
Tries
Visualization using matplotlib
Word clouds
Installing word clouds
Input for word clouds
Web feeds
The Twitter text
Plotting the stock price chart
Obtaining data
The visualization example in sports
Summary
5. Financial and Statistical Models
The deterministic model
Gross returns
The stochastic model
Monte Carlo simulation
What exactly is Monte Carlo simulation?
An inventory problem in Monte Carlo simulation
Monte Carlo simulation in basketball
The volatility plot
Implied volatilities
The portfolio valuation
The simulation model
Geometric Brownian simulation
The diffusion-based simulation
The threshold model
Schelling's Segregation Model
An overview of statistical and machine learning
K-nearest neighbors
Generalized linear models
Bayesian linear regression
Creating animated and interactive plots
Summary
6. Statistical and Machine Learning
Classification methods
Understanding linear regression
Linear regression
Decision tree
An example
The Bayes theorem
The Naïve Bayes classifier
The Naïve Bayes classifier using TextBlob
Installing TextBlob
Downloading corpora
The Naïve Bayes classifier using TextBlob
Viewing positive sentiments using word clouds
k-nearest neighbors
Logistic regression
Support vector machines
Principal component analysis
Installing scikit-learn
k-means clustering
Summary
7. Bioinformatics, Genetics, and Network Models
Directed graphs and multigraphs
Storing graph data
Displaying graphs
igraph
NetworkX
Graph-tool
PageRank
The clustering coefficient of graphs
Analysis of social networks
The planar graph test
The directed acyclic graph test
Maximum flow and minimum cut
A genetic programming example
Stochastic block models
Summary
8. Advanced Visualization
Computer simulation
Python's random package
SciPy's random functions
Simulation examples
Signal processing
Animation
Visualization methods using HTML5
How is Julia different from Python?
D3.js for visualization
Dashboards
Summary
B. Go Forth and Explore Visualization
An overview of conda
Packages installed with Anaconda
Packages websites
About matplotlib
Bibliography
Index
