Python Data Science Essentials
Table of Contents
Python Data Science Essentials
Credits
About the Authors
About the Reviewers
www.PacktPub.com
  • Support files, eBooks, discount offers, and more
    • Why subscribe?
    • Free access for Packt account holders
Preface
  • What this book covers
  • What you need for this book
  • Who this book is for
  • Conventions
  • Reader feedback
  • Customer support
    • Downloading the example code
    • Errata
    • Piracy
    • Questions
1. First Steps
  • Introducing data science and Python
  • Installing Python
    • Python 2 or Python 3?
    • Step-by-step installation
    • A glance at the essential Python packages
      • NumPy
      • SciPy
      • pandas
      • Scikit-learn
      • IPython
      • Matplotlib
      • Statsmodels
      • Beautiful Soup
      • NetworkX
      • NLTK
      • Gensim
      • PyPy
    • The installation of packages
    • Package upgrades
  • Scientific distributions
    • Anaconda
    • Enthought Canopy
    • PythonXY
    • WinPython
  • Introducing IPython
    • The IPython Notebook
    • Datasets and code used in the book
      • Scikit-learn toy datasets
      • The MLdata.org public repository
      • LIBSVM data examples
      • Loading data directly from CSV or text files
      • Scikit-learn sample generators
  • Summary
2. Data Munging
  • The data science process
  • Data loading and preprocessing with pandas
    • Fast and easy data loading
    • Dealing with problematic data
    • Dealing with big datasets
    • Accessing other data formats
    • Data preprocessing
    • Data selection
  • Working with categorical and textual data
    • A special type of data – text
  • Data processing with NumPy
    • NumPy's n-dimensional array
    • The basics of NumPy ndarray objects
  • Creating NumPy arrays
    • From lists to unidimensional arrays
    • Controlling the memory size
    • Heterogeneous lists
    • From lists to multidimensional arrays
    • Resizing arrays
    • Arrays derived from NumPy functions
    • Getting an array directly from a file
    • Extracting data from pandas
  • NumPy fast operation and computations
    • Matrix operations
    • Slicing and indexing with NumPy arrays
    • Stacking NumPy arrays
  • Summary
3. The Data Science Pipeline
  • Introducing EDA
  • Feature creation
  • Dimensionality reduction
    • The covariance matrix
    • Principal Component Analysis (PCA)
    • A variation of PCA for big data – RandomizedPCA
    • Latent Factor Analysis (LFA)
    • Linear Discriminant Analysis (LDA)
    • Latent Semantical Analysis (LSA)
    • Independent Component Analysis (ICA)
    • Kernel PCA
    • Restricted Boltzmann Machine (RBM)
  • The detection and treatment of outliers
    • Univariate outlier detection
    • EllipticEnvelope
    • OneClassSVM
  • Scoring functions
    • Multilabel classification
    • Binary classification
    • Regression
  • Testing and validating
  • Cross-validation
    • Using cross-validation iterators
    • Sampling and bootstrapping
  • Hyper-parameters' optimization
    • Building custom scoring functions
    • Reducing the grid search runtime
  • Feature selection
    • Univariate selection
    • Recursive elimination
    • Stability and L1-based selection
  • Summary
4. Machine Learning
  • Linear and logistic regression
  • Naive Bayes
  • The k-Nearest Neighbors
  • Advanced nonlinear algorithms
    • SVM for classification
    • SVM for regression
    • Tuning SVM
  • Ensemble strategies
    • Pasting by random samples
    • Bagging with weak ensembles
    • Random Subspaces and Random Patches
    • Sequences of models – AdaBoost
    • Gradient tree boosting (GTB)
    • Dealing with big data
      • Creating some big datasets as examples
      • Scalability with volume
      • Keeping up with velocity
      • Dealing with variety
      • A quick overview of Stochastic Gradient Descent (SGD)
  • A peek into Natural Language Processing (NLP)
    • Word tokenization
    • Stemming
    • Word Tagging
    • Named Entity Recognition (NER)
    • Stopwords
    • A complete data science example – text classification
  • An overview of unsupervised learning
  • Summary
5. Social Network Analysis
  • Introduction to graph theory
  • Graph algorithms
  • Graph loading, dumping, and sampling
  • Summary
6. Visualization
  • Introducing the basics of matplotlib
    • Curve plotting
    • Using panels
    • Scatterplots
    • Histograms
    • Bar graphs
    • Image visualization
  • Selected graphical examples with pandas
    • Boxplots and histograms
    • Scatterplots
    • Parallel coordinates
  • Advanced data learning representation
    • Learning curves
    • Validation curves
    • Feature importance
    • GBT partial dependence plot
  • Summary
Index