Imperial Library

Index
Mastering pandas
Credits
About the Author
About the Reviewers
www.PacktPub.com: Support files, eBooks, discount offers, and more; Why subscribe?; Free access for Packt account holders
Preface: What this book covers; What you need for this book; Who this book is for; Conventions; Reader feedback; Customer support; Downloading the example code; Errata; Piracy; Questions
1. Introduction to pandas and Data Analysis: Motivation for data analysis; We live in a big data world; 4 V's of big data; Volume of big data; Velocity of big data; Variety of big data; Veracity of big data; So much data, so little time for analysis; The move towards real-time analytics; How Python and pandas fit into the data analytics mix; What is pandas?; Benefits of using pandas; Summary
2. Installation of pandas and the Supporting Software: Selecting a version of Python to use; Python installation; Linux; Installing Python from compressed tarball; Windows; Core Python installation; Third-party Python software installation; Mac OS X; Installation using a package manager; Installation of Python and pandas from a third-party vendor; Continuum Analytics Anaconda; Installing Anaconda; Linux; Mac OS X; Windows; Final step for all platforms; Other numeric or analytics-focused Python distributions; Downloading and installing pandas; Linux; Ubuntu/Debian; Red Hat; Ubuntu/Debian; Fedora; OpenSuse; Mac; Source installation; Binary installation; Windows; Binary Installation; Source installation; IPython; IPython Notebook; IPython installation; Linux; Windows; Mac OS X; Install via Anaconda (for Linux/Mac OS X); Wakari by Continuum Analytics; Virtualenv; Virtualenv installation and usage; Summary
3. The pandas Data Structures: NumPy ndarrays; NumPy array creation; NumPy arrays via numpy.array; NumPy array via numpy.arange; NumPy array via numpy.linspace; NumPy array via various other functions; numpy.ones; numpy.zeros; numpy.eye; numpy.diag; numpy.random.rand; numpy.empty; numpy.tile; NumPy datatypes; NumPy indexing and slicing; Array slicing; Array masking; Complex indexing; Copies and views; Operations; Basic operations; Reduction operations; Statistical operators; Logical operators; Broadcasting; Array shape manipulation; Flattening a multi-dimensional array; Reshaping; Resizing; Adding a dimension; Array sorting; Data structures in pandas; Series; Series creation; Using numpy.ndarray; Using Python dictionary; Using scalar values; Operations on Series; Assignment; Slicing; Other operations; DataFrame; DataFrame Creation; Using dictionaries of Series; Using a dictionary of ndarrays/lists; Using a structured array; Using a Series structure; Operations; Selection; Assignment; Deletion; Alignment; Other mathematical operations; Panel; Using 3D NumPy array with axis labels; Using a Python dictionary of DataFrame objects; Using the DataFrame.to_panel method; Other operations; Summary
4. Operations in pandas, Part I – Indexing and Selecting: Basic indexing; Accessing attributes using dot operator; Range slicing; Label, integer, and mixed indexing; Label-oriented indexing; Selection using a Boolean array; Integer-oriented indexing; The .iat and .at operators; Mixed indexing with the .ix operator; MultiIndexing; Swapping and reordering levels; Cross sections; Boolean indexing; The is in and any all methods; Using the where() method; Operations on indexes; Summary
5. Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data: Grouping of data; The groupby operation; Using groupby with a MultiIndex; Using the aggregate method; Applying multiple functions; The transform() method; Filtering; Merging and joining; The concat function; Using append; Appending a single row to a DataFrame; SQL-like merging/joining of DataFrame objects; The join function; Pivots and reshaping data; Stacking and unstacking; The stack() function; Other methods to reshape DataFrames; Using the melt function; The pandas.get_dummies() function; Summary
6. Missing Data, Time Series, and Plotting Using Matplotlib: Handling missing data; Handling missing values; Handling time series; Reading in time series data; DateOffset and TimeDelta objects; Time series-related instance methods; Shifting/lagging; Frequency conversion; Resampling of data; Aliases for Time Series frequencies; Time series concepts and datatypes; Period and PeriodIndex; PeriodIndex; Conversions between Time Series datatypes; A summary of Time Series-related objects; Plotting using matplotlib; Summary
7. A Tour of Statistics – The Classical Approach: Descriptive statistics versus inferential statistics; Measures of central tendency and variability; Measures of central tendency; The mean; The median; The mode; Computing measures of central tendency of a dataset in Python; Measures of variability, dispersion, or spread; Range; Quartile; Deviation and variance; Hypothesis testing – the null and alternative hypotheses; The null and alternative hypotheses; The alpha and p-values; Type I and Type II errors; Statistical hypothesis tests; Background; The z-test; The t-test; Types of t-tests; A t-test example; Confidence intervals; An illustrative example; Correlation and linear regression; Correlation; Linear regression; An illustrative example; Summary
8. A Brief Tour of Bayesian Statistics: Introduction to Bayesian statistics; Mathematical framework for Bayesian statistics; Bayes theory and odds; Applications of Bayesian statistics; Probability distributions; Fitting a distribution; Discrete probability distributions; Discrete uniform distributions; The Bernoulli distribution; The binomial distribution; The Poisson distribution; The Geometric distribution; The negative binomial distribution; Continuous probability distributions; The continuous uniform distribution; The exponential distribution; The normal distribution; Bayesian statistics versus Frequentist statistics; What is probability?; How the model is defined; Confidence (Frequentist) versus Credible (Bayesian) intervals; Conducting Bayesian statistical analysis; Monte Carlo estimation of the likelihood function and PyMC; Bayesian analysis example – Switchpoint detection; References; Summary
9. The pandas Library Architecture: Introduction to pandas' file hierarchy; Description of pandas' modules and files; pandas/core; pandas/io; pandas/tools; pandas/sparse; pandas/stats; pandas/util; pandas/rpy; pandas/tests; pandas/compat; pandas/computation; pandas/tseries; pandas/sandbox; Improving performance using Python extensions; Summary
10. R and pandas Compared: R data types; R lists; R DataFrames; Slicing and selection; R-matrix and NumPy array compared; R lists and pandas series compared; Specifying column name in R; Specifying column name in pandas; R's DataFrames versus pandas' DataFrames; Multicolumn selection in R; Multicolumn selection in pandas; Arithmetic operations on columns; Aggregation and GroupBy; Aggregation in R; The pandas' GroupBy operator; Comparing matching operators in R and pandas; R %in% operator; The pandas isin() function; Logical subsetting; Logical subsetting in R; Logical subsetting in pandas; Split-apply-combine; Implementation in R; Implementation in pandas; Reshaping using melt; The R melt() function; The pandas melt() function; Factors/categorical data; An R example using cut(); The pandas solution; Summary
11. Brief Tour of Machine Learning: Role of pandas in machine learning; Installation of scikit-learn; Installing via Anaconda; Installing on Unix (Linux/Mac OS X); Installing on Windows; Introduction to machine learning; Supervised versus unsupervised learning; Illustration using document classification; Supervised learning; Unsupervised learning; How machine learning systems learn; Application of machine learning – Kaggle Titanic competition; The titanic: machine learning from disaster problem; The problem of overfitting; Data analysis and preprocessing using pandas; Examining the data; Handling missing values; A naïve approach to Titanic problem; The scikit-learn ML/classifier interface; Supervised learning algorithms; Constructing a model using Patsy for scikit-learn; General boilerplate code explanation; Logistic regression; Support vector machine; Decision trees; Random forest; Unsupervised learning algorithms; Dimensionality reduction; K-means clustering; Summary
Index

Chief Librarian: Las Zenow <zenow@riseup.net>
