Imperial Library

Index
Preface
  What this book covers
  What you need for this book
  Who this book is for
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Errata
    Piracy
    Questions
pandas and Data Analysis
  Introducing pandas
  Data manipulation, analysis, science, and pandas
    Data manipulation
    Data analysis
    Data science
    Where does pandas fit?
  The process of data analysis
    The process
      Ideation
      Retrieval
      Preparation
      Exploration
      Modeling
      Presentation
      Reproduction
      A note on being iterative and agile
  Relating the book to the process
  Concepts of data and analysis in our tour of pandas
    Types of data
      Structured
      Unstructured
      Semi-structured
    Variables
    Categorical
      Continuous
      Discrete
    Time series data
    General concepts of analysis and statistics
      Quantitative versus qualitative data/analysis
      Single and multivariate analysis
      Descriptive statistics
      Inferential statistics
      Stochastic models
      Probability and Bayesian statistics
      Correlation
      Regression
  Other Python libraries of value with pandas
    Numeric and scientific computing - NumPy and SciPy
    Statistical analysis – StatsModels
    Machine learning – scikit-learn
    PyMC - stochastic Bayesian modeling
    Data visualization - matplotlib and seaborn
      Matplotlib
      Seaborn
  Summary
Up and Running with pandas
  Installation of Anaconda
  IPython and Jupyter Notebook
    IPython
    Jupyter Notebook
  Introducing the pandas Series and DataFrame
    Importing pandas
    The pandas Series
    The pandas DataFrame
    Loading data from files into a DataFrame
  Visualization
  Summary
Representing Univariate Data with the Series
  Configuring pandas
  Creating a Series
    Creating a Series using Python lists and dictionaries
    Creation using NumPy functions
    Creation using a scalar value
  The .index and .values properties
  The size and shape of a Series
  Specifying an index at creation
  Heads, tails, and takes
  Retrieving values in a Series by label or position
    Lookup by label using the [] operator and the .ix[] property
    Explicit lookup by position with .iloc[]
    Explicit lookup by labels with .loc[]
  Slicing a Series into subsets
  Alignment via index labels
  Performing Boolean selection
  Re-indexing a Series
  Modifying a Series in-place
  Summary
Representing Tabular and Multivariate Data with the DataFrame
  Configuring pandas
  Creating DataFrame objects
    Creating a DataFrame using NumPy function results
    Creating a DataFrame using a Python dictionary and pandas Series objects
    Creating a DataFrame from a CSV file
  Accessing data within a DataFrame
    Selecting the columns of a DataFrame
    Selecting rows of a DataFrame
    Scalar lookup by label or location using .at[] and .iat[]
    Slicing using the [ ] operator
  Selecting rows using Boolean selection
  Selecting across both rows and columns
  Summary
Manipulating DataFrame Structure
  Configuring pandas
  Renaming columns
  Adding new columns with [] and .insert()
  Adding columns through enlargement
  Adding columns using concatenation
  Reordering columns
  Replacing the contents of a column
  Deleting columns
  Appending new rows
  Concatenating rows
  Adding and replacing rows via enlargement
  Removing rows using .drop()
  Removing rows using Boolean selection
  Removing rows using a slice
  Summary
Indexing Data
  Configuring pandas
  The importance of indexes
  The pandas index types
    The fundamental type - Index
    Integer index labels using Int64Index and RangeIndex
    Floating-point labels using Float64Index
    Representing discrete intervals using IntervalIndex
    Categorical values as an index - CategoricalIndex
    Indexing by date and time using DatetimeIndex
    Indexing periods of time using PeriodIndex
  Working with Indexes
    Creating and using an index with a Series or DataFrame
    Selecting values using an index
    Moving data to and from the index
    Reindexing a pandas object
  Hierarchical indexing
  Summary
Categorical Data
  Configuring pandas
  Creating Categoricals
  Renaming categories
  Appending new categories
  Removing categories
  Removing unused categories
  Setting categories
  Descriptive information of a Categorical
  Munging school grades
  Summary
Numerical and Statistical Methods
  Configuring pandas
  Performing numerical methods on pandas objects
    Performing arithmetic on a DataFrame or Series
    Getting the counts of values
    Determining unique values (and their counts)
    Finding minimum and maximum values
    Locating the n-smallest and n-largest values
    Calculating accumulated values
  Performing statistical processes on pandas objects
    Retrieving summary descriptive statistics
    Measuring central tendency: mean, median, and mode
      Calculating the mean
      Finding the median
      Determining the mode
    Calculating variance and standard deviation
      Measuring variance
      Finding the standard deviation
    Determining covariance and correlation
      Calculating covariance
      Determining correlation
    Performing discretization and quantiling of data
    Calculating the rank of values
    Calculating the percent change at each sample of a series
    Performing moving-window operations
    Executing random sampling of data
  Summary
Accessing Data
  Configuring pandas
  Working with CSV and text/tabular format data
    Examining the sample CSV data set
    Reading a CSV file into a DataFrame
    Specifying the index column when reading a CSV file
    Data type inference and specification
    Specifying column names
    Specifying specific columns to load
    Saving DataFrame to a CSV file
    Working with general field-delimited data
    Handling variants of formats in field-delimited data
  Reading and writing data in Excel format
  Reading and writing JSON files
  Reading HTML data from the web
  Reading and writing HDF5 format files
  Accessing CSV data on the web
  Reading and writing from/to SQL databases
  Reading data from remote data services
    Reading stock data from Yahoo! and Google Finance
    Retrieving options data from Google Finance
    Reading economic data from the Federal Reserve Bank of St. Louis
    Accessing Kenneth French's data
    Reading from the World Bank
  Summary
Tidying Up Your Data
  Configuring pandas
  What is tidying your data?
  How to work with missing data
    Determining NaN values in pandas objects
    Selecting out or dropping missing data
    Handling of NaN values in mathematical operations
    Filling in missing data
    Forward and backward filling of missing values
    Filling using index labels
    Performing interpolation of missing values
  Handling duplicate data
  Transforming data
    Mapping data into different values
    Replacing values
    Applying functions to transform data
  Summary
Combining, Relating, and Reshaping Data
  Configuring pandas
  Concatenating data in multiple objects
    Understanding the default semantics of concatenation
    Switching axes of alignment
    Specifying join type
    Appending versus concatenation
    Ignoring the index labels
  Merging and joining data
    Merging data from multiple pandas objects
    Specifying the join semantics of a merge operation
  Pivoting data to and from value and indexes
  Stacking and unstacking
    Stacking using non-hierarchical indexes
    Unstacking using hierarchical indexes
    Melting data to and from long and wide format
  Performance benefits of stacked data
  Summary
Data Aggregation
  Configuring pandas
  The split, apply, and combine (SAC) pattern
  Data for the examples
  Splitting data
    Grouping by a single column's values
    Accessing the results of a grouping
    Grouping using multiple columns
    Grouping using index levels
  Applying aggregate functions, transforms, and filters
    Applying aggregation functions to groups
  Transforming groups of data
    The general process of transformation
    Filling missing values with the mean of the group
    Calculating normalized z-scores with a transformation
  Filtering groups from aggregation
  Summary
Time-Series Modelling
  Setting up the IPython notebook
  Representation of dates, time, and intervals
    The datetime, day, and time objects
    Representing a point in time with a Timestamp
    Using a Timedelta to represent a time interval
  Introducing time-series data
    Indexing using DatetimeIndex
    Creating time-series with specific frequencies
  Calculating new dates using offsets
    Representing data intervals with date offsets
    Anchored offsets
  Representing durations of time using Period
    Modelling an interval of time with a Period
    Indexing using the PeriodIndex
  Handling holidays using calendars
  Normalizing timestamps using time zones
  Manipulating time-series data
    Shifting and lagging
    Performing frequency conversion on a time-series
    Up and down resampling of a time-series
  Time-series moving-window operations
  Summary
Visualization
  Configuring pandas
  Plotting basics with pandas
  Creating time-series charts
    Adorning and styling your time-series plot
      Adding a title and changing axes labels
      Specifying the legend content and position
      Specifying line colors, styles, thickness, and markers
      Specifying tick mark locations and tick labels
      Formatting axes' tick date labels using formatters
  Common plots used in statistical analyses
    Showing relative differences with bar plots
    Picturing distributions of data with histograms
    Depicting distributions of categorical data with box and whisker charts
    Demonstrating cumulative totals with area plots
    Relationships between two variables with scatter plots
    Estimates of distribution with the kernel density plot
    Correlations between multiple variables with the scatter plot matrix
    Strengths of relationships in multiple variables with heatmaps
  Manually rendering multiple plots in a single chart
  Summary
Historical Stock Price Analysis
  Setting up the IPython notebook
  Obtaining and organizing stock data from Google
  Plotting time-series prices
  Plotting volume-series data
  Calculating the simple daily percentage change in closing price
  Calculating simple daily cumulative returns of a stock
  Resampling data from daily to monthly returns
  Analyzing distribution of returns
  Performing a moving-average calculation
  Comparison of average daily returns across stocks
  Correlation of stocks based on the daily percentage change of the closing price
  Calculating the volatility of stocks
  Determining risk relative to expected returns
  Summary