Machine Learning for Hackers
Preface
Machine Learning for Hackers
How This Book Is Organized
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgements
1. Using R
R for Machine Learning
Downloading and Installing R
Windows
Mac OS X
Linux
IDEs and Text Editors
Loading and Installing R Packages
R Basics for Machine Learning
Loading libraries and the data
Converting date strings and dealing with malformed data
Organizing location data
Dealing with data outside our scope
Aggregating and organizing the data
Analyzing the data
Further Reading on R
2. Data Exploration
Exploration versus Confirmation
What Is Data?
Inferring the Types of Columns in Your Data
Inferring Meaning
Numeric Summaries
Means, Medians, and Modes
Quantiles
Standard Deviations and Variances
Exploratory Data Visualization
Visualizing the Relationships Between Columns
3. Classification: Spam Filtering
This or That: Binary Classification
Moving Gently into Conditional Probability
Writing Our First Bayesian Spam Classifier
Defining the Classifier and Testing It with Hard Ham
Testing the Classifier Against All Email Types
Improving the Results
4. Ranking: Priority Inbox
How Do You Sort Something When You Don’t Know the Order?
Ordering Email Messages by Priority
Priority Features of Email
Writing a Priority Inbox
Functions for Extracting the Feature Set
Creating a Weighting Scheme for Ranking
A log-weighting scheme
Weighting from Email Thread Activity
Training and Testing the Ranker
5. Regression: Predicting Page Views
Introducing Regression
The Baseline Model
Regression Using Dummy Variables
Linear Regression in a Nutshell
Predicting Web Traffic
Defining Correlation
6. Regularization: Text Regression
Nonlinear Relationships Between Columns: Beyond Straight Lines
Introducing Polynomial Regression
Methods for Preventing Overfitting
Preventing Overfitting with Regularization
Text Regression
Logistic Regression to the Rescue
7. Optimization: Breaking Codes
Introduction to Optimization
Ridge Regression
Code Breaking as Optimization
8. PCA: Building a Market Index
Unsupervised Learning
9. MDS: Visually Exploring US Senator Similarity
Clustering Based on Similarity
A Brief Introduction to Distance Metrics and Multidimensional Scaling
How Do US Senators Cluster?
Analyzing US Senator Roll Call Data (101st–111th Congresses)
Exploring senator MDS clustering by Congress
10. kNN: Recommendation Systems
The k-Nearest Neighbors Algorithm
R Package Installation Data
11. Analyzing Social Graphs
Social Network Analysis
Thinking Graphically
Hacking Twitter Social Graph Data
Working with the Google SocialGraph API
Analyzing Twitter Networks
Local Community Structure
Visualizing the Clustered Twitter Network with Gephi
Building Your Own “Who to Follow” Engine
12. Model Comparison
SVMs: The Support Vector Machine
Comparing Algorithms
Works Cited
Books
Articles
Index
About the Authors
Colophon
Copyright