Index
Agile Data Science
Preface
Who This Book Is For
How This Book Is Organized
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
I. Setup
1. Theory
Agile Big Data
Big Words Defined
Agile Big Data Teams
Recognizing the Opportunity and Problem
Adapting to Change
Harnessing the power of generalists
Leveraging agile platforms
Sharing intermediate results
Agile Big Data Process
Code Review and Pair Programming
Agile Environments: Engineering Productivity
Collaboration Space
Private Space
Personal Space
Realizing Ideas with Large-Format Printing
2. Data
Email
Working with Raw Data
Raw Email
Structured Versus Semistructured Data
SQL
NoSQL
Serialization
Extracting and Exposing Features in Evolving Schemas
Data Pipelines
Data Perspectives
Networks
Time Series
Natural Language
Probability
Conclusion
3. Agile Tools
Scalability = Simplicity
Agile Big Data Processing
Setting Up a Virtual Environment for Python
Serializing Events with Avro
Avro for Python
Installation
Testing
Collecting Data
Data Processing with Pig
Installing Pig
Publishing Data with MongoDB
Installing MongoDB
Installing MongoDB’s Java Driver
Installing mongo-hadoop
Pushing Data to MongoDB from Pig
Searching Data with ElasticSearch
Installation
ElasticSearch and Pig with Wonderdog
Installing Wonderdog
Wonderdog and Pig
Searching our data
Python and ElasticSearch with pyelasticsearch
Reflecting on our Workflow
Lightweight Web Applications
Python and Flask
Flask Echo ch03/python/flask_echo.py
Python and Mongo with pymongo
Displaying sent_counts in Flask
Presenting Our Data
Installing Bootstrap
Booting Bootstrap
Visualizing Data with D3.js and nvd3.js
Conclusion
4. To the Cloud!
Introduction
GitHub
dotCloud
Echo on dotCloud
Python Workers
Amazon Web Services
Simple Storage Service
Elastic MapReduce
MongoDB as a Service
Pushing data from Pig to MongoDB at dotCloud
Instrumentation
Google Analytics
Mortar Data
II. Climbing the Pyramid
5. Collecting and Displaying Records
Putting It All Together
Collect and Serialize Our Inbox
Process and Publish Our Emails
Presenting Emails in a Browser
Serving Emails with Flask and pymongo
Rendering HTML5 with Jinja2
Agile Checkpoint
Listing Emails
Listing Emails with MongoDB
Anatomy of a Presentation
Reinventing the wheel?
Prototyping back from HTML
Searching Our Email
Indexing Our Email with Pig, ElasticSearch, and Wonderdog
Searching Our Email on the Web
Conclusion
6. Visualizing Data with Charts
Good Charts
Extracting Entities: Email Addresses
Extracting Emails
Visualizing Time
Conclusion
7. Exploring Data with Reports
Building Reports with Multiple Charts
Linking Records
Extracting Keywords from Emails with TF-IDF
Conclusion
8. Making Predictions
Predicting Response Rates to Emails
Personalization
Conclusion
9. Driving Actions
Properties of Successful Emails
Better Predictions with Naive Bayes
P(Reply | From & To)
P(Reply | Token)
Making Predictions in Real Time
Logging Events
Conclusion
Index
About the Author
Colophon
Copyright