Index
About This eBook
Title Page
Copyright Page
Dedication Page
Contents
Foreword
Preface
    Who This Book Is For
    Why Now?
    The Internet of . . . Everything
    A Journey toward Ubiquitous Computing
    How This Book Is Organized
Acknowledgments
About the Author
I: Directives in the Big Data Era
1. Four Rules for Data Success
    When Data Became a BIG Deal
    Data and the Single Server
    The Big Data Trade-Off
    Anatomy of a Big Data Pipeline
    The Ultimate Database
    Summary
II: Collecting and Sharing a Lot of Data
2. Hosting and Sharing Terabytes of Raw Data
    Suffering from Files
    Storage: Infrastructure as a Service
    Choosing the Right Data Format
    Character Encoding
    Data in Motion: Data Serialization Formats
    Summary
3. Building a NoSQL-Based Web App to Collect Crowd-Sourced Data
    Relational Databases: Command and Control
    Relational Databases versus the Internet
    Nonrelational Database Models
    Leaning toward Write Performance: Redis
    Sharding across Many Redis Instances
    NewSQL: The Return of Codd
    Summary
4. Strategies for Dealing with Data Silos
    A Warehouse Full of Jargon
    Hadoop: The Elephant in the Warehouse
    Data Silos Can Be Good
    Convergence: The End of the Data Silo
    Summary
III: Asking Questions about Your Data
5. Using Hadoop, Hive, and Shark to Ask Questions about Large Datasets
    What Is a Data Warehouse?
    Apache Hive: Interactive Querying for Hadoop
    Shark: Queries at the Speed of RAM
    Data Warehousing in the Cloud
    Summary
6. Building a Data Dashboard with Google BigQuery
    Analytical Databases
    Dremel: Spreading the Wealth
    BigQuery: Data Analytics as a Service
    Building a Custom Big Data Dashboard
    The Future of Analytical Query Engines
    Summary
7. Visualization Strategies for Exploring Large Datasets
    Cautionary Tales: Translating Data into Narrative
    Human Scale versus Machine Scale
    Building Applications for Data Interactivity
    Summary
IV: Building Data Pipelines
8. Putting It Together: MapReduce Data Pipelines
    What Is a Data Pipeline?
    Data Pipelines with Hadoop Streaming
    A One-Step MapReduce Transformation
    Managing Complexity: Python MapReduce Frameworks for Hadoop
    Summary
9. Building Data Transformation Workflows with Pig and Cascading
    Large-Scale Data Workflows in Practice
    It’s Complicated: Multistep MapReduce Transformations
    Cascading: Building Robust Data-Workflow Applications
    When to Choose Pig versus Cascading
    Summary
V: Machine Learning for Large Datasets
10. Building a Data Classification System with Mahout
    Can Machines Predict the Future?
    Challenges of Machine Learning
    Apache Mahout: Scalable Machine Learning
    MLBase: Distributed Machine Learning Framework
    Summary
VI: Statistical Analysis for Massive Datasets
11. Using R with Large Datasets
    Why Statistics Are Sexy
    Strategies for Dealing with Large Datasets
    Summary
12. Building Analytics Workflows Using Python and Pandas
    The Snakes Are Loose in the Data Zoo
    Python Libraries for Data Processing
    Building More Complex Workflows
    iPython: Completing the Scientific Computing Tool Chain
    Summary
VII: Looking Ahead
13. When to Build, When to Buy, When to Outsource
    Overlapping Solutions
    Understanding Your Data Problem
    A Playbook for the Build versus Buy Problem
    My Own Private Data Center
    Understand the Costs of Open-Source
    Everything as a Service
    Summary
14. The Future: Trends in Data Technology
    Hadoop: The Disruptor and the Disrupted
    Everything in the Cloud
    The Rise and Fall of the Data Scientist
    Convergence: The Ultimate Database
    Convergence of Cultures
    Summary
Index