Spark: The Definitive Guide by Bill Chambers

Contents

Preface
I. Gentle Overview of Big Data and Spark
  1. What Is Apache Spark?
  2. A Gentle Introduction to Spark
  3. A Tour of Spark’s Toolset
II. Structured APIs—DataFrames, SQL, and Datasets
  4. Structured API Overview
  5. Basic Structured Operations
  6. Working with Different Types of Data
  7. Aggregations
  8. Joins
  9. Data Sources
  10. Spark SQL
  11. Datasets
III. Low-Level APIs
  12. Resilient Distributed Datasets (RDDs)
  13. Advanced RDDs
  14. Distributed Shared Variables
IV. Production Applications
  15. How Spark Runs on a Cluster
  16. Developing Spark Applications
  17. Deploying Spark
  18. Monitoring and Debugging
  19. Performance Tuning
V. Streaming
  20. Stream Processing Fundamentals
  21. Structured Streaming Basics
  22. Event-Time and Stateful Processing
  23. Structured Streaming in Production
VI. Advanced Analytics and Machine Learning
  24. Advanced Analytics and Machine Learning Overview
  25. Preprocessing and Feature Engineering
  26. Classification
  27. Regression
  28. Recommendation
  29. Unsupervised Learning
  30. Graph Analytics
  31. Deep Learning
VII. Ecosystem
  32. Language Specifics: Python (PySpark) and R (SparkR and sparklyr)
  33. Ecosystem and Community
Index