Scala for Data Science

by Pascal Bugnion

Table of Contents
Scala for Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers, and more
    Why subscribe?
    Free access for Packt account holders

Preface
  What this book covers
  What you need for this book
    Installing the JDK
    Installing and using SBT
  Who this book is for
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Errata
    Piracy
    eBooks, discount offers, and more
    Questions

1. Scala and Data Science
  Data science
  Programming in data science
  Why Scala?
    Static typing and type inference
    Scala encourages immutability
    Scala and functional programs
    Null pointer uncertainty
    Easier parallelism
    Interoperability with Java
  When not to use Scala
  Summary
  References

2. Manipulating Data with Breeze
  Code examples
  Installing Breeze
  Getting help on Breeze
  Basic Breeze data types
    Vectors
    Dense and sparse vectors and the vector trait
    Matrices
    Building vectors and matrices
    Advanced indexing and slicing
    Mutating vectors and matrices
    Matrix multiplication, transposition, and the orientation of vectors
    Data preprocessing and feature engineering
    Breeze – function optimization
    Numerical derivatives
    Regularization
  An example – logistic regression
  Towards re-usable code
  Alternatives to Breeze
  Summary
  References

3. Plotting with breeze-viz
  Diving into Breeze
  Customizing plots
  Customizing the line type
  More advanced scatter plots
  Multi-plot example – scatterplot matrix plots
  Managing without documentation
  Breeze-viz reference
  Data visualization beyond breeze-viz
  Summary

4. Parallel Collections and Futures
  Parallel collections
    Limitations of parallel collections
    Error handling
    Setting the parallelism level
    An example – cross-validation with parallel collections
  Futures
    Future composition – using a future's result
    Blocking until completion
    Controlling parallel execution with execution contexts
    Futures example – stock price fetcher
  Summary
  References

5. Scala and SQL through JDBC
  Interacting with JDBC
  First steps with JDBC
    Connecting to a database server
    Creating tables
    Inserting data
    Reading data
  JDBC summary
  Functional wrappers for JDBC
  Safer JDBC connections with the loan pattern
  Enriching JDBC statements with the "pimp my library" pattern
  Wrapping result sets in a stream
  Looser coupling with type classes
    Type classes
    Coding against type classes
    When to use type classes
    Benefits of type classes
  Creating a data access layer
  Summary
  References

6. Slick – A Functional Interface for SQL
  FEC data
    Importing Slick
    Defining the schema
    Connecting to the database
    Creating tables
    Inserting data
    Querying data
  Invokers
  Operations on columns
  Aggregations with "Group by"
  Accessing database metadata
  Slick versus JDBC
  Summary
  References

7. Web APIs
  A whirlwind tour of JSON
  Querying web APIs
  JSON in Scala – an exercise in pattern matching
    JSON4S types
    Extracting fields using XPath
  Extraction using case classes
  Concurrency and exception handling with futures
  Authentication – adding HTTP headers
    HTTP – a whirlwind overview
    Adding headers to HTTP requests in Scala
  Summary
  References

8. Scala and MongoDB
  MongoDB
  Connecting to MongoDB with Casbah
    Connecting with authentication
  Inserting documents
  Extracting objects from the database
  Complex queries
  Casbah query DSL
  Custom type serialization
  Beyond Casbah
  Summary
  References

9. Concurrency with Akka
  GitHub follower graph
  Actors as people
  Hello world with Akka
  Case classes as messages
  Actor construction
  Anatomy of an actor
  Follower network crawler
  Fetcher actors
  Routing
  Message passing between actors
  Queue control and the pull pattern
  Accessing the sender of a message
  Stateful actors
  Follower network crawler
  Fault tolerance
  Custom supervisor strategies
  Life-cycle hooks
  What we have not talked about
  Summary
  References

10. Distributed Batch Processing with Spark
  Installing Spark
  Acquiring the example data
  Resilient distributed datasets
    RDDs are immutable
    RDDs are lazy
    RDDs know their lineage
    RDDs are resilient
    RDDs are distributed
    Transformations and actions on RDDs
    Persisting RDDs
    Key-value RDDs
    Double RDDs
  Building and running standalone programs
    Running Spark applications locally
    Reducing logging output and Spark configuration
    Running Spark applications on EC2
  Spam filtering
  Lifting the hood
  Data shuffling and partitions
  Summary
  Reference

11. Spark SQL and DataFrames
  DataFrames – a whirlwind introduction
  Aggregation operations
  Joining DataFrames together
  Custom functions on DataFrames
  DataFrame immutability and persistence
  SQL statements on DataFrames
  Complex data types – arrays, maps, and structs
    Structs
    Arrays
    Maps
  Interacting with data sources
    JSON files
    Parquet files
  Standalone programs
  Summary
  References

12. Distributed Machine Learning with MLlib
  Introducing MLlib – Spam classification
  Pipeline components
    Transformers
    Estimators
  Evaluation
  Regularization in logistic regression
  Cross-validation and model selection
  Beyond logistic regression
  Summary
  References

13. Web APIs with Play
  Client-server applications
  Introduction to web frameworks
  Model-View-Controller architecture
  Single page applications
  Building an application
  The Play framework
  Dynamic routing
  Actions
    Composing the response
    Understanding and parsing the request
  Interacting with JSON
  Querying external APIs and consuming JSON
    Calling external web services
    Parsing JSON
    Asynchronous actions
  Creating APIs with Play: a summary
  REST APIs: best practice
  Summary
  References

14. Visualization with D3 and the Play Framework
  GitHub user data
  Do I need a backend?
  JavaScript dependencies through web-jars
  Towards a web application: HTML templates
  Modular JavaScript through RequireJS
  Bootstrapping the applications
  Client-side program architecture
    Designing the model
    The event bus
    AJAX calls through jQuery
    Response views
  Drawing plots with NVD3
  Summary
  References

A. Pattern Matching and Extractors
  Pattern matching in for comprehensions
  Pattern matching internals
  Extracting sequences
  Summary
  Reference

Index
