Scala for Data Science
Table of Contents
Scala for Data Science
Credits
About the Author
About the Reviewers
www.PacktPub.com
  Support files, eBooks, discount offers, and more
    Why subscribe?
    Free access for Packt account holders
Preface
  What this book covers
  What you need for this book
    Installing the JDK
    Installing and using SBT
  Who this book is for
  Conventions
  Reader feedback
  Customer support
    Downloading the example code
    Errata
    Piracy
    eBooks, discount offers, and more
    Questions
1. Scala and Data Science
  Data science
  Programming in data science
  Why Scala?
    Static typing and type inference
    Scala encourages immutability
    Scala and functional programs
    Null pointer uncertainty
    Easier parallelism
    Interoperability with Java
  When not to use Scala
  Summary
  References
2. Manipulating Data with Breeze
  Code examples
  Installing Breeze
  Getting help on Breeze
  Basic Breeze data types
    Vectors
    Dense and sparse vectors and the vector trait
    Matrices
    Building vectors and matrices
    Advanced indexing and slicing
    Mutating vectors and matrices
    Matrix multiplication, transposition, and the orientation of vectors
    Data preprocessing and feature engineering
    Breeze – function optimization
    Numerical derivatives
    Regularization
  An example – logistic regression
  Towards re-usable code
  Alternatives to Breeze
  Summary
  References
3. Plotting with breeze-viz
  Diving into Breeze
  Customizing plots
  Customizing the line type
  More advanced scatter plots
  Multi-plot example – scatterplot matrix plots
  Managing without documentation
  Breeze-viz reference
  Data visualization beyond breeze-viz
  Summary
4. Parallel Collections and Futures
  Parallel collections
    Limitations of parallel collections
    Error handling
    Setting the parallelism level
    An example – cross-validation with parallel collections
  Futures
    Future composition – using a future's result
    Blocking until completion
    Controlling parallel execution with execution contexts
    Futures example – stock price fetcher
  Summary
  References
5. Scala and SQL through JDBC
  Interacting with JDBC
  First steps with JDBC
    Connecting to a database server
    Creating tables
    Inserting data
    Reading data
  JDBC summary
  Functional wrappers for JDBC
  Safer JDBC connections with the loan pattern
  Enriching JDBC statements with the "pimp my library" pattern
  Wrapping result sets in a stream
  Looser coupling with type classes
    Type classes
    Coding against type classes
    When to use type classes
    Benefits of type classes
  Creating a data access layer
  Summary
  References
6. Slick – A Functional Interface for SQL
  FEC data
    Importing Slick
    Defining the schema
    Connecting to the database
    Creating tables
    Inserting data
    Querying data
  Invokers
  Operations on columns
  Aggregations with "Group by"
  Accessing database metadata
  Slick versus JDBC
  Summary
  References
7. Web APIs
  A whirlwind tour of JSON
  Querying web APIs
  JSON in Scala – an exercise in pattern matching
    JSON4S types
    Extracting fields using XPath
  Extraction using case classes
  Concurrency and exception handling with futures
  Authentication – adding HTTP headers
    HTTP – a whirlwind overview
    Adding headers to HTTP requests in Scala
  Summary
  References
8. Scala and MongoDB
  MongoDB
  Connecting to MongoDB with Casbah
    Connecting with authentication
  Inserting documents
  Extracting objects from the database
  Complex queries
  Casbah query DSL
  Custom type serialization
  Beyond Casbah
  Summary
  References
9. Concurrency with Akka
  GitHub follower graph
  Actors as people
  Hello world with Akka
  Case classes as messages
  Actor construction
  Anatomy of an actor
  Follower network crawler
  Fetcher actors
  Routing
  Message passing between actors
  Queue control and the pull pattern
  Accessing the sender of a message
  Stateful actors
  Follower network crawler
  Fault tolerance
  Custom supervisor strategies
  Life-cycle hooks
  What we have not talked about
  Summary
  References
10. Distributed Batch Processing with Spark
  Installing Spark
  Acquiring the example data
  Resilient distributed datasets
    RDDs are immutable
    RDDs are lazy
    RDDs know their lineage
    RDDs are resilient
    RDDs are distributed
    Transformations and actions on RDDs
    Persisting RDDs
    Key-value RDDs
    Double RDDs
  Building and running standalone programs
    Running Spark applications locally
    Reducing logging output and Spark configuration
    Running Spark applications on EC2
  Spam filtering
  Lifting the hood
  Data shuffling and partitions
  Summary
  Reference
11. Spark SQL and DataFrames
  DataFrames – a whirlwind introduction
  Aggregation operations
  Joining DataFrames together
  Custom functions on DataFrames
  DataFrame immutability and persistence
  SQL statements on DataFrames
  Complex data types – arrays, maps, and structs
    Structs
    Arrays
    Maps
  Interacting with data sources
    JSON files
    Parquet files
  Standalone programs
  Summary
  References
12. Distributed Machine Learning with MLlib
  Introducing MLlib – Spam classification
  Pipeline components
    Transformers
    Estimators
  Evaluation
  Regularization in logistic regression
  Cross-validation and model selection
  Beyond logistic regression
  Summary
  References
13. Web APIs with Play
  Client-server applications
  Introduction to web frameworks
  Model-View-Controller architecture
  Single page applications
  Building an application
  The Play framework
  Dynamic routing
  Actions
    Composing the response
    Understanding and parsing the request
  Interacting with JSON
  Querying external APIs and consuming JSON
    Calling external web services
    Parsing JSON
    Asynchronous actions
  Creating APIs with Play: a summary
  Rest APIs: best practice
  Summary
  References
14. Visualization with D3 and the Play Framework
  GitHub user data
  Do I need a backend?
  JavaScript dependencies through web-jars
  Towards a web application: HTML templates
  Modular JavaScript through RequireJS
  Bootstrapping the applications
  Client-side program architecture
    Designing the model
    The event bus
    AJAX calls through JQuery
    Response views
  Drawing plots with NVD3
  Summary
  References
A. Pattern Matching and Extractors
  Pattern matching in for comprehensions
  Pattern matching internals
  Extracting sequences
  Summary
  Reference
Index