Intro to Python® for Computer Science and Data Science
Deitel® Series Page
Contents
Preface
Python for Computer Science and Data Science Education
Modular Architecture
Audiences for the Book
Key Features
Chapter Dependencies
Computing and Data Science Curricula
Data Science Overlaps with Computer Science
Jobs Requiring Data Science Skills
Jupyter Notebooks
Docker
Class Tested
“Flipped Classroom”
Special Feature: IBM Watson Analytics and Cognitive Computing
Teaching Approach
Software Used in the Book
Python Documentation
Getting Your Questions Answered
Student and Instructor Supplements
Instructor Supplements on Pearson’s Instructor Resource Center
Instructor Examination Copies
Keeping in Touch with the Authors
Acknowledgments
About the Authors
About Deitel® & Associates, Inc.
Before You Begin
1 Introduction to Computers and Python
Objectives
Outline
1.1 Introduction
1.2 Hardware and Software
1.2.1 Moore’s Law
1.2.2 Computer Organization
Input Unit
Output Unit
Memory Unit
Arithmetic and Logic Unit (ALU)
Central Processing Unit (CPU)
Secondary Storage Unit
Self Check for Section 1.2
1.3 Data Hierarchy
Self Check
1.4 Machine Languages, Assembly Languages and High-Level Languages
Self Check
1.5 Introduction to Object Technology
Self Check for Section 1.5
1.6 Operating Systems
Self Check for Section 1.6
1.7 Python
Self Check
1.8 It’s the Libraries!
1.8.1 Python Standard Library
1.8.2 Data-Science Libraries
Self Check for Section 1.8
1.9 Other Popular Programming Languages
Self Check
1.10 Test-Drive: Using IPython and Jupyter Notebooks
1.10.1 Using IPython Interactive Mode as a Calculator
Entering IPython in Interactive Mode
Evaluating Expressions
Exiting Interactive Mode
Self Check
1.10.2 Executing a Python Program Using the IPython Interpreter
Changing to This Chapter’s Examples Folder
Executing the Script
Creating Scripts
Problems That May Occur at Execution Time
Self Check
1.10.3 Writing and Executing Code in a Jupyter Notebook
Opening JupyterLab in Your Browser
Creating a New Jupyter Notebook
Renaming the Notebook
Evaluating an Expression
Adding and Executing Another Cell
Saving the Notebook
Notebooks Provided with Each Chapter’s Examples
Opening and Executing an Existing Notebook
Closing JupyterLab
JupyterLab Tips
More Information on Working with JupyterLab
Self Check
1.11 Internet and World Wide Web
1.11.1 Internet: A Network of Networks
1.11.2 World Wide Web: Making the Internet User-Friendly
1.11.3 The Cloud
Mashups
1.11.4 Internet of Things
Self Check for Section 1.11
1.12 Software Technologies
Self Check
1.13 How Big Is Big Data?
Self Check
1.13.1 Big Data Analytics
1.13.2 Data Science and Big Data Are Making a Difference: Use Cases
1.14 Case Study—A Big-Data Mobile Application
1.15 Intro to Data Science: Artificial Intelligence—at the Intersection of CS and Data Science
Self Check
Exercises
2 Introduction to Python Programming
Objectives
Outline
2.1 Introduction
2.2 Variables and Assignment Statements
Self Check
2.3 Arithmetic
Self Check
2.4 Function print and an Intro to Single- and Double-Quoted Strings
Self Check
2.5 Triple-Quoted Strings
Self Check
2.6 Getting Input from the User
Self Check
2.7 Decision Making: The if Statement and Comparison Operators
Self Check
2.8 Objects and Dynamic Typing
Self Check
2.9 Intro to Data Science: Basic Descriptive Statistics
Self Check
2.10 Wrap-Up
Exercises
3 Control Statements and Program Development
Objectives
Outline
3.1 Introduction
3.2 Algorithms
Self Check
3.3 Pseudocode
Self Check
3.4 Control Statements
Self Check
3.5 if Statement
Self Check
3.6 if…else and if…elif…else Statements
Self Check
3.7 while Statement
Self Check
3.8 for Statement
3.8.1 Iterables, Lists and Iterators
3.8.2 Built-In range Function
Off-By-One Errors
Self Check
3.9 Augmented Assignments
Self Check
3.10 Program Development: Sequence-Controlled Repetition
3.10.1 Requirements Statement
3.10.2 Pseudocode for the Algorithm
3.10.3 Coding the Algorithm in Python
Execution Phases
Initialization Phase
Processing Phase
Termination Phase
3.10.4 Introduction to Formatted Strings
Self Check
3.11 Program Development: Sentinel-Controlled Repetition
Self Check
3.12 Program Development: Nested Control Statements
Self Check
3.13 Built-In Function range: A Deeper Look
Self Check
3.14 Using Type Decimal for Monetary Amounts
Self Check
3.15 break and continue Statements
3.16 Boolean Operators and, or and not
Self Check
3.17 Intro to Data Science: Measures of Central Tendency—Mean, Median and Mode
Self Check
3.18 Wrap-Up
Exercises
4 Functions
Objectives
Outline
4.1 Introduction
4.2 Defining Functions
Self Check
4.3 Functions with Multiple Parameters
Self Check
4.4 Random-Number Generation
Self Check
4.5 Case Study: A Game of Chance
Self Check
4.6 Python Standard Library
Self Check
4.7 math Module Functions
4.8 Using IPython Tab Completion for Discovery
Self Check
4.9 Default Parameter Values
Self Check
4.10 Keyword Arguments
Self Check
4.11 Arbitrary Argument Lists
Self Check
4.12 Methods: Functions That Belong to Objects
4.13 Scope Rules
Self Check
4.14 import: A Deeper Look
Self Check
4.15 Passing Arguments to Functions: A Deeper Look
Self Check
4.16 Function-Call Stack
Self Check
4.17 Functional-Style Programming
Pure Functions
4.18 Intro to Data Science: Measures of Dispersion
Self Check
4.19 Wrap-Up
Exercises
5 Sequences: Lists and Tuples
Objectives
Outline
5.1 Introduction
5.2 Lists
Self Check
5.3 Tuples
Self Check
5.4 Unpacking Sequences
Self Check
5.5 Sequence Slicing
Self Check
5.6 del Statement
Self Check
5.7 Passing Lists to Functions
Self Check
5.8 Sorting Lists
Self Check
5.9 Searching Sequences
Self Check
5.10 Other List Methods
Self Check
5.11 Simulating Stacks with Lists
Self Check
5.12 List Comprehensions
Self Check
5.13 Generator Expressions
Self Check
5.14 Filter, Map and Reduce
Self Check
5.15 Other Sequence Processing Functions
Self Check
5.16 Two-Dimensional Lists
Self Check
5.17 Intro to Data Science: Simulation and Static Visualizations
5.17.1 Sample Graphs for 600, 60,000 and 6,000,000 Die Rolls
Self Check
5.17.2 Visualizing Die-Roll Frequencies and Percentages
Launching IPython for Interactive Matplotlib Development
Importing the Libraries
Rolling the Die and Calculating Die Frequencies
Creating the Initial Bar Plot
Setting the Window Title and Labeling the x- and y-Axes
Finalizing the Bar Plot
Rolling Again and Updating the Bar Plot—Introducing IPython Magics
Saving Snippets to a File with the %save Magic
Command-Line Arguments; Displaying a Plot from a Script
Self Check
5.18 Wrap-Up
Exercises
Exercises 5.24 through 5.26 are reasonably challenging. Once you’ve done them, you ought to be able to implement many popular card games.
6 Dictionaries and Sets
Objectives
Outline
6.1 Introduction
6.2 Dictionaries
6.2.1 Creating a Dictionary
Determining if a Dictionary Is Empty
Self Check
6.2.2 Iterating through a Dictionary
Self Check
6.2.3 Basic Dictionary Operations
Accessing the Value Associated with a Key
Updating the Value of an Existing Key–Value Pair
Adding a New Key–Value Pair
Removing a Key–Value Pair
Attempting to Access a Nonexistent Key
Testing Whether a Dictionary Contains a Specified Key
Self Check
6.2.4 Dictionary Methods keys and values
Dictionary Views
Converting Dictionary Keys, Values and Key–Value Pairs to Lists
Processing Keys in Sorted Order
Self Check
6.2.5 Dictionary Comparisons
Self Check
6.2.6 Example: Dictionary of Student Grades
6.2.7 Example: Word Counts
Python Standard Library Module collections
Self Check
6.2.8 Dictionary Method update
6.2.9 Dictionary Comprehensions
Self Check
6.3 Sets
Self Check
6.3.1 Comparing Sets
Self Check
6.3.2 Mathematical Set Operations
Union
Intersection
Difference
Symmetric Difference
Disjoint
Self Check
6.3.3 Mutable Set Operators and Methods
Mutable Mathematical Set Operations
Methods for Adding and Removing Elements
Self Check
6.3.4 Set Comprehensions
6.4 Intro to Data Science: Dynamic Visualizations
Self Check
6.4.1 How Dynamic Visualization Works
Animation Frames
Running RollDieDynamic.py
Sample Executions
Self Check
6.4.2 Implementing a Dynamic Visualization
Importing the Matplotlib animation Module
Function update
Function update: Rolling the Die and Updating the frequencies List
Function update: Configuring the Bar Plot and Text
Variables Used to Configure the Graph and Maintain State
Calling the animation Module’s FuncAnimation Function
Self Check
6.5 Wrap-Up
Exercises
7 Array-Oriented Programming with NumPy
Objectives
Outline
7.1 Introduction
Self Check
7.2 Creating arrays from Existing Data
Self Check
7.3 array Attributes
Self Check
7.4 Filling arrays with Specific Values
7.5 Creating arrays from Ranges
7.6 List vs. array Performance: Introducing %timeit
7.7 array Operators
7.8 NumPy Calculation Methods
7.9 Universal Functions
7.10 Indexing and Slicing
7.11 Views: Shallow Copies
7.12 Deep Copies
7.13 Reshaping and Transposing
7.14 Intro to Data Science: pandas Series and DataFrames
7.14.1 pandas Series
Creating a Series with Default Indices
Displaying a Series
Creating a Series with All Elements Having the Same Value
Accessing a Series’ Elements
Producing Descriptive Statistics for a Series
Creating a Series with Custom Indices
Dictionary Initializers
Accessing Elements of a Series Via Custom Indices
Creating a Series of Strings
Self Check
7.14.2 DataFrames
Creating a DataFrame from a Dictionary
Customizing a DataFrame’s Indices with the index Attribute
Accessing a DataFrame’s Columns
Selecting Rows via the loc and iloc Attributes
Selecting Rows via Slices and Lists with the loc and iloc Attributes
Selecting Subsets of the Rows and Columns
Boolean Indexing
Accessing a Specific DataFrame Cell by Row and Column
Descriptive Statistics
Transposing the DataFrame with the T Attribute
Sorting by Rows by Their Indices
Sorting by Column Indices
Sorting by Column Values
Copy vs. In-Place Sorting
Self Check
7.15 Wrap-Up
Exercises
8 Strings: A Deeper Look
Objectives
Outline
8.1 Introduction
8.2 Formatting Strings
8.2.1 Presentation Types
Integers
Characters
Strings
Floating-Point and Decimal Values
Self Check
8.2.2 Field Widths and Alignment
Explicitly Specifying Left and Right Alignment in a Field
Centering a Value in a Field
Self Check
8.2.3 Numeric Formatting
Formatting Positive Numbers with Signs
Using a Space Where a + Sign Would Appear in a Positive Value
Grouping Digits
Self Check
8.2.4 String’s format Method
Multiple Placeholders
Referencing Arguments By Position Number
Referencing Keyword Arguments
Self Check
8.3 Concatenating and Repeating Strings
8.4 Stripping Whitespace from Strings
8.5 Changing Character Case
8.6 Comparison Operators for Strings
8.7 Searching for Substrings
8.8 Replacing Substrings
8.9 Splitting and Joining Strings
8.10 Characters and Character-Testing Methods
8.11 Raw Strings
8.12 Introduction to Regular Expressions
8.12.1 re Module and Function fullmatch
Matching Literal Characters
Metacharacters, Character Classes and Quantifiers
Other Predefined Character Classes
Custom Character Classes
* vs. + Quantifier
Other Quantifiers
Self Check
8.12.2 Replacing Substrings and Splitting Strings
Function sub—Replacing Patterns
Function split
Self Check
8.12.3 Other Search Functions; Accessing Matches
Function search—Finding the First Match Anywhere in a String
Ignoring Case with the Optional flags Keyword Argument
Metacharacters That Restrict Matches to the Beginning or End of a String
Functions findall and finditer—Finding All Matches in a String
Capturing Substrings in a Match
Self Check
8.13 Intro to Data Science: Pandas, Regular Expressions and Data Munging
Self Check
8.14 Wrap-Up
Exercises
Regular Expression Exercises
More Challenging String-Manipulation Exercises
9 Files and Exceptions
Objectives
Outline
9.1 Introduction
9.2 Files
9.3 Text-File Processing
9.3.1 Writing to a Text File: Introducing the with Statement
The with Statement
Built-In Function open
Writing to the File
Contents of accounts.txt File
Self Check
9.3.2 Reading Data from a Text File
File Method readlines
Seeking to a Specific File Position
Self Check
9.4 Updating Text Files
Self Check
9.5 Serialization with JSON
Self Check
9.6 Focus on Security: pickle Serialization and Deserialization
9.7 Additional Notes Regarding Files
Self Check
9.8 Handling Exceptions
9.8.1 Division by Zero and Invalid Input
Division By Zero
Invalid Input
9.8.2 try Statements
try Clause
except Clause
else Clause
Flow of Control for a ZeroDivisionError
Flow of Control for a ValueError
Flow of Control for a Successful Division
Self Check
9.8.3 Catching Multiple Exceptions in One except Clause
9.8.4 What Exceptions Does a Function or Method Raise?
9.8.5 What Code Should Be Placed in a try Suite?
9.9 finally Clause
Self Check
9.10 Explicitly Raising an Exception
Self Check
9.11 (Optional) Stack Unwinding and Tracebacks
Self Check
9.12 Intro to Data Science: Working with CSV Files
9.12.1 Python Standard Library Module csv
Writing to a CSV File
Reading from a CSV File
Caution: Commas in CSV Data Fields
Caution: Missing Commas and Extra Commas in CSV Files
Self Check
9.12.2 Reading CSV Files into Pandas DataFrames
Datasets
Working with Locally Stored CSV Files
9.12.3 Reading the Titanic Disaster Dataset
Loading the Titanic Dataset via a URL
Viewing Some of the Rows in the Titanic Dataset
Customizing the Column Names
9.12.4 Simple Data Analysis with the Titanic Disaster Dataset
9.12.5 Passenger Age Histogram
Self Check
9.13 Wrap-Up
Exercises
10 Object-Oriented Programming
Objectives
Outline
10.1 Introduction
10.2 Custom Class Account
10.2.1 Test-Driving Class Account
Importing Classes Account and Decimal
Create an Account Object with a Constructor Expression
Getting an Account’s Name and Balance
Depositing Money into an Account
Account Methods Perform Validation
Self Check
10.2.2 Account Class Definition
Defining a Class
Initializing Account Objects: Method __init__
Method deposit
10.2.3 Composition: Object References as Members of Classes
Self Check
10.3 Controlling Access to Attributes
Self Check
10.4 Properties for Data Access
10.4.1 Test-Driving Class Time
Creating a Time Object
Displaying a Time Object
Getting an Attribute Via a Property
Setting the Time
Setting an Attribute via a Property
Attempting to Set an Invalid Value
Self Check
10.4.2 Class Time Definition
Class Time: __init__ Method with Default Parameter Values
Class Time: hour Read-Write Property
Class Time: minute and second Read-Write Properties
Class Time: Method set_time
Class Time: Special Method __repr__
Class Time: Special Method __str__
Self Check
10.4.3 Class Time Definition Design Notes
Interface of a Class
Attributes Are Always Accessible
Internal Data Representation
Evolving a Class’s Implementation Details
Properties
Utility Methods
Module datetime
Self Check
10.5 Simulating “Private” Attributes
Self Check
10.6 Case Study: Card Shuffling and Dealing Simulation
10.6.1 Test-Driving Classes Card and DeckOfCards
Creating, Shuffling and Dealing the Cards
Dealing Cards
Class Card’s Other Features
10.6.2 Class Card—Introducing Class Attributes
Class Attributes FACES and SUITS
Card Method __init__
Read-Only Properties face, suit and image_name
Methods That Return String Representations of a Card
10.6.3 Class DeckOfCards
Method __init__
Method shuffle
Method deal_card
Method __str__
10.6.4 Displaying Card Images with Matplotlib
Enable Matplotlib in IPython
Create the Base Path for Each Image
Import the Matplotlib Features
Create the Figure and Axes Objects
Configure the Axes Objects and Display the Images
Maximize the Image Sizes
Shuffle and Re-Deal the Deck
Self Check
10.7 Inheritance: Base Classes and Subclasses
Self Check
10.8 Building an Inheritance Hierarchy; Introducing Polymorphism
10.8.1 Base Class CommissionEmployee
All Classes Inherit Directly or Indirectly from Class object
Testing Class CommissionEmployee
Self Check
10.8.2 Subclass SalariedCommissionEmployee
Declaring Class SalariedCommissionEmployee
Inheriting from Class CommissionEmployee
Method __init__ and Built-In Function super
Overriding Method earnings
Overriding Method __repr__
Testing Class SalariedCommissionEmployee
Testing the “is a” Relationship
Self Check
10.8.3 Processing CommissionEmployees and SalariedCommissionEmployees Polymorphically
Self Check
10.8.4 A Note About Object-Based and Object-Oriented Programming
10.9 Duck Typing and Polymorphism
10.10 Operator Overloading
Operator Overloading Restrictions
Complex Numbers
10.10.1 Test-Driving Class Complex
10.10.2 Class Complex Definition
Method __init__
Overloaded + Operator
Overloaded += Augmented Assignment
Method __repr__
Self Check
10.11 Exception Class Hierarchy and Custom Exceptions
10.12 Named Tuples
Self Check
10.13 A Brief Intro to Python 3.7’s New Data Classes
10.13.1 Creating a Card Data Class
Importing from the dataclasses and typing Modules
Using the @dataclass Decorator
Variable Annotations: Class Attributes
Variable Annotations: Data Attributes
Defining a Property and Other Methods
Variable Annotation Notes
Self Check
10.13.2 Using the Card Data Class
Self Check
10.13.3 Data Class Advantages over Named Tuples
10.13.4 Data Class Advantages over Traditional Classes
More Information
10.14 Unit Testing with Docstrings and doctest
Self Check
10.15 Namespaces and Scopes
10.16 Intro to Data Science: Time Series and Simple Linear Regression
Self Check
10.17 Wrap-Up
Exercises
11 Computer Science Thinking: Recursion, Searching, Sorting and Big O
Objectives
Outline
11.1 Introduction
11.2 Factorials
11.3 Recursive Factorial Example
Self Check
11.4 Recursive Fibonacci Series Example
Self Check
11.5 Recursion vs. Iteration
Self Check
11.6 Searching and Sorting
11.7 Linear Search
Self Check
11.8 Efficiency of Algorithms: Big O
Self Check
11.9 Binary Search
Self Check
11.9.1 Binary Search Implementation
Function binary_search
Function remaining_elements
Function main
11.9.2 Big O of the Binary Search
11.10 Sorting Algorithms
11.11 Selection Sort
11.11.1 Selection Sort Implementation
Function selection_sort
Function main
11.11.2 Utility Function print_pass
11.11.3 Big O of the Selection Sort
Self Check
11.12 Insertion Sort
11.12.1 Insertion Sort Implementation
Function insertion_sort
11.12.2 Big O of the Insertion Sort
Self Check
11.13 Merge Sort
11.13.1 Merge Sort Implementation
Function merge_sort
Recursive Function sort_array
Function merge
Function subarray_string
Function main
11.13.2 Big O of the Merge Sort
Self Check
11.14 Big O Summary for This Chapter’s Searching and Sorting Algorithms
11.15 Visualizing Algorithms
11.15.1 Generator Functions
yield Statements
11.15.2 Implementing the Selection Sort Animation
import Statements
update Function That Displays Each Animation Frame
flash_bars Function That Flashes the Bars About to Be Swapped
selection_sort Generator Function
main Function That Launches the Animation
Sound Utility Functions
11.16 Wrap-Up
Exercises
12 Natural Language Processing (NLP)
Objectives
Outline
12.1 Introduction
12.2 TextBlob
Self Check
12.2.1 Create a TextBlob
Self Check
12.2.2 Tokenizing Text into Sentences and Words
Self Check
12.2.3 Parts-of-Speech Tagging
Self Check
12.2.4 Extracting Noun Phrases
Self Check
12.2.5 Sentiment Analysis with TextBlob’s Default Sentiment Analyzer
Getting the Sentiment of a TextBlob
Getting the polarity and subjectivity from the Sentiment Object
Getting the Sentiment of a Sentence
Self Check
12.2.6 Sentiment Analysis with the NaiveBayesAnalyzer
Self Check
12.2.7 Language Detection and Translation
Self Check
12.2.8 Inflection: Pluralization and Singularization
Self Check
12.2.9 Spell Checking and Correction
Self Check
12.2.10 Normalization: Stemming and Lemmatization
Self Check
12.2.11 Word Frequencies
Self Check
12.2.12 Getting Definitions, Synonyms and Antonyms from WordNet
Getting Definitions
Getting Synonyms
Getting Antonyms
Self Check
12.2.13 Deleting Stop Words
Self Check
12.2.14 n-grams
Self Check
12.3 Visualizing Word Frequencies with Bar Charts and Word Clouds
12.3.1 Visualizing Word Frequencies with Pandas
Loading the Data
Getting the Word Frequencies
Eliminating the Stop Words
Sorting the Words by Frequency
Getting the Top 20 Words
Convert top20 to a DataFrame
Visualizing the DataFrame
12.3.2 Visualizing Word Frequencies with Word Clouds
Installing the wordcloud Module
Loading the Text
Loading the Mask Image that Specifies the Word Cloud’s Shape
Configuring the WordCloud Object
Generating the Word Cloud
Saving the Word Cloud as an Image File
Generating a Word Cloud from a Dictionary
Displaying the Image with Matplotlib
Self Check
12.4 Readability Assessment with Textatistic
Self Check
12.5 Named Entity Recognition with spaCy
Self Check
12.6 Similarity Detection with spaCy
Self Check
12.7 Other NLP Libraries and Tools
12.8 Machine Learning and Deep Learning Natural Language Applications
12.9 Natural Language Datasets
12.10 Wrap-Up
Exercises
13 Data Mining Twitter
Objectives
Outline
13.1 Introduction
Self Check
13.2 Overview of the Twitter APIs
Self Check
13.3 Creating a Twitter Account
13.4 Getting Twitter Credentials—Creating an App
Self Check
13.5 What’s in a Tweet?
Key Properties of a Tweet Object
Sample Tweet JSON
Twitter JSON Object Resources
Self Check
13.6 Tweepy
13.7 Authenticating with Twitter Via Tweepy
Self Check
13.8 Getting Information About a Twitter Account
Self Check
13.9 Introduction to Tweepy Cursors: Getting an Account’s Followers and Friends
13.9.1 Determining an Account’s Followers
Creating a Cursor
Getting Results
Automatic Paging
Getting Follower IDs Rather Than Followers
Self Check
13.9.2 Determining Whom an Account Follows
Self Check
13.9.3 Getting a User’s Recent Tweets
Grabbing Recent Tweets from Your Own Timeline
Self Check
13.10 Searching Recent Tweets
13.11 Spotting Trends: Twitter Trends API
13.11.1 Places with Trending Topics
Self Check
13.11.2 Getting a List of Trending Topics
Worldwide Trending Topics
New York City Trending Topics
Self Check
13.11.3 Create a Word Cloud from Trending Topics
Self Check
13.12 Cleaning/Preprocessing Tweets for Analysis
Self Check
13.13 Twitter Streaming API
13.13.1 Creating a Subclass of StreamListener
Class TweetListener
Class TweetListener: __init__ Method
Class TweetListener: on_connect Method
Class TweetListener: on_status Method
13.13.2 Initiating Stream Processing
Authenticating
Creating a TweetListener
Creating a Stream
Starting the Tweet Stream
Asynchronous vs. Synchronous Streams
Other filter Method Parameters
Twitter Restrictions Note
Self Check
13.14 Tweet Sentiment Analysis
13.15 Geocoding and Mapping
Self Check
13.15.1 Getting and Mapping the Tweets
Get the API Object
Collections Required By LocationListener
Creating the LocationListener
Configure and Start the Stream of Tweets
Displaying the Location Statistics
Geocoding the Locations
Displaying the Bad Location Statistics
Cleaning the Data
Creating a Map with Folium
Creating Popup Markers for the Tweet Locations
Saving the Map
Self Check
13.15.2 Utility Functions in tweetutilities.py
get_tweet_content Utility Function
get_geocodes Utility Function
Self Check
13.15.3 Class LocationListener
13.16 Ways to Store Tweets
13.17 Twitter and Time Series
13.18 Wrap-Up
Exercises
14 IBM Watson and Cognitive Computing
Outline
14.1 Introduction: IBM Watson and Cognitive Computing
Self Check
14.2 IBM Cloud Account and Cloud Console
Self Check
14.3 Watson Services
Watson Assistant
Visual Recognition
Speech to Text
Text to Speech
Language Translator
Natural Language Understanding
Discovery
Personality Insights
Tone Analyzer
Natural Language Classifier
Synchronous and Asynchronous Capabilities
Self Check
14.4 Additional Services and Tools
Watson Studio
Knowledge Studio
Machine Learning
Knowledge Catalog
Cognos Analytics
Self Check
14.5 Watson Developer Cloud Python SDK
Modules We’ll Need for Audio Recording and Playback
SDK Examples
Self Check
14.6 Case Study: Traveler’s Companion Translation App
Self Check
14.6.1 Before You Run the App
Registering for the Speech to Text Service
Registering for the Text to Speech Service
Registering for the Language Translator Service
Retrieving Your Credentials
Self Check
14.6.2 Test-Driving the App
Processing the Question
Processing the Response
Self Check
14.6.3 SimpleLanguageTranslator.py Script Walkthrough
Importing Watson SDK Classes
Other Imported Modules
Main Program: Function run_translator
Function speech_to_text
Function translate
Function text_to_speech
Function record_audio
Function play_audio
Executing the run_translator Function
Self Check
14.7 Watson Resources
Self Check
14.8 Wrap-Up
Exercises
15 Machine Learning: Classification, Regression and Clustering
Outline
15.1 Introduction to Machine Learning
15.1.1 Scikit-Learn
Which Scikit-Learn Estimator Should You Choose for Your Project?
15.1.2 Types of Machine Learning
Supervised Machine Learning
Datasets
Classification
Regression
Unsupervised Machine Learning
K-Means Clustering and the Iris Dataset
Big Data and Big Computer Processing Power
15.1.3 Datasets Bundled with Scikit-Learn
15.1.4 Steps in a Typical Data Science Study
Self Check
15.2 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 1
Self Check
15.2.1 k-Nearest Neighbors Algorithm
Hyperparameters and Hyperparameter Tuning
Self Check
15.2.2 Loading the Dataset
Displaying the Description
Checking the Sample and Target Sizes
A Sample Digit Image
Preparing the Data for Use with Scikit-Learn
Self Check
15.2.3 Visualizing the Data
Creating the Diagram
Displaying Each Image and Removing the Axes Labels
Self Check
15.2.4 Splitting the Data for Training and Testing
Training and Testing Set Sizes
Self Check
15.2.5 Creating the Model
15.2.6 Training the Model
Self Check
15.2.7 Predicting Digit Classes
Self Check
15.3 Case Study: Classification with k-Nearest Neighbors and the Digits Dataset, Part 2
15.3.1 Metrics for Model Accuracy
Estimator Method score
Confusion Matrix
Classification Report
Visualizing the Confusion Matrix
Self Check
15.3.2 K-Fold Cross-Validation
KFold Class
Using the KFold Object with Function cross_val_score
Self Check
15.3.3 Running Multiple Models to Find the Best One
Scikit-Learn Estimator Diagram
Self Check
15.3.4 Hyperparameter Tuning
Self Check
15.4 Case Study: Time Series and Simple Linear Regression
Self Check
15.5 Case Study: Multiple Linear Regression with the California Housing Dataset
15.5.1 Loading the Dataset
Loading the Data
Displaying the Dataset’s Description
15.5.2 Exploring the Data with Pandas
Self Check
15.5.3 Visualizing the Features
Self Check
15.5.4 Splitting the Data for Training and Testing
15.5.5 Training the Model
Self Check
15.5.6 Testing the Model
15.5.7 Visualizing the Expected vs. Predicted Prices
15.5.8 Regression Model Metrics
Self Check
15.5.9 Choosing the Best Model
15.6 Case Study: Unsupervised Machine Learning, Part 1—Dimensionality Reduction
Loading the Digits Dataset
Creating a TSNE Estimator for Dimensionality Reduction
Transforming the Digits Dataset’s Features into Two Dimensions
Visualizing the Reduced Data
Visualizing the Reduced Data with Different Colors for Each Digit
Self Check
15.7 Case Study: Unsupervised Machine Learning, Part 2—k-Means Clustering
Self Check
15.7.1 Loading the Iris Dataset
Checking the Numbers of Samples, Features and Targets
15.7.2 Exploring the Iris Dataset: Descriptive Statistics with Pandas
15.7.3 Visualizing the Dataset with a Seaborn pairplot
Displaying the pairplot in One Color
Self Check
15.7.4 Using a KMeans Estimator
Creating the Estimator
Fitting the Model
Comparing the Computer Cluster Labels to the Iris Dataset’s Target Values
Self Check
15.7.5 Dimensionality Reduction with Principal Component Analysis
Creating the PCA Object
Transforming the Iris Dataset’s Features into Two Dimensions
Visualizing the Reduced Data
Self Check
15.7.6 Choosing the Best Clustering Estimator
15.8 Wrap-Up
Exercises
16 Deep Learning
Objectives
Outline
16.1 Introduction
Self Check
16.1.1 Deep Learning Applications
16.1.2 Deep Learning Demos
16.1.3 Keras Resources
16.2 Keras Built-In Datasets
16.3 Custom Anaconda Environments
Self Check
16.4 Neural Networks
Self Check
16.5 Tensors
Self Check
16.6 Convolutional Neural Networks for Vision; Multi-Classification with the MNIST Dataset
Self Check
16.6.1 Loading the MNIST Dataset
Self Check
16.6.2 Data Exploration
Visualizing Digits
16.6.3 Data Preparation
Reshaping the Image Data
Normalizing the Image Data
One-Hot Encoding: Converting the Labels From Integers to Categorical Data
Self Check
16.6.4 Creating the Neural Network
Adding Layers to the Network
Convolution
Adding a Convolution Layer
Dimensionality of the First Convolution Layer’s Output
Overfitting
Adding a Pooling Layer
Adding Another Convolutional Layer and Pooling Layer
Flattening the Results
Adding a Dense Layer to Reduce the Number of Features
Adding Another Dense Layer to Produce the Final Output
Printing the Model’s Summary
Visualizing a Model’s Structure
Compiling the Model
Self Check
16.6.5 Training and Evaluating the Model
Evaluating the Model
Making Predictions
Locating the Incorrect Predictions
Visualizing Incorrect Predictions
Displaying the Probabilities for Several Incorrect Predictions
Self Check
16.6.6 Saving and Loading a Model
Self Check
16.7 Visualizing Neural Network Training with TensorBoard
Self Check
16.8 ConvnetJS: Browser-Based Deep-Learning Training and Visualization
16.9 Recurrent Neural Networks for Sequences; Sentiment Analysis with the IMDb Dataset
Self Check
16.9.1 Loading the IMDb Movie Reviews Dataset
Self Check
16.9.2 Data Exploration
Movie Review Encodings
Decoding a Movie Review
16.9.3 Data Preparation
Splitting the Test Data into Validation and Test Data
Self Check
16.9.4 Creating the Neural Network
Adding an Embedding Layer
Adding an LSTM Layer
Adding a Dense Output Layer
Compiling the Model and Displaying the Summary
Self Check
16.9.5 Training and Evaluating the Model
16.10 Tuning Deep Learning Models
Self Check
16.11 Convnet Models Pretrained on ImageNet
16.12 Reinforcement Learning
16.12.1 Deep Q-Learning
16.12.2 OpenAI Gym
16.13 Wrap-Up
Exercises
Convolutional Neural Networks
Recurrent Neural Networks
ConvnetJS Visualization
Convolutional Neural Network Projects and Research
Recurrent Neural Network Projects and Research
Automated Deep Learning Project
Reinforcement Learning Projects and Research
Generative Deep Learning
Deep Fakes
Additional Research
17 Big Data: Hadoop, Spark, NoSQL and IoT
Objectives
Outline
17.1 Introduction
Self Check for Section 17.1
17.2 Relational Databases and Structured Query Language (SQL)
Self Check
17.2.1 A books Database
Self Check
17.2.2 SELECT Queries
17.2.3 WHERE Clause
Pattern Matching: Zero or More Characters
Pattern Matching: Any Character
Self Check
17.2.4 ORDER BY Clause
Sorting By Multiple Columns
Combining the WHERE and ORDER BY Clauses
Self Check
17.2.5 Merging Data from Multiple Tables: INNER JOIN
Self Check
17.2.6 INSERT INTO Statement
Note Regarding Strings That Contain Single Quotes
17.2.7 UPDATE Statement
17.2.8 DELETE FROM Statement
Self Check for Section 17.2
17.3 NoSQL and NewSQL Big-Data Databases: A Brief Tour
17.3.1 NoSQL Key–Value Databases
17.3.2 NoSQL Document Databases
17.3.3 NoSQL Columnar Databases
17.3.4 NoSQL Graph Databases
17.3.5 NewSQL Databases
Self Check for Section 17.3
17.4 Case Study: A MongoDB JSON Document Database
17.4.1 Creating the MongoDB Atlas Cluster
Creating Your First Database User
Whitelist Your IP Address
Connect to Your Cluster
17.4.2 Streaming Tweets into MongoDB
Use Tweepy to Authenticate with Twitter
Loading the Senators’ Data
Configuring the MongoClient
Setting up Tweet Stream
Starting the Tweet Stream
Class TweetListener
Counting Tweets for Each Senator
Show Tweet Counts for Each Senator
Get the State Locations for Plotting Markers
Grouping the Tweet Counts by State
Creating the Map
Creating a Choropleth to Color the Map
Creating the Map Markers for Each State
Displaying the Map
Self Check for Section 17.4
17.5 Hadoop
17.5.1 Hadoop Overview
HDFS, MapReduce and YARN
Hadoop Ecosystem
Hadoop Providers
Hadoop 3
17.5.2 Summarizing Word Lengths in Romeo and Juliet via MapReduce
17.5.3 Creating an Apache Hadoop Cluster in Microsoft Azure HDInsight
Creating an HDInsight Hadoop Cluster
17.5.4 Hadoop Streaming
17.5.5 Implementing the Mapper
17.5.6 Implementing the Reducer
17.5.7 Preparing to Run the MapReduce Example
Copying the Script Files to the HDInsight Hadoop Cluster
Copying RomeoAndJuliet into the Hadoop File System
17.5.8 Running the MapReduce Job
Viewing the Word Counts
Deleting Your Cluster So You Do Not Incur Charges
Self Check for Section 17.5
17.6 Spark
17.6.1 Spark Overview
History
Architecture and Components
Providers
17.6.2 Docker and the Jupyter Docker Stacks
Docker
Installing Docker
Jupyter Docker Stacks
Run Jupyter Docker Stack
Opening JupyterLab in Your Browser
Accessing the Docker Container’s Command Line
Stopping and Restarting a Docker Container
17.6.3 Word Count with Spark
Loading the NLTK Stop Words
Configuring a SparkContext
Reading the Text File and Mapping It to Words
Removing the Stop Words
Counting Each Remaining Word
Locating Words with Counts Greater Than or Equal to 60
Sorting and Displaying the Results
17.6.4 Spark Word Count on Microsoft Azure
Create an Apache Spark Cluster in HDInsight Using the Azure Portal
Install Libraries into a Cluster
Copying RomeoAndJuliet.txt to the HDInsight Cluster
Accessing Jupyter Notebooks in HDInsight
Uploading the RomeoAndJulietCounter.ipynb Notebook
Modifying the Notebook to Work with Azure
Self Check for Section 17.6
17.7 Spark Streaming: Counting Twitter Hashtags Using the pyspark-notebook Docker Stack
17.7.1 Streaming Tweets to a Socket
Executing the Script in the Docker Container
starttweetstream.py import Statements
Class TweetListener
Main Application
17.7.2 Summarizing Tweet Hashtags; Introducing Spark SQL
Importing the Libraries
Utility Function to Get the SparkSession
Utility Function to Display a Barchart Based on a Spark DataFrame
Utility Function to Summarize the Top-20 Hashtags So Far
Getting the SparkContext
Getting the StreamingContext
Setting Up a Checkpoint for Maintaining State
Connecting to the Stream via a Socket
Tokenizing the Lines of Hashtags
Mapping the Hashtags to Tuples of Hashtag-Count Pairs
Totaling the Hashtag Counts So Far
Specifying the Method to Call for Every RDD
Starting the Spark Stream
Self Check for Section 17.7
17.8 Internet of Things and Dashboards
17.8.1 Publish and Subscribe
17.8.2 Visualizing a PubNub Sample Live Stream with a Freeboard Dashboard
Signing up for Freeboard.io
Creating a New Dashboard
Adding a Data Source
Adding a Pane for the Humidity Sensor
Adding a Gauge to the Humidity Pane
Adding a Sparkline to the Humidity Pane
Completing the Dashboard
17.8.3 Simulating an Internet-Connected Thermostat in Python
Installing Dweepy
Invoking the simulator.py Script
Sending Dweets
17.8.4 Creating the Dashboard with Freeboard.io
17.8.5 Creating a Python PubNub Subscriber
Message Format
Importing the Libraries
List and DataFrame Used for Storing Company Names and Prices
Class SensorSubscriberCallback
Function Update
Configuring the Figure
Configuring the FuncAnimation and Displaying the Window
Configuring the PubNub Client
Subscribing to the Channel
Ensuring the Figure Remains on the Screen
Self Check for Section 17.8
17.9 Wrap-Up
Exercises
SQL and RDBMS Exercises
NoSQL Database Exercises
Hadoop Exercises
Spark Exercises
IoT and Pub/Sub Exercises
Platform Exercises
Other Exercises
Index