Data Science For Dummies®
To view this book's Cheat Sheet, simply go to www.dummies.com and search for “Data Science For Dummies Cheat Sheet” in the Search box.
Table of Contents
Cover
Title Page
Copyright
Introduction
About This Book
Foolish Assumptions
Icons Used in This Book
Beyond the Book
Where to Go from Here
Part 1: Getting Started with Data Science
Chapter 1: Wrapping Your Head Around Data Science
Seeing Who Can Make Use of Data Science
Inspecting the Pieces of the Data Science Puzzle
Exploring Career Alternatives That Involve Data Science
Chapter 2: Tapping into Critical Aspects of Data Engineering
Defining Big Data and the Three Vs
Identifying Important Data Sources
Grasping the Differences among Data Approaches
Storing and Processing Data for Data Science
Part 2: Using Data Science to Extract Meaning from Your Data
Chapter 3: Machine Learning Means … Using a Machine to Learn from Data
Defining Machine Learning and Its Processes
Considering Learning Styles
Seeing What You Can Do
Chapter 4: Math, Probability, and Statistical Modeling
Exploring Probability and Inferential Statistics
Quantifying Correlation
Reducing Data Dimensionality with Linear Algebra
Modeling Decisions with Multiple Criteria Decision-Making
Introducing Regression Methods
Detecting Outliers
Introducing Time Series Analysis
Chapter 5: Grouping Your Way into Accurate Predictions
Starting with Clustering Basics
Identifying Clusters in Your Data
Categorizing Data with Decision Tree and Random Forest Algorithms
Drawing a Line between Clustering and Classification
Making Sense of Data with Nearest Neighbor Analysis
Classifying Data with Average Nearest Neighbor Algorithms
Classifying with K-Nearest Neighbor Algorithms
Solving Real-World Problems with Nearest Neighbor Algorithms
Chapter 6: Coding Up Data Insights and Decision Engines
Seeing Where Python and R Fit into Your Data Science Strategy
Using Python for Data Science
Using Open Source R for Data Science
Chapter 7: Generating Insights with Software Applications
Choosing the Best Tools for Your Data Science Strategy
Getting a Handle on SQL and Relational Databases
Investing Some Effort into Database Design
Narrowing the Focus with SQL Functions
Making Life Easier with Excel
Chapter 8: Telling Powerful Stories with Data
Data Visualizations: The Big Three
Designing to Meet the Needs of Your Target Audience
Picking the Most Appropriate Design Style
Selecting the Appropriate Data Graphic Type
Testing Data Graphics
Adding Context
Part 3: Taking Stock of Your Data Science Capabilities
Chapter 9: Developing Your Business Acumen
Bridging the Business Gap
Traversing the Business Landscape
Surveying Use Cases and Case Studies
Chapter 10: Improving Operations
Establishing Essential Context for Operational Improvements Use Cases
Exploring Ways That Data Science Is Used to Improve Operations
Chapter 11: Making Marketing Improvements
Exploring Popular Use Cases for Data Science in Marketing
Turning Web Analytics into Dollars and Sense
Building Data Products That Increase Sales-and-Marketing ROI
Increasing Profit Margins with Marketing Mix Modeling
Chapter 12: Enabling Improved Decision-Making
Improving Decision-Making
Barking Up the Business Intelligence Tree
Using Data Analytics to Support Decision-Making
Increasing Profit Margins with Data Science
Chapter 13: Decreasing Lending Risk and Fighting Financial Crimes
Decreasing Lending Risk with Clustering and Classification
Preventing Fraud Via Natural Language Processing (NLP)
Chapter 14: Monetizing Data and Data Science Expertise
Setting the Tone for Data Monetization
Monetizing Data Science Skills as a Service
Selling Data Products
Direct Monetization of Data Resources
Pricing Out Data Privacy
Part 4: Assessing Your Data Science Options
Chapter 15: Gathering Important Information about Your Company
Unifying Your Data Science Team Under a Single Business Vision
Framing Data Science around the Company’s Vision, Mission, and Values
Taking Stock of Data Technologies
Inventorying Your Company’s Data Resources
People-Mapping
Avoiding Classic Data Science Project Pitfalls
Tuning In to Your Company’s Data Ethos
Making Information-Gathering Efficient
Chapter 16: Narrowing In on the Optimal Data Science Use Case
Reviewing the Documentation
Selecting Your Quick-Win Data Science Use Cases
Picking between Plug-and-Play Assessments
Chapter 17: Planning for Future Data Science Project Success
Preparing an Implementation Plan
Supporting Your Data Science Project Plan
Executing On Your Data Science Project Plan
Chapter 18: Blazing a Path to Data Science Career Success
Navigating the Data Science Career Matrix
Landing Your Data Scientist Dream Job
Leading with Data Science
Starting Up in Data Science
Part 5: The Part of Tens
Chapter 19: Ten Phenomenal Resources for Open Data
Digging Through data.gov
Checking Out Canada Open Data
Diving into data.gov.uk
Checking Out US Census Bureau Data
Accessing NASA Data
Wrangling World Bank Data
Getting to Know Knoema Data
Queuing Up with Quandl Data
Exploring Exversion Data
Mapping OpenStreetMap Spatial Data
Chapter 20: Ten Free or Low-Cost Data Science Tools and Applications
Scraping, Collecting, and Handling Data Tools
Data-Exploration Tools
Designing Data Visualizations
Communicating with Infographics
Index
About the Author
Advertisement Page
Connect with Dummies
End User License Agreement
List of Tables
Chapter 5
TABLE 5-1: Business Analyst Stu’s Employee Data
Chapter 6
TABLE 6-1: Popular Operators
Chapter 8
TABLE 8-1: Types of Data Visualization, by Audience
Chapter 10
TABLE 10-1: Operational Improvement Use Cases for Data Science, by Industry
TABLE 10-2: Theoretical Business Use Case for Automating Client Interactions
TABLE 10-3: Theoretical Technology Stack for Automating Client Interactions
TABLE 10-4: Theoretical Business Use Case for Real-Time Optimized Logistics Rout...
TABLE 10-5: Theoretical Technology Stack for Real-Time Optimized Logistics Routin...
TABLE 10-6: Theoretical Business Use Case for Increasing Content-Generation Rates
TABLE 10-7: Theoretical Technology Stack for Increasing Content Generation Rates
Chapter 12
TABLE 12-1: Theoretical Business Use Case for Analyzing Customer Sentiment
TABLE 12-2: Theoretical Business Use Case for Real-Time Decision Support
TABLE 12-3: Theoretical Technology Stack for Automating Client Interactions
Chapter 13
TABLE 13-1: An Example of n-grams in Phrase and Entity Analysis
List of Illustrations
Chapter 2
FIGURE 2-1: Popular sources of big data.
Chapter 3
FIGURE 3-1: An example of a simple random sample
FIGURE 3-2: Unsupervised machine learning breaks down unlabeled data into subgr...
FIGURE 3-3: Machine learning algorithms can be broken down by function.
FIGURE 3-4: Neural networks are connected layers of artificial neural units.
FIGURE 3-5: A deep learning network is a neural network with more than one hidd...
Chapter 4
FIGURE 4-1: An example of a linear relationship between months and YouTube subs...
FIGURE 4-2: An example of a non-linear relationship between watch time and % vi...
FIGURE 4-3: Applying SVD to compress a sparse, clean dataset.
FIGURE 4-4: Applying SVD to clean and compress a sparse, dirty dataset.
FIGURE 4-5: You can use SVD to decompose data down to u, S, and V matrices.
FIGURE 4-6: Linear regression used to predict home prices based on the number o...
FIGURE 4-7: Spotting outliers with a Tukey boxplot.
FIGURE 4-8: Using PCA to spot outliers.
FIGURE 4-9: A comparison of patterns exhibited by time series.
FIGURE 4-10: An example of an ARMA forecast model.
Chapter 5
FIGURE 5-1: A simple scatterplot.
FIGURE 5-2: A simple scatterplot, showing eyeballed estimations of clustering.
FIGURE 5-3: KDE smoothing of the World Bank’s Income and Education data scatter...
FIGURE 5-4: A schematic layout of a sample dendrogram.
FIGURE 5-5: Using DBSCAN to detect outliers (in black) within the Iris dataset....
FIGURE 5-6: A decision tree model predicts survival rates from the Titanic cata...
FIGURE 5-7: Using the Continent feature to classify World Bank data.
FIGURE 5-8: The distances between the employees’ tuples.
FIGURE 5-9: Finding the average similarity between employees.
FIGURE 5-10: How kNN works.
Chapter 6
FIGURE 6-1: Sample output from Python’s Matplotlib library.
FIGURE 6-2: Time-series plot of monthly snow depth data.
FIGURE 6-3: The relationship between atomic vectors, lists, and data frame obje...
FIGURE 6-4: Linear regression coefficients from R, translated into a plain math...
FIGURE 6-5: A scatterplot, generated in the ggplot2 package.
FIGURE 6-6: A network diagram that was generated using the statnet package.
Chapter 7
FIGURE 7-1: An example of how SQL is human-readable.
FIGURE 7-2: A relationship between data tables that share a column.
FIGURE 7-3: The full dataset that tracks employee sales performance.
FIGURE 7-4: The sales performance dataset, filtered to show only Abbie’s record...
FIGURE 7-5: Spotting outliers in a tabular dataset with conditional formatting ...
FIGURE 7-6: Spotting outliers in a tabular dataset with color scales.
FIGURE 7-7: Excel XY (scatter) plots provide a simple way to visually detect ou...
FIGURE 7-8: Excel line charts make it easy to visually detect trends in data.
FIGURE 7-9: A long dataset and a wide spreadsheet.
FIGURE 7-10: Creating a wide data table from the long dataset via a PivotTable.
FIGURE 7-11: Using a macro to insert empty cells between values.
Chapter 8
FIGURE 8-1: This design style conveys a calculating and exacting feel.
FIGURE 8-2: This design style is intended to evoke an emotional response.
FIGURE 8-3: Data visualization versus data graphics.
FIGURE 8-4: Types of data graphics, broken down by audience and data visualizat...
FIGURE 8-5: An area chart in three dimensions.
FIGURE 8-6: A bar chart showing the area of US states by their acreage, in thou...
FIGURE 8-7: A line chart.
FIGURE 8-8: A pie chart.
FIGURE 8-9: A bubble chart.
FIGURE 8-10: A packed circle diagram.
FIGURE 8-11: A Gantt chart.
FIGURE 8-12: A stacked chart.
FIGURE 8-13: A tree map.
FIGURE 8-14: A simple word cloud.
FIGURE 8-15: A histogram.
FIGURE 8-16: A scatterplot.
FIGURE 8-17: A scatterplot matrix.
FIGURE 8-18: A linear topology.
FIGURE 8-19: A graph mesh network topology.
FIGURE 8-20: A hierarchical tree topology.
FIGURE 8-21: A choropleth map.
FIGURE 8-22: A point map.
FIGURE 8-23: A raster surface map.
FIGURE 8-24: Here you see the importance of selecting effective data graphics.
FIGURE 8-25: Using annotation to create context.
FIGURE 8-26: Using graphical elements to create context.
Chapter 9
FIGURE 9-1: The data science Venn diagram.
FIGURE 9-2: The mechanics by which traditional data roles increase business pro...
FIGURE 9-3: The three layers of the business acumen cake.
FIGURE 9-4: My STAR framework, for managing profit-forming data science project...
FIGURE 9-5: Five routes by which data science impacts business.
FIGURE 9-6: A simple business use case diagram.
Chapter 10
FIGURE 10-1: AI-enabled robotics automation.
FIGURE 10-2: Diagram of a debt-recovery business use case.
FIGURE 10-3: Diagram of a logistics-routing business use case.
FIGURE 10-4: Diagram of a content publishing business use case.
Chapter 11
FIGURE 11-1: A channel scorecard.
Chapter 12
FIGURE 12-1: The data-to-dollars flowchart.
FIGURE 12-2: The business intelligence path within the data-to-dollars flowchar...
FIGURE 12-3: An example of a modern, SaaS BI tool.
FIGURE 12-4: The analytics path within the data-to-dollars flowchart.
FIGURE 12-5: An example of analytics that report on real-time data.
FIGURE 12-6: An example of analytics that report on external data.
FIGURE 12-7: The data science path within the data-to-dollars flowchart.
FIGURE 12-8: A business use case diagram for automated decision support AI for ...
Chapter 13
FIGURE 13-1: NLP-enabled affective computing.
FIGURE 13-2: Sentiment analysis flowchart.
FIGURE 13-3: An example of raw data ingested into an NLP system.
FIGURE 13-4: An example of text normalization.
FIGURE 13-5: An example of text tokenization.
FIGURE 13-6: Forrester's estimated ROI for a client.
Chapter 14
FIGURE 14-1: The three main ways to monetize data.
FIGURE 14-2: The value proposition of personal data in advertising.
Chapter 15
FIGURE 15-1: The classic power dynamic within a data science team.
FIGURE 15-2: The need for business vision, mission, and values.
FIGURE 15-3: My STAR framework, for managing profit-forming data science projec...
FIGURE 15-4: An example of data architecture.
FIGURE 15-5: A list of popular data technologies, broken down according to func...
FIGURE 15-6: The structure of a data dictionary.
FIGURE 15-7: The relationship between a dataset and the data dictionary that de...
FIGURE 15-8: An example of data quality and silo issues.
FIGURE 15-9: An organizational chart.
FIGURE 15-10: Some foundations from which great data professionals emerge.
FIGURE 15-11: Questions and recipients inside my question-and-asset request dat...
FIGURE 15-12: Some filtered questions, ready to send to relevant finance person...
Chapter 16
FIGURE 16-1: Assessing your company’s current state.
Chapter 17
FIGURE 17-1: Recommending a plan of action.
FIGURE 17-2: An example of reference architecture.
Chapter 18
FIGURE 18-1: Potential roles that can take you from entry-level to senior-level...
FIGURE 18-2: Potential roles that can take you from entry-level to senior-level...
FIGURE 18-3: Potential roles that can take you from entry-level to senior-level...
FIGURE 18-4: The four types of currency.
FIGURE 18-5: Data scientists, broken down by degree types.
FIGURE 18-6: Data scientists, broken down by field of study.
FIGURE 18-7: A map from within Jake Vanderplas’s instructional coding demonstra...
FIGURE 18-8: The four best business models for remote, self-funded data science...
FIGURE 18-9: The four best business models for remote, self-funded data science...
Chapter 19
FIGURE 19-1: The index of insect records in Knoema’s search.
FIGURE 19-2: The index of Apple records in a Quandl search.
Chapter 20
FIGURE 20-1: An ImageQuilts output from the Google Images search term data scie...
FIGURE 20-2: A moderate-size graph on characters in the novel Les Misérables....
FIGURE 20-3: A Gephi hairball graph of the US power grid.
FIGURE 20-4: An interactive choropleth map in CARTO.
FIGURE 20-5: An interactive bubble map in CARTO.
FIGURE 20-6: A diagram from RAWCharts.
FIGURE 20-7: A bar chart in Infogram.
FIGURE 20-8: A bar chart in Piktochart.