Index
Cover
Title page
Table of Contents
Copyright Page
Preface
Part 1: Principles
Chapter 1: Introducing Guerrilla Analytics
Summary
1.1. What is data analytics?
1.2. Types of data analytics projects
1.3. Introducing Guerrilla Analytics projects
1.4. Guerrilla Analytics definition
1.5. Example Guerrilla Analytics projects
1.6. Some terminology
1.7. Wrap up
Chapter 2: Guerrilla Analytics: Challenges and Risks
Summary
2.1. The Guerrilla Analytics workflow
2.2. Challenges of managing analytics projects
2.3. Risks
2.4. Impact of failure to address analytics risks
2.5. Wrap up
Chapter 3: Guerrilla Analytics Principles
Summary
3.1. Maintain data provenance despite disruptions
3.2. The principles
3.3. Applying the principles
3.4. Wrap up
Part 2: Practice
Chapter 4: Stage 1: Data Extraction
Summary
4.1. Guerrilla Analytics workflow
4.2. Pitfalls and risks
4.3. Practice tip 1: freeze the source system during data extraction
4.4. Practice tip 2: extract data into an agreed file format
4.5. Practice tip 3: calculate checksums before data extraction
4.6. Practice tip 4: capture front-end reports
4.7. Practice tip 5: save raw copies of web pages
4.8. Practice tip 6: consistency check OCR data
4.9. Wrap up
Chapter 5: Stage 2: Data Receipt
Summary
5.1. Guerrilla Analytics workflow
5.2. Pitfalls and risks
5.3. Practice tip 7: have a single location for all data received
5.4. Practice tip 8: create unique identifiers for received data
5.5. Practice tip 9: store data tracking information in a data log
5.6. Practice tip 10: never modify raw data files
5.7. Practice tip 11: keep supporting material near the data
5.8. Practice tip 12: version-control data received
5.9. Bringing it all together
5.10. Wrap up
Chapter 6: Stage 3: Data Load
Summary
6.1. Guerrilla Analytics workflow
6.2. Pitfalls and risks
6.3. Practice tip 13: minimize modifications to data before load
6.4. Practice tip 14: do data load preparations on a copy of raw data files
6.5. Practice tip 15: add identifiers to raw data before loading
6.6. Practice tip 16: prefer one-to-one Data Loads
6.7. Practice tip 17: preserve the raw file name and data UID
6.8. Practice tip 18: load data as plain text
6.9. Common challenges
6.10. Wrap up
Chapter 7: Stage 4: Analytics Coding for Ease of Review
Summary
7.1. Guerrilla Analytics workflow
7.2. Pitfalls and risks
7.3. Practice tip 19: use one code file per data output
7.4. Practice tip 20: produce clearly identifiable data outputs
7.5. Practice tip 21: write code that runs from start to finish
7.6. Practice tip 22: favor code that is not embedded in proprietary file formats
7.7. Practice tip 23: clearly label the running order of code files
7.8. Practice tip 24: drop all datasets at the start of code execution
7.9. Practice tip 25: break up data flows into “data steps”
7.10. Practice tip 26: don’t jump in and out of a code file
7.11. Practice tip 27: log code execution
7.12. Common challenges
7.13. Wrap up
Chapter 8: Stage 4: Analytics Coding to Maintain Data Provenance
Summary
8.1. Guerrilla Analytics workflow
8.2. Examples
8.3. Pitfalls and risks
8.4. Practice tip 28: clean data at a minimum of locations in a data flow
8.5. Practice tip 29: when cleaning a data field, keep the original raw field
8.6. Practice tip 30: filter data with flags, not deletions
8.7. Practice tip 31: identify fields with metadata
8.8. Practice tip 32: create a unique identifier for data records
8.9. Practice tip 33: rename data fields with a field mapping
8.10. Wrap up
Chapter 9: Stage 6: Creating Work Products
Summary
9.1. Guerrilla Analytics workflow
9.2. Examples
9.3. The essence of a work product
9.4. Pitfalls and risks
9.5. Practice tip 34: track work products with a Unique Identifier (UID)
9.6. Practice tip 35: keep work product generators and outputs close together
9.7. Practice tip 36: avoid clutter in the file system
9.8. Practice tip 37: avoid clutter in the DME
9.9. Practice tip 38: give output data records a UID
9.10. Practice tip 39: version control work products
9.11. Practice tip 40: use a convention to name complex outputs
9.12. Practice tip 41: log all Work Products
9.13. Wrap up
Chapter 10: Stage 7: Reporting
Summary
10.1. Guerrilla Analytics workflow
10.2. What is a report?
10.3. Why reports are complicated
10.4. Report components
10.5. Pitfalls and risks
10.6. Practice tip 42: liaise with report writers
10.7. Practice tip 43: create one work product per report component
10.8. Practice tip 44: make presentation quality work products
10.9. Extreme reporting
10.10. Wrap up
Chapter 11: Stage 5: Consolidating Knowledge in Builds
Summary
11.1. Introduction
11.2. Pitfalls and risks
11.3. Example: the customer address problem
11.4. Sources of variation
11.5. Definition of a build
11.6. The customer address example using a Build
11.7. Data Builds
11.8. Service Builds
11.9. When to start a build
11.10. Wrap up
Part 3: Testing
Chapter 12: Introduction to Testing
Summary
12.1. Guerrilla Analytics workflow
12.2. What is testing?
12.3. Why do testing?
12.4. Areas of testing
12.5. Comparing expected and actual
12.6. The challenge of testing Guerrilla Analytics
12.7. Practice tip 61: establish a testing culture
12.8. Practice tip 62: test early
12.9. Practice tip 63: test often
12.10. Practice tip 64: give tests unique identifiers
12.11. Practice tip 65: organize test data by test UID
12.12. Next chapters on testing
12.13. Wrap up
Chapter 13: Testing Data
Summary
13.1. Guerrilla Analytics workflow
13.2. The five C’s of testing data
13.3. Testing data completeness
13.4. Testing data correctness
13.5. Testing consistency
13.6. Testing data coherence
13.7. Testing accountability
13.8. Implementing data testing
13.9. Wrap up
Chapter 14: Testing Builds
Summary
14.1. Structure of a data build
14.2. An illustrative example
14.3. Types of build tests
14.4. Test code development
14.5. Organizing build test code
14.6. Organizing test data
14.7. Wrap up
Chapter 15: Testing Work Products
Summary
15.1. Types of testable work products
15.2. Ordinary work products
15.3. General tips on testing ordinary work products
15.4. Testing statistical models
15.5. General tips on testing models
15.6. Wrap up
Part 4: Building Guerrilla Analytics Capability
Introduction
Chapter 16: People
Summary
16.1. That question again – what is data analytics?
16.2. Guerrilla Analytics skills
16.3. Programming
16.4. Substantive expertise
16.5. Communication
16.6. “Maths and stats”
16.7. Visualization
16.8. Software engineering
16.9. Mindset
16.10. Wrap up
Chapter 17: Process
Summary
17.1. What is workflow management?
17.2. Workflows in Analytics
17.3. Levels of review
17.4. Linking work products
17.5. Classifying work products
17.6. Granularity
17.7. When to use workflow management
17.8. Wrap up
Chapter 18: Technology
Summary
18.1. Analytics capabilities
18.2. Data manipulation environment
18.3. Source code control
18.4. Access to the command line
18.5. High-level scripting language
18.6. Visualization
18.7. Build tool
18.8. Access to the internet
18.9. Encryption
18.10. Code libraries for data wrangling
18.11. Machine learning and statistics libraries
18.12. Centralized and controlled file system
18.13. Additional technology capabilities
18.14. Wrap up
Chapter 19: Closing Remarks
19.1. What was this book about?
19.2. Next steps for Guerrilla Analytics
19.3. Keep in touch
Acknowledgments
Appendix: Data Gymnastics
References
Index