Cover image
Title page
Table of Contents
Copyright
Contributors
Acknowledgments
Introduction
Perspectives on data science for software engineering
Abstract
Why This Book?
About This Book
The Future
Software analytics and its application in practice
Abstract
Six Perspectives of Software Analytics
Experiences in Putting Software Analytics into Practice
Seven principles of inductive software engineering: What we do is different
Abstract
Different and Important
Principle #1: Humans Before Algorithms
Principle #2: Plan for Scale
Principle #3: Get Early Feedback
Principle #4: Be Open Minded
Principle #5: Be Smart With Your Learning
Principle #6: Live With the Data You Have
Principle #7: Develop a Broad Skill Set That Uses a Big Toolkit
The need for data analysis patterns (in software engineering)
Abstract
The Remedy Metaphor
Software Engineering Data
The Need for Data Analysis Patterns
Building Remedies for Data Analysis in Software Engineering Research
From software data to software theory: The path less traveled
Abstract
Pathways of Software Repository Research
From Observation, to Theory, to Practice
Why theory matters
Abstract
Introduction
How to Use Theory
How to Build Theory
In Summary: Find a Theory or Build One Yourself
Success Stories/Applications
Mining apps for anomalies
Abstract
The Million-Dollar Question
App Mining
Detecting Abnormal Behavior
A Treasure Trove of Data …
… but Also Obstacles
Executive Summary
Embrace dynamic artifacts
Abstract
Acknowledgments
Can We Minimize the USB Driver Test Suite?
Still Not Convinced? Here’s More
Dynamic Artifacts Are Here to Stay
Mobile app store analytics
Abstract
Introduction
Understanding End Users
Conclusion
The naturalness of software
Abstract
Introduction
Transforming Software Practice
Conclusion
Advances in release readiness
Abstract
Predictive Test Metrics
Universal Release Criteria Model
Best Estimation Technique
Resource/Schedule/Content Model
Using Models in Release Management
Research to Implementation: A Difficult (but Rewarding) Journey
How to tame your online services
Abstract
Background
Service Analysis Studio
Success Story
Measuring individual productivity
Abstract
No Single and Simple Best Metric for Success/Productivity
Measure the Process, Not Just the Outcome
Allow for Measures to Evolve
Goodhart’s Law and the Effect of Measuring
How to Measure Individual Productivity?
Stack traces reveal attack surfaces
Abstract
Another Use of Stack Traces?
Attack Surface Approximation
Visual analytics for software engineering data
Abstract
Gameplay data plays nicer when divided into cohorts
Abstract
Cohort Analysis as a Tool for Gameplay Data
Play to Lose
Forming Cohorts
Case Studies of Gameplay Data
Challenges of Using Cohorts
Summary
A success story in applying data science in practice
Abstract
Overview
Analytics Process
Communication Process—Best Practices
Summary
There's never enough time to do all the testing you want
Abstract
The Impact of Short Release Cycles (There's Not Enough Time)
Learn From Your Test Execution History
The Art of Testing Less
Tests Evolve Over Time
In Summary
The perils of energy mining: Measure a bunch, compare just once
Abstract
A Tale of Two HTTPs
Let's ENERGISE Your Software Energy Experiments
Summary
Identifying fault-prone files in large industrial software systems
Abstract
Acknowledgment
A tailored suit: The big opportunity in personalizing issue tracking
Abstract
Many Choices, Nothing Great
The Need for Personalization
Developer Dashboards or “A Tailored Suit”
Room for Improvement
What counts is decisions, not numbers—Toward an analytics design sheet
Abstract
Decisions Everywhere
The Decision-Making Process
The Analytics Design Sheet
Example: App Store Release Analysis
A large ecosystem study to understand the effect of programming languages on code quality
Abstract
Comparing Languages
Study Design and Analysis
Results
Summary
Code reviews are not for finding defects—Even established tools need occasional evaluation
Abstract
Results
Effects
Conclusions
Techniques
Interviews
Abstract
Why Interview?
The Interview Guide
Selecting Interviewees
Recruitment
Collecting Background Data
Conducting the Interview
Post-Interview Discussion and Notes
Transcription
Analysis
Reporting
Now Go Interview!
Look for state transitions in temporal data
Abstract
Bikeshedding in Software Engineering
Summarizing Temporal Data
Recommendations
Card-sorting: From text to themes
Abstract
Preparation Phase
Execution Phase
Analysis Phase
Tools! Tools! We need tools!
Abstract
Tools in Science
The Tools We Need
Recommendations for Tool Building
Evidence-based software engineering
Abstract
Introduction
The Aim and Methodology of EBSE
Contextualizing Evidence
Strength of Evidence
Evidence and Theory
Which machine learning method do you need?
Abstract
Learning Styles
Do Additional Data Arrive Over Time?
Are Changes Likely to Happen Over Time?
If You Have a Prediction Problem, What Do You Really Need to Predict?
Do You Have a Prediction Problem Where Unlabeled Data Are Abundant and Labeled Data Are Expensive?
Are Your Data Imbalanced?
Do You Need to Use Data From Different Sources?
Do You Have Big Data?
Do You Have Little Data?
In Summary…
Structure your unstructured data first!: The case of summarizing unstructured data with tag clouds
Abstract
Unstructured Data in Software Engineering
Summarizing Unstructured Software Data
Conclusion
Parse that data! Practical tips for preparing your raw data for analysis
Abstract
Use Assertions Everywhere
Print Information About Broken Records
Use Sets or Counters to Store Occurrences of Categorical Variables
Restart Parsing in the Middle of the Data Set
Test on a Small Subset of Your Data
Redirect Stdout and Stderr to Log Files
Store Raw Data Alongside Cleaned Data
Finally, Write a Verifier Program to Check the Integrity of Your Cleaned Data
Natural language processing is no free lunch
Abstract
Natural Language Data in Software Projects
Natural Language Processing
How to Apply NLP to Software Projects
Summary
Aggregating empirical evidence for more trustworthy decisions
Abstract
What's Evidence?
What Does Data From Empirical Studies Look Like?
The Evidence-Based Paradigm and Systematic Reviews
How Far Can We Use the Outcomes From Systematic Review to Make Decisions?
If it is software engineering, it is (probably) a Bayesian factor
Abstract
Causing the Future With Bayesian Networks
The Need for a Hybrid Approach in Software Analytics
Use the Methodology, Not the Model
Becoming Goldilocks: Privacy and data sharing in “just right” conditions
Abstract
Acknowledgments
The “Data Drought”
Change Is Good
Don’t Share Everything
Share Your Leaders
Summary
The wisdom of the crowds in predictive modeling for software engineering
Abstract
The Wisdom of the Crowds
So… How Is That Related to Predictive Modeling for Software Engineering?
Examples of Ensembles and Factors Affecting Their Accuracy
Crowds for Transferring Knowledge and Dealing With Changes
Crowds for Multiple Goals
A Crowd of Insights
Ensembles as Versatile Tools
Combining quantitative and qualitative methods (when mining software data)
Abstract
Prologue: We Have Solid Empirical Evidence!
Correlation Is Not Causation and, Even If We Can Claim Causation…
Collect Your Data: People and Artifacts
Build a Theory Upon Your Data
Conclusion: The Truth is Out There!
Suggested Readings
A process for surviving survey design and sailing through survey deployment
Abstract
Acknowledgments
The Lure of the Sirens: The Attraction of Surveys
Navigating the Open Seas: A Successful Survey Process in Software Engineering
In Summary
Wisdom
Log it all?
Abstract
A Parable: The Blind Woman and an Elephant
Misinterpreting Phenomena in Software Engineering
Using Data to Expand Perspectives
Recommendations
Why provenance matters
Abstract
What’s Provenance?
What Are the Key Entities?
What Are the Key Tasks?
Another Example
Looking Ahead
Open from the beginning
Abstract
Alitheia Core
GHTorrent
Why the Difference?
Be Open or Be Irrelevant
Reducing time to insight
Abstract
What Is Insight Anyway?
Time to Insight
The Insight Value Chain
What To Do
A Warning on Waste
Five steps for success: How to deploy data science in your organizations
Abstract
Step 1. Choose the Right Questions for the Right Team
Step 2. Work Closely With Your Consumers
Step 3. Validate and Calibrate Your Data
Step 4. Speak Plainly to Give Results Business Value
Step 5. Go the Last Mile—Operationalizing Predictive Models
How the release process impacts your software analytics
Abstract
Linking Defect Reports and Code Changes to a Release
How the Version Control System Can Help
Security cannot be measured
Abstract
Gotcha #1: Security Is Negatively Defined
Gotcha #2: Having Vulnerabilities Is Actually Normal
Gotcha #3: “More Vulnerabilities” Does Not Always Mean “Less Secure”
Gotcha #4: Design Flaws Are Not Usually Tracked
Gotcha #5: Hackers Are Innovative Too
An Unfair Question
Gotchas from mining bug reports
Abstract
Do Bug Reports Describe Code Defects?
It's the User That Defines the Work Item Type
Do Developers Apply Atomic Changes?
In Summary
Make visualization part of your analysis process
Abstract
Leveraging Visualizations: An Example With Software Repository Histories
How to Jump the Pitfalls
Don't forget the developers! (and be careful with your assumptions)
Abstract
Acknowledgments
Disclaimer
Background
Are We Actually Helping Developers?
Some Observations and Recommendations
Limitations and context of research
Abstract
Small Research Projects
Data Quality of Open Source Repositories
Lack of Industrial Representatives at Conferences
Research From Industry
Summary
Actionable metrics are better metrics
Abstract
What Would You Say… I Should DO?
The Offenders
Actionable Heroes
Cyclomatic Complexity: An Interesting Case
Are Unactionable Metrics Useless?
Replicated results are more trustworthy
Abstract
The Replication Crisis
Reproducible Studies
Reliability and Validity in Studies
So What Should Researchers Do?
So What Should Practitioners Do?
Diversity in software engineering research
Abstract
Introduction
What Is Diversity and Representativeness?
What Can We Do About It?
Evaluation
Recommendations
Future Work
Once is not enough: Why we need replication
Abstract
Motivating Example and Tips
Exploring the Unknown
Types of Empirical Results
Do's and Don'ts
Mere numbers aren't enough: A plea for visualization
Abstract
Numbers Are Good, but…
Case Studies on Visualization
What to Do
Don’t embarrass yourself: Beware of bias in your data
Abstract
Dewey Defeats Truman
Impact of Bias in Software Engineering
Identifying Bias
Assessing Impact
Which Features Should I Look At?
Operational data are missing, incorrect, and decontextualized
Abstract
Background
Examples
A Life of a Defect
What to Do?
Data science revolution in process improvement and assessment?
Abstract
Correlation is not causation (or, when not to scream “Eureka!”)
Abstract
What Not to Do
Example
Examples from Software Engineering
What to Do
In Summary: Wait and Reflect Before You Report
Software analytics for small software companies: More questions than answers
Abstract
The Reality for Small Software Companies
Small Software Companies’ Projects: Smaller and Shorter
Different Goals and Needs
What to Do About the Dearth of Data?
What to Do on a Tight Budget?
Software analytics under the lamp post (or what Star Trek teaches us about the importance of asking the right questions)
Abstract
Prologue
Learning from Data
Which Bin Is Mine?
Epilogue
What can go wrong in software engineering experiments?
Abstract
Operationalize Constructs
Evaluate Different Design Alternatives
Match Data Analysis and Experimental Design
Do Not Rely on Statistical Significance Alone
Do a Power Analysis
Find Explanations for Results
Follow Guidelines for Reporting Experiments
Improving the Reliability of Experimental Results
One size does not fit all
Abstract
While models are good, simple explanations are better
Abstract
Acknowledgments
How Do We Compare a USB2 Driver to a USB3 Driver?
The Issue With Our Initial Approach
“Just Tell us What Is Different and Nothing More”
Looking Back
Users Prefer Simple Explanations
The white-shirt effect: Learning from failed expectations
Abstract
A Story
The Right Reaction
Practical Advice
Simpler questions can lead to better insights
Abstract
Introduction
Context of the Software Analytics Project
Providing Predictions on Buggy Changes
How to Read the Graph?
(Anti-)Patterns in the Error-Handling Graph
How to Act on (Anti-)Patterns?
Summary
Continuously experiment to assess values early on
Abstract
Most Ideas Fail to Show Value
Every Idea Can Be Tested With an Experiment
How Do We Find Good Hypotheses and Conduct the Right Experiments?
Key Takeaways
Lies, damned lies, and analytics: Why big data needs thick data
Abstract
How Great It Is, to Have Data Like You
Looking for Answers in All the Wrong Places
Beware the Reality Distortion Field
Build It and They Will Come, but Should We?
To Classify Is Human, but Analytics Relies on Algorithms
Lean In: How Ethnography Can Improve Software Analytics and Vice Versa
Finding the Ethnographer Within
The world is your test suite
Abstract
Watch the World and Learn
Crashes, Hangs, and Bluescreens
The Need for Speed
Protecting Data and Identity
Discovering Confusion and Missing Requirements
Monitoring Is Mandatory