Copyright 2020 Manning Publications
Welcome
brief contents
1 Introduction to Human-in-the-Loop Machine Learning
1.1 The Basic Principles of Human-in-the-Loop Machine Learning
1.2 Introducing Annotation
1.2.1 Simple and more complicated annotation strategies
1.2.2 Plugging the gap in data science knowledge
1.2.3 Quality human annotations: why are they hard?
1.3 Introducing Active Learning: improving the speed and reducing the cost of training data
1.3.1 Three broad Active Learning sampling strategies: uncertainty, diversity, and random
1.3.2 What is a random selection of evaluation data?
1.3.3 When to use Active Learning?
1.4 Machine Learning and Human-Computer Interaction
1.4.1 User interfaces: how do you create training data?
1.4.2 Priming: what can influence human perception?
1.4.3 The pros and cons of creating labels by evaluating Machine Learning predictions
1.4.4 Basic principles for designing annotation interfaces
1.5 Machine Learning-Assisted Humans vs Human-Assisted Machine Learning
1.6 Transfer learning to kick-start your models
1.6.1 Transfer Learning in Computer Vision
1.6.2 Transfer Learning in Natural Language Processing
1.7 What to expect in this text
1.8 Summary
2 Getting Started with Human-in-the-Loop Machine Learning
2.1 Beyond “Hack-tive Learning:” your first Active Learning algorithm
2.1.1 The architecture of your first HuML system
2.2 Interpreting model predictions and data to support Active Learning
2.2.1 Confidence ranking
2.2.2 Identifying outliers
2.2.3 What to expect as you iterate
2.3 Building an interface to get human labels
2.3.1 A simple interface for labeling text
2.3.2 Managing Machine Learning data
2.4 Deploying your first Human-in-the-Loop Machine Learning system
2.4.1 Always get your evaluation data first!
2.4.2 Every data point gets a chance
2.4.3 Select the right strategies for your data
2.4.4 Retrain the model and iterate
2.5 Summary
3 Uncertainty Sampling
3.1 Interpreting Uncertainty in a Machine Learning Model
3.1.1 Why look for uncertainty in your model?
3.1.2 Interpreting the scores from your model
3.1.3 “Score”, “Confidence”, and “Probability”: Do not trust the name!
3.1.4 SoftMax: converting the model output into confidences
3.2 Algorithms for Uncertainty Sampling
3.2.1 Least Confidence sampling
3.2.2 Margin of Confidence sampling
3.2.3 Ratio of Confidence sampling
3.2.4 Entropy (classification entropy)
3.2.5 A Deep Dive on Entropy
3.3 Identifying when different types of models are confused
3.3.1 What is the best activation function for Active Learning?
3.3.2 Uncertainty sampling with Logistic Regression and MaxEnt models
3.3.3 Uncertainty sampling with Support Vector Machines
3.3.4 Uncertainty sampling with Bayesian Models
3.3.5 Uncertainty sampling with Decision Trees and Random Forests
3.4 Measuring uncertainty across multiple models
3.4.1 Uncertainty sampling with Ensemble models
3.4.2 Query by Committee and Dropouts
3.5 Selecting the right number of items for human review
3.5.1 Budget-constrained uncertainty sampling
3.5.2 Time-constrained uncertainty sampling
3.5.3 When do I stop if I’m not time or budget constrained?
3.6 Evaluating the success of uncertainty sampling
3.6.1 Do I need new test data?
3.6.2 Do I need new validation data?
3.7 Uncertainty Sampling cheatsheet and further reading
3.7.1 Further Reading for Least Confidence Sampling
3.7.2 Further Reading for Margin of Confidence Sampling
3.7.3 Further Reading for Ratio of Confidence Sampling
3.7.4 Further Reading for Entropy-based Sampling
3.7.5 Further Reading for Uncertainty Sampling for Other Machine Learning models
3.7.6 Further Reading for Ensemble-based Uncertainty Sampling
3.8 Summary
4 Diversity Sampling
4.1 Knowing what you don’t know: identifying where your model is blind
4.1.1 Example data for Diversity Sampling
4.1.2 Interpreting neural models for Diversity Sampling
4.1.3 Getting information from hidden layers in PyTorch
4.2 Model-based outlier sampling
4.2.1 Use validation data to rank activations
4.2.2 Which layers should I use to calculate model-based outliers?
4.2.3 The limitations of model-based outliers
4.3 Cluster-based sampling
4.3.1 Cluster members, centroids, and outliers
4.3.2 Any clustering algorithm in the universe
4.3.3 K-Means clustering with cosine similarity
4.3.4 Reduced feature dimensions via embeddings or PCA
4.3.5 Other clustering algorithms
4.4 Representative Sampling
4.4.1 Representative Sampling is rarely used in isolation
4.4.2 Simple Representative Sampling
4.4.3 Adaptive Representative Sampling
4.5 Sampling for real-world diversity
4.5.1 Common problems in training data diversity
4.5.2 Stratified sampling to ensure diversity of demographics
4.5.3 Represented and Representative: which matters?
4.5.4 Limitations of sampling for real-world diversity
4.6 Diversity Sampling with different types of models
4.6.1 Model-based outliers with different types of models
4.6.2 Clustering with different types of models
4.6.3 Representative Sampling with different types of models
4.6.4 Sampling for real-world diversity with different types of models
4.7 Evaluating models for Diversity Sampling
4.7.1 Micro and Macro Precision, Recall, and F-Score
4.7.2 Per-demographic accuracy
4.7.3 Macro and Micro AUC
4.8 Deciding on the right number of items for human review
4.8.1 Adaptive Cluster-based Sampling
4.8.2 Adaptive Representative Sampling
4.8.3 Deciding on the right ratio of samples from Diversity Sampling and Uncertainty Sampling
4.9 Diversity Sampling Cheatsheet and Further Reading
4.9.1 Further Reading for Model-Based Outliers
4.9.2 Further Reading for Cluster-based Sampling
4.9.3 Further Reading for Representative Sampling
4.9.4 Further Reading for Sampling for Real-World Diversity
4.10 Summary
5 Advanced Active Learning
5.1 Combining Uncertainty Sampling and Diversity Sampling
5.1.1 Least Confidence Sampling with Clustering-based Sampling
5.1.2 Uncertainty Sampling with Model-based Outliers
5.1.3 Uncertainty Sampling with Model-based Outliers and Clustering
5.1.4 Representative Sampling with Cluster-based Sampling
5.1.5 Sampling from the Highest Entropy Cluster
5.1.6 Other Combinations of Active Learning Strategies
5.1.7 Combining Active Learning Scores
5.1.8 Expected Error Reduction Sampling
5.2 Active Transfer Learning for Uncertainty Sampling
5.2.1 Making your model predict its own errors
5.2.2 Implementing Active Transfer Learning
5.2.3 Active Transfer Learning with more layers
5.2.4 The pros and cons of Active Transfer Learning
5.3 Applying Active Transfer Learning to Representative Sampling
5.3.1 Making your model predict what it doesn’t know
5.3.2 Active Transfer Learning for Adaptive Representative Sampling
5.3.3 The pros and cons of Active Transfer Learning for Representative Sampling
5.4 Active Transfer Learning for Adaptive Sampling (ATLAS)
5.4.1 How to make Uncertainty Sampling adaptive by predicting the uncertainty
5.4.2 The pros and cons of ATLAS
5.5 Advanced Active Learning Cheatsheets and Further Reading
5.5.1 Further Reading for Advanced Active Learning
5.5.2 Further Reading for Active Transfer Learning
5.6 Summary
6 Applying Active Learning to Different Machine Learning Tasks
6.1 Applying Active Learning to object detection
6.1.1 Accuracy for object detection: Label Confidence and Localization
6.1.2 Uncertainty Sampling for Label Confidence and Localization in object detection
6.1.3 Diversity Sampling for Label Confidence and Localization in object detection
6.1.4 Active Transfer Learning for Object Detection
6.1.5 Set a low object detection threshold to avoid perpetuating bias
6.1.6 Create training data samples for Representative Sampling that are similar to your predictions
6.1.7 Sample randomly and consider some image-level sampling
6.1.8 Consider tighter masks when using polygons
6.2 Applying Active Learning to semantic segmentation
6.2.1 Accuracy for semantic segmentation
6.2.2 Uncertainty Sampling for semantic segmentation
6.2.3 Diversity Sampling for Semantic Segmentation
6.2.4 Active Transfer Learning for Semantic Segmentation
6.2.5 Sample randomly and consider some image-level sampling
6.3 Applying Active Learning to Sequence Labeling
6.3.1 Accuracy for Sequence Labeling
6.3.2 Uncertainty Sampling for Sequence Labeling
6.3.3 Diversity Sampling for Sequence Labeling
6.3.4 Active Transfer Learning for Sequence Labeling
6.3.5 Set a low prediction threshold to avoid perpetuating bias
6.3.6 Create training data samples for Representative Sampling that are similar to your predictions
6.3.7 Full-sequence labeling
6.3.8 Sample randomly and consider some document-level sampling
6.4 Applying Active Learning to Text Generation
6.4.1 Calculating accuracy for text generation systems
6.4.2 Uncertainty Sampling for Text Generation
6.4.3 Diversity Sampling for Text Generation
6.4.4 Active Transfer Learning for Text Generation
6.5 Applying Active Learning to other Machine Learning tasks
6.5.1 Active Learning for Information Retrieval
6.5.2 Active Learning for Video
6.5.3 Active Learning for Speech
6.6 Choosing the right number of items for human review
6.6.1 Active Labeling for fully or partially annotated data
6.6.2 Combining Machine Learning with Annotation
6.7 Summary
7 Selecting the Right People to Annotate your Data
7.1 Introduction to data labeling
7.1.1 Three principles for good data annotation
7.1.2 Annotating data and reviewing Machine Learning predictions
7.1.3 Annotations from Machine Learning assisted humans
7.2 In-house experts
7.2.1 Salary for in-house workers
7.2.2 Security for in-house workers
7.2.3 Ownership for in-house workers
7.2.4 Tip: Always run in-house annotation sessions
7.3 Outsourced workers
7.3.1 Salary for outsourced workers
7.3.2 Security for outsourced workers
7.3.3 Ownership for outsourced workers
7.3.4 Tip: Talk to your outsourced workers
7.4 Crowdsourced workers
7.4.1 Salary for crowdsourced workers
7.4.2 Security for crowdsourced workers
7.4.3 Ownership for crowdsourced workers
7.4.4 Tip: Create a path to secure work and career advancement
7.5 Other workforces
7.5.1 End-users
7.5.2 Volunteers
7.5.3 People playing games
7.5.4 Computer-generated annotations
7.6 Estimating the volume of annotation needed
7.6.1 The orders-of-magnitude equation for number of annotations needed
7.6.2 Anticipate 1-4 weeks of annotation training and task refinement
7.6.3 Use your pilot annotations and accuracy goal to estimate cost
7.6.4 Combining types of workforces
7.7 Summary
8 Quality Control for Data Annotation
8.1 Comparing annotations to ground-truth answers
8.1.1 Annotator agreement with ground-truth data
8.1.2 Which baseline should you use for expected accuracy?
8.2 Inter-annotator agreement
8.2.1 Introduction to inter-annotator agreement
8.2.2 Dataset-level agreement with Krippendorff's alpha
8.2.3 Individual-annotator agreement
8.2.4 Per-label and per-demographic agreement
8.3 Aggregating multiple annotations to create training data
8.3.1 Aggregating annotations when everyone agrees
8.3.2 The mathematical case for diverse annotators and low agreement
8.3.3 Aggregating annotations when annotators disagree
8.3.4 Annotator-reported confidences
8.3.5 Deciding which labels to trust: annotation uncertainty
8.4 Quality control by expert review
8.4.1 Recruiting and training qualified people
8.4.2 Training people to become experts
8.4.3 Machine Learning-assisted experts
8.5 Multistep workflows and review tasks
8.6 Further Reading
8.7 Summary
9 Advanced Data Annotation and Augmentation
9.1 Annotation Quality for Subjective Tasks
9.1.1 Requesting annotator expectations
9.1.2 Assessing viable labels for subjective tasks
9.1.3 Trusting an annotator to understand the diversity of possible responses
9.1.4 Bayesian Truth Serum for subjective judgments
9.1.5 Embedding simple tasks in more complicated ones
9.2 Machine Learning for annotation quality control
9.2.1 Calculating annotation confidence as an optimization task
9.2.2 Converging on label confidence when annotators disagree
9.2.3 Predicting whether a single annotation is correct or incorrect
9.3 Model predictions as annotations
9.3.1 Trusting annotations from confident model predictions
9.3.2 Treating model predictions as a single annotator
9.3.3 Cross-validating to find mislabeled data
9.4 Embeddings/Contextual Representations
9.4.1 Transfer learning from an existing model
9.4.2 Representations from adjacent easy-to-annotate tasks
9.4.3 Using inherent labels in the data
9.5 Search-based and Rule-based systems
9.5.1 Data-filtering with rules
9.5.2 Training data search
9.6 Light supervision on unsupervised models
9.6.1 Adapting an unsupervised model to a supervised model
9.6.2 Human-guided exploratory data analysis
9.7 Synthetic data, data creation and data augmentation
9.7.1 Synthetic Data
9.7.2 Data creation
9.7.3 Data augmentation
9.8 Incorporating annotation information into machine learning models
9.8.1 Filter or weight items by the confidence in their labels
9.8.2 Include the annotator identity in inputs
9.8.3 Incorporate uncertainty into the loss function
9.9 Further Reading for Advanced Annotation
9.9.1 Further Reading for Subjective Data
9.9.2 Further Reading for Machine Learning for Annotation Quality Control
9.9.3 Further Reading for Embeddings/Contextual Representations
9.9.4 Further Reading for Rule-based Systems
9.9.5 Further Reading for incorporating annotation uncertainty into downstream models
9.10 Summary
10 Annotation Quality for Different Machine Learning Tasks
10.1 Annotation Quality for Continuous Tasks
10.1.1 Ground-truth for Continuous Tasks
10.1.2 Agreement for Continuous Tasks
10.1.3 Subjectivity in continuous tasks
10.1.4 Aggregating continuous judgments to create training data
10.1.5 Machine learning for aggregating continuous tasks to create training data
10.2 Annotation Quality for Object Detection
10.2.1 Ground-truth for Object Detection
10.2.2 Agreement for Object Detection
10.2.3 Dimensionality and Accuracy in Object Detection
10.2.4 Subjectivity for Object Detection
10.2.5 Aggregating object annotations to create training data
10.2.6 Machine learning for object annotations
10.3 Annotation Quality for Semantic Segmentation
10.3.1 Ground-truth for semantic segmentation annotation
10.3.2 Agreement for Semantic Segmentation
10.3.3 Subjectivity for Semantic Segmentation annotations
10.3.4 Aggregating Semantic Segmentation to create training data
10.3.5 Machine learning for aggregating semantic segmentation tasks to create training data
10.4 Annotation Quality for Sequence Labeling
10.4.1 Ground-truth for Sequence Labeling
10.4.2 Ground-truth for Sequence Labeling in Truly Continuous Data
10.4.3 Agreement for Sequence Labeling
10.4.4 Machine learning and transfer learning for sequence labeling
10.4.5 Rule-based, Search-based, and Synthetic data for sequence labeling
10.5 Annotation Quality for Language Generation
10.5.1 Ground-truth for Language Generation
10.5.2 Agreement and Aggregation for Language Generation
10.5.3 Machine learning and transfer learning for language generation
10.5.4 Synthetic data for language generation
10.6 Annotation Quality for Other Machine Learning Tasks
10.6.1 Annotation for Information Retrieval
10.6.2 Annotation for multi-field tasks
10.6.3 Annotation for Video
10.6.4 Annotation for Audio data
10.7 Further Reading for Annotation Quality for different machine learning tasks
10.7.1 Further reading for computer vision
10.7.2 Further reading for annotation for Natural Language Processing
10.7.3 Further reading for annotation for Information Retrieval
10.8 Summary