Index
Natural Language Annotation for Machine Learning
Preface
Audience
Organization of This Book
Software Requirements
Conventions Used in This Book
Using Code Examples
Safari® Books Online
How to Contact Us
Acknowledgments
James Adds:
Amber Adds:
1. The Basics
The Importance of Language Annotation
The Layers of Linguistic Description
What Is Natural Language Processing?
A Brief History of Corpus Linguistics
What Is a Corpus?
Early Use of Corpora
Corpora Today
Kinds of Annotation
Language Data and Machine Learning
Classification
Clustering
Structured Pattern Induction
The Annotation Development Cycle
Model the Phenomenon
Annotate with the Specification
Train and Test the Algorithms over the Corpus
Evaluate the Results
Revise the Model and Algorithms
Summary
2. Defining Your Goal and Dataset
Defining Your Goal
The Statement of Purpose
Refining Your Goal: Informativity Versus Correctness
The scope of the annotation task
What will the annotation be used for?
What will the overall outcome be?
Where will the corpus come from?
How will the result be achieved?
Background Research
Language Resources
Organizations and Conferences
NLP Challenges
Assembling Your Dataset
The Ideal Corpus: Representative and Balanced
Collecting Data from the Internet
Eliciting Data from People
Read speech
Spontaneous speech
The Size of Your Corpus
Existing Corpora
Distributions Within Corpora
Summary
3. Corpus Analytics
Basic Probability for Corpus Analytics
Joint Probability Distributions
Bayes Rule
Counting Occurrences
Zipf’s Law
N-grams
Language Models
Summary
4. Building Your Model and Specification
Some Example Models and Specs
Film Genre Classification
Adding Named Entities
Semantic Roles
Adopting (or Not Adopting) Existing Models
Creating Your Own Model and Specification: Generality Versus Specificity
Using Existing Models and Specifications
Using Models Without Specifications
Different Kinds of Standards
ISO Standards
Annotation format standards
Annotation specification standards
Community-Driven Standards
Other Standards Affecting Annotation
Summary
5. Applying and Adopting Annotation Standards
Metadata Annotation: Document Classification
Unique Labels: Movie Reviews
Multiple Labels: Film Genres
Text Extent Annotation: Named Entities
Inline Annotation
Stand-off Annotation by Tokens
Stand-off Annotation by Character Location
Linked Extent Annotation: Semantic Roles
ISO Standards and You
Summary
6. Annotation and Adjudication
The Infrastructure of an Annotation Project
Specification Versus Guidelines
Be Prepared to Revise
Preparing Your Data for Annotation
Metadata
Preprocessed Data
Splitting Up the Files for Annotation
Writing the Annotation Guidelines
Example 1: Single Labels—Movie Reviews
Example 2: Multiple Labels—Film Genres
Example 3: Extent Annotations—Named Entities
Example 4: Link Tags—Semantic Roles
Annotators
Choosing an Annotation Environment
Evaluating the Annotations
Cohen’s Kappa (κ)
Fleiss’s Kappa (κ)
Interpreting Kappa Coefficients
Calculating κ in Other Contexts
Creating the Gold Standard (Adjudication)
Summary
7. Training: Machine Learning
What Is Learning?
Defining Our Learning Task
Classifier Algorithms
Decision Tree Learning
Gender Identification
Naïve Bayes Learning
Movie genre identification
Sentiment classification
Maximum Entropy Classifiers
Other Classifiers to Know About
Sequence Induction Algorithms
Clustering and Unsupervised Learning
Semi-Supervised Learning
Matching Annotation to Algorithms
Summary
8. Testing and Evaluation
Testing Your Algorithm
Evaluating Your Algorithm
Confusion Matrices
Calculating Evaluation Scores
Percentage accuracy
Precision and recall
F-measure
Other evaluation metrics
Interpreting Evaluation Scores
Problems That Can Affect Evaluation
Dataset Is Too Small
Algorithm Fits the Development Data Too Well
Too Much Information in the Annotation
Final Testing Scores
Summary
9. Revising and Reporting
Revising Your Project
Corpus Distributions and Content
Model and Specification
Annotation
Guidelines
Annotators
Tools
Training and Testing
Reporting About Your Work
About Your Corpus
About Your Model and Specifications
About Your Annotation Task and Annotators
About Your ML Algorithm
About Your Revisions
Summary
10. Annotation: TimeML
The Goal of TimeML
Related Research
Building the Corpus
Model: Preliminary Specifications
Times
Signals
Events
Links
Annotation: First Attempts
Model: The TimeML Specification Used in TimeBank
Time Expressions
Events
Signals
Links
Confidence
Annotation: The Creation of TimeBank
TimeML Becomes ISO-TimeML
Modeling the Future: Directions for TimeML
Narrative Containers
Expanding TimeML to Other Domains
Event Structures
Summary
11. Automatic Annotation: Generating TimeML
The TARSQI Components
GUTime: Temporal Marker Identification
EVITA: Event Recognition and Classification
GUTenLINK
Slinket
SputLink
Machine Learning in the TARSQI Components
Improvements to the TTK
Structural Changes
Improvements to Temporal Entity Recognition: BTime
Temporal Relation Identification
Temporal Relation Validation
Temporal Relation Visualization
TimeML Challenges: TempEval-2
TempEval-2: System Summaries
Overview of Results
Future of the TTK
New Input Formats
Narrative Containers/Narrative Times
Medical Documents
Cross-Document Analysis
Summary
12. Afterword: The Future of Annotation
Crowdsourcing Annotation
Amazon’s Mechanical Turk
Games with a Purpose (GWAP)
User-Generated Content
Handling Big Data
Boosting
Active Learning
Semi-Supervised Learning
NLP Online and in the Cloud
Distributed Computing
Shared Language Resources
Shared Language Applications
And Finally...
A. List of Available Corpora and Specifications
Corpora
Specifications, Guidelines, and Other Resources
Representation Standards
B. List of Software Resources
Annotation and Adjudication Software
Multipurpose Tools
Corpus Creation and Exploration Tools
Manual Annotation Tools
Automated Annotation Tools
Multipurpose tools
Phonetic annotation
Part-of-speech taggers/syntactic parsers
Tokenizers/chunkers/stemmers
Other
Machine Learning Resources
C. MAE User Guide
Installing and Running MAE
Loading Tasks and Files
Loading a Task
Loading a File
Annotating Entities
Attribute information
Nonconsuming tags
Annotating Links
Deleting Tags
Saving Files
Defining Your Own Task
Task Name
Elements (a.k.a. Tags)
Attributes
id attributes
start attribute
Attribute types
Default attribute values
Frequently Asked Questions
D. MAI User Guide
Installing and Running MAI
Loading Tasks and Files
Loading a Task
Loading Files
Adjudicating
The MAI Window
Adjudicating a Tag
Extent Tags
Link Tags
Nonconsuming Tags
Adding New Tags
Deleting tags
Saving Files
E. Bibliography
References for Using Amazon’s Mechanical Turk/Crowdsourcing
Index
About the Authors
Colophon
Copyright