Index
About This Book
What Does This Book Cover?
Is This Book for You?
What Should You Know about the Examples?
Software Used to Develop the Book’s Content
Example Code and Data
We Want to Hear from You
Acknowledgments
Chapter 1: Fundamentals of Information Extraction with SAS
1.1. Introduction to Information Extraction
1.1.1. History
1.1.2. Evaluation
1.1.3. Information Extraction versus Data Extraction versus Information Retrieval
1.1.4. Situations in Which to Use IE for Business Problems
1.2. The SAS IE Toolkit
1.2.1. NLP Foundation for IE
1.2.2. LITI Rule Syntax
1.2.3. Predefined Concepts
1.2.4. Taxonomy of Concepts
1.2.5. Algorithms for Matching
1.2.6. Interfaces for Building and Applying Models
1.3. Reasons for Using SAS IE
1.4. When You Should Use Other Approaches instead of SAS IE
1.5. Important Terms in the Book
1.5.1. Strings versus Tokens
1.5.2. Named Entities and Predefined Concepts
1.5.3. Parent Forms and Other Variants
1.5.4. Found Text and Extracted Match
1.6. Suggested Reading
Chapter 2: Fundamentals of Named Entities
2.1. Introduction to Named Entities
2.2. Business Scenarios
2.2.1. Example: Pinpointing Location Information
2.2.2. Example: Identifying Supporters and Competitors
2.2.3. Example: Estimating Loss, Gain, and Risk
2.2.4. Example: Detecting Personally Identifiable Information
2.3. The SAS Approach
2.3.1. Understanding Standard Predefined Concepts
2.3.2. Understanding Underlying Principles
2.3.3. Accessing the Predefined Concepts
Chapter 3: SAS Predefined Concepts: Enamex
3.1. Introduction to SAS Predefined Concepts
3.2. Person
3.2.1. Titles in Person Names
3.2.2. Suffixes as Part of a Personal Name
3.2.3. Single-Word Names
3.2.4. Body References
3.2.5. Quotes
3.2.6. Locations as Part of Name
3.2.7. Groups of Individuals
3.2.8. Historical Figures, Saints, and Deities
3.2.9. Animals, Fictional Characters, Artificial Intelligence, and Aliens
3.2.10. Businesses Named after People
3.2.11. Laws, Diseases, Prizes, and Works of Art
3.3. Place
3.3.1. Common Nouns and Determiners
3.3.2. Subnational Regions and Other Descriptors
3.3.3. Street Addresses
3.3.4. Monuments
3.3.5. Celestial Bodies
3.3.6. Neighborhoods
3.3.7. Fictional Place Names
3.3.8. Conjoined Location Names
3.3.9. Special Cases for Nonmatches
3.4. Organization
3.4.1. Corporate Designators or Suffixes
3.4.2. Determiners before Proper Names
3.4.3. Facility Names Associated with an Organization
3.4.4. Groups of Individuals
3.4.5. Aliases
3.4.6. Conjoined Organization Names
3.4.7. Event Names
3.4.8. Special Cases for Nonmatches
3.5. Disambiguation of Matches
3.5.1. Organization or Place
3.5.2. Organization or Product
3.5.3. Organization or Person
Chapter 4: SAS Predefined Concepts: Timex, Numex, and Noun Group
4.1. Introduction to Other SAS Predefined Concepts
4.2. Date
4.2.1. Extended ISO 8601 Format
4.2.2. Named Dates
4.2.3. Modifiers
4.2.4. Conjoined Dates
4.2.5. Duration
4.2.6. Vague Expressions
4.3. Time
4.3.1. Extended ISO 8601 Format
4.3.2. Named Times and Time Zones
4.3.3. Modifiers
4.3.4. Conjoined Times
4.3.5. Duration
4.3.6. Vague Expressions
4.4. Money
4.4.1. Modifiers
4.4.2. Rates and Ratios
4.4.3. Quotes and Parentheses
4.4.4. Conjoined Expressions
4.4.5. Approximate Amount
4.4.6. Expressions and Metaphors
4.5. Percent
4.5.1. Acronyms, Initialisms, and Abbreviations
4.5.2. Modifiers
4.5.3. Quotation Marks and Parentheses
4.5.4. Conjoined Expressions
4.5.5. Multiword Expressions
4.5.6. Fractions and Ratios
4.5.7. Special Cases for Nonmatches
4.6. Noun Group
4.7. Disambiguation of Matches
4.8. Supplementing Predefined Concepts
Chapter 5: Fundamentals of Creating Custom Concepts
5.1. Introduction to Custom Concepts
5.2. LITI Rule Fundamentals
5.2.1. Required Parts of LITI Rules
5.2.2. Optional Parts of LITI Rules
5.2.3. Rule Definition
5.3. Custom Concept Fundamentals
5.3.1. Best Practices for Naming Custom Concepts
5.3.2. Best Practices for Referencing Custom Concepts
5.3.3. Concepts versus CONCEPT and CONCEPT_RULE Rule Types
5.3.4. Programmatic Rule Writing and Model Compilation
5.3.5. Programmatic Model Application
5.4. Troubleshooting All Rule Types
Chapter 6: Concept Rule Types
6.1. Introduction to the Concept Rule Types
6.2. CLASSIFIER Rule Type
6.2.1. Basic Use
6.2.2. Advanced Use: Coreference Command
6.2.3. Advanced Use: Information Field
6.2.4. Troubleshooting
6.2.5. Best Practices
6.2.6. Summary
6.3. CONCEPT Rule Type
6.3.1. Basic Use
6.3.2. Advanced Use: Combination of Various Elements
6.3.3. Advanced Use: Combination of Elements and Modifiers
6.3.4. Troubleshooting
6.3.5. Best Practices
6.3.6. Summary
6.4. C_CONCEPT Rule Type
6.4.1. Basic Use
6.4.2. Advanced Use: Multiple Strings as Matches
6.4.3. Advanced Use: Coreference
6.4.4. Troubleshooting
6.4.5. Best Practices
6.4.6. Summary
Chapter 7: CONCEPT_RULE Type
7.1. Introduction to the CONCEPT_RULE Type
7.2. Basic Use
7.3. Advanced Use: Multiple and Embedded Operators
7.4. Advanced Use: Negation Using NOT
7.5. Advanced Use: Negation Using UNLESS
7.6. Advanced Use: Coreference and Aliases
7.7. Troubleshooting
7.8. Best Practices
7.9. Summary
Chapter 8: Fact Rule Types
8.1. Introduction to Fact Rule Types
8.2. SEQUENCE Rule Type
8.2.1. Basic Use
8.2.2. Advanced Use with Other Elements
8.2.3. Troubleshooting
8.2.4. Best Practices
8.2.5. Summary
8.3. PREDICATE_RULE Rule Type
8.3.1. Basic Use
8.3.2. Advanced Use: Capture of a Sentence
8.3.3. Advanced Use: More Complex Rules
8.3.4. Advanced Use: Single Label, Multiple Extracted Matches
8.3.5. Advanced Use: More Than Two Returned Arguments
8.3.6. Advanced Use: Discovery of Terms to Add to a Model
8.3.7. Troubleshooting
8.3.8. Best Practices
8.3.9. Summary
Chapter 9: Filter Rule Types
9.1. Introduction to Filter Rule Types
9.2. REMOVE_ITEM Rule Type
9.2.1. Basic Use of the REMOVE_ITEM Rule Type
9.2.2. Advanced Use of REMOVE_ITEM: Additional Elements
9.2.3. Advanced Use of REMOVE_ITEM: Negation
9.2.4. REMOVE_ITEM Troubleshooting
9.2.5. REMOVE_ITEM Best Practices
9.2.6. REMOVE_ITEM Summary
9.3. NO_BREAK Rule Type
9.3.1. Basic Use of the NO_BREAK Rule Type
9.3.2. Advanced Use of NO_BREAK: Specifying a Concept Name
9.3.3. NO_BREAK Troubleshooting
9.3.4. NO_BREAK Best Practices
9.3.5. NO_BREAK Summary
Chapter 10: REGEX Rule Type
10.1. Introduction to the REGEX Rule Type
10.2. Basic Use
10.3. Advanced Use: Discovery of Patterns
10.4. Advanced Use: Exploration
10.5. Advanced Use: Identification of Tokens for Splitting in Post-processing
10.6. Advanced Use: Information Field
10.7. Troubleshooting REGEX
10.8. Best Practices for Using REGEX
10.9. Summary of REGEX
Chapter 11: Best Practices for Custom Concepts
11.1. Introduction to Boolean and Proximity Operators
11.2. Best Practices for Using Operators
11.2.1. Behavior of Groupings of Single Operators
11.2.2. SAS Categorization Operators
11.2.3. Combinations of Operators and Restrictions
11.3. Best Practices for Selecting Rule Types
11.3.1. Rule Types and Associated Computational Costs
11.3.2. Use of the Least Costly Rule Type for Best Performance
11.3.3. When Not to Use Certain Rule Types
11.4. Concept Rules in Models
Chapter 12: Fundamentals of Data Considerations
12.1. Introduction to Projects
12.2. Data Considerations
12.3. Data Evaluation
12.4. Data Exploration
12.5. Data Analysis
12.5.1. Vocabulary Diversity
12.5.2. Information Density
12.5.3. Language Formality
12.5.4. Information Complexity
12.5.5. Domain Specificity
12.6. Business Goals and Targeted Information
12.7. Suggested Reading
Chapter 13: Fundamentals of Project Design
13.1. Introduction to Project Design
13.2. Definition of Targeted Information
13.3. Taxonomy Design
13.3.1. Decomposition
13.3.2. Concept Types
13.4. Project Settings
13.4.1. Match Algorithm and Priority
13.4.2. Case Sensitivity
13.5. Suggested Reading
Chapter 14: Fundamentals of Model Measurement
14.1. Introduction to Model Measurement
14.2. Use of a Gold Standard Corpus
14.3. Setup of a Gold Standard Corpus
14.4. Setup of Approximate Annotations
14.5. Creation of Samples for Development and Testing
14.6. Model Quality and Decisions
14.6.1. Strategies for Overcoming Low Recall
14.6.2. Strategies for Overcoming Low Precision
14.7. Model Monitoring
14.8. Suggested Reading
References