Index
Preface
Why Natural Language Processing Is Important and Difficult
Background
Philosophy
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
I. Basics
1. Getting Started
Introduction
Other Tools
Setting Up Your Environment
Prerequisites
Starting Apache Spark
Checking Out the Code
Getting Familiar with Apache Spark
Starting Apache Spark with Spark NLP
Loading and Viewing Data in Apache Spark
Hello World with Spark NLP
2. Natural Language Basics
What Is Natural Language?
Origins of Language
Spoken Language Versus Written Language
Linguistics
Phonetics and Phonology
Morphology
Syntax
Semantics
Sociolinguistics: Dialects, Registers, and Other Varieties
Formality
Context
Pragmatics
Roman Jakobson
How To Use Pragmatics
Writing Systems
Origins
Alphabets
Abjads
Abugidas
Syllabaries
Logographs
Encodings
ASCII
Unicode
UTF-8
Exercises: Tokenizing
Tokenize English
Tokenize Greek
Tokenize Ge’ez (Amharic)
Resources
3. NLP on Apache Spark
Parallelism, Concurrency, Distributing Computation
Parallelization Before Apache Hadoop
MapReduce and Apache Hadoop
Apache Spark
Architecture of Apache Spark
Physical Architecture
Logical Architecture
Spark SQL and Spark MLlib
Transformers
Estimators and Models
Evaluators
NLP Libraries
Functionality Libraries
Annotation Libraries
NLP in Other Libraries
Spark NLP
Annotation Library
Stages
Pretrained Pipelines
Finisher
Exercises: Build a Topic Model
Resources
4. Deep Learning Basics
Gradient Descent
Backpropagation
Convolutional Neural Networks
Filters
Pooling
Recurrent Neural Networks
Backpropagation Through Time
Elman Nets
LSTMs
Exercise 1
Exercise 2
Resources
II. Building Blocks
5. Processing Words
Tokenization
Vocabulary Reduction
Stemming
Lemmatization
Stemming Versus Lemmatization
Spelling Correction
Normalization
Bag-of-Words
CountVectorizer
N-Gram
Visualizing: Word and Document Distributions
Exercises
Resources
6. Information Retrieval
Inverted Indices
Building an Inverted Index
Vector Space Model
Stop-Word Removal
Inverse Document Frequency
In Spark
Exercises
Resources
7. Classification and Regression
Bag-of-Words Features
Regular Expression Features
Feature Selection
Modeling
Naïve Bayes
Linear Models
Decision/Regression Trees
Deep Learning Algorithms
Iteration
Exercises
8. Sequence Modeling with Keras
Sentence Segmentation
(Hidden) Markov Models
Section Segmentation
Part-of-Speech Tagging
Conditional Random Field
Chunking and Syntactic Parsing
Language Models
Recurrent Neural Networks
Exercise: Character N-Grams
Exercise: Word Language Model
Resources
9. Information Extraction
Named-Entity Recognition
Coreference Resolution
Assertion Status Detection
Relationship Extraction
Summary
Exercises
10. Topic Modeling
K-Means
Latent Semantic Indexing
Nonnegative Matrix Factorization
Latent Dirichlet Allocation
Exercises
11. Word Embeddings
Word2vec
GloVe
fastText
Transformers
ELMo, BERT, and XLNet
doc2vec
Exercises
III. Applications
12. Sentiment Analysis and Emotion Detection
Problem Statement and Constraints
Plan the Project
Design the Solution
Implement the Solution
Test and Measure the Solution
Business Metrics
Model-Centric Metrics
Infrastructure Metrics
Process Metrics
Offline Versus Online Model Measurement
Review
Initial Deployment
Fallback Plans
Next Steps
Conclusion
13. Building Knowledge Bases
Problem Statement and Constraints
Plan the Project
Design the Solution
Implement the Solution
Test and Measure the Solution
Business Metrics
Model-Centric Metrics
Infrastructure Metrics
Process Metrics
Review
Conclusion
14. Search Engine
Problem Statement and Constraints
Plan the Project
Design the Solution
Implement the Solution
Test and Measure the Solution
Business Metrics
Model-Centric Metrics
Review
Conclusion
15. Chatbot
Problem Statement and Constraints
Plan the Project
Design the Solution
Implement the Solution
Test and Measure the Solution
Business Metrics
Model-Centric Metrics
Review
Conclusion
16. Object Character Recognition
Kinds of OCR Tasks
Images of Printed Text and PDFs to Text
Images of Handwritten Text to Text
Images of Text in Environment to Text
Images of Text to Target
Note on Different Writing Systems
Problem Statement and Constraints
Plan the Project
Implement the Solution
Test and Measure the Solution
Model-Centric Metrics
Review
Conclusion
IV. Building NLP Systems
17. Supporting Multiple Languages
Language Typology
Scenario: Academic Paper Classification
Text Processing in Different Languages
Compound Words
Morphological Complexity
Transfer Learning and Multilingual Deep Learning
Search Across Languages
Checklist
Conclusion
18. Human Labeling
Guidelines
Scenario: Academic Paper Classification
Inter-Labeler Agreement
Iterative Labeling
Labeling Text
Classification
Tagging
Checklist
Conclusion
19. Productionizing NLP Applications
Spark NLP Model Cache
Spark NLP and TensorFlow Integration
Spark Optimization Basics
Design-Level Optimization
Profiling Tools
Monitoring
Managing Data Resources
Testing NLP-Based Applications
Unit Tests
Integration Tests
Smoke and Sanity Tests
Performance Tests
Usability Tests
Demoing NLP-Based Applications
Checklists
Model Deployment Checklist
Scaling and Performance Checklist
Testing Checklist
Conclusion
Glossary
Index