Index
Half title
Copyright
Title
Abstract
Contents
Preface
Acknowledgments
1 Introduction
1.1 The Challenges of Natural Language Processing
1.2 Neural Networks and Deep Learning
1.3 Deep Learning in NLP
1.3.1 Success Stories
1.4 Coverage and Organization
1.5 What’s not Covered
1.6 A Note on Terminology
1.7 Mathematical Notation
Part I Supervised Classification and Feed-forward Neural Networks
2 Learning Basics and Linear Models
2.1 Supervised Learning and Parameterized Functions
2.2 Train, Test, and Validation Sets
2.3 Linear Models
2.3.1 Binary Classification
2.3.2 Log-linear Binary Classification
2.3.3 Multi-class Classification
2.4 Representations
2.5 One-Hot and Dense Vector Representations
2.6 Log-linear Multi-class Classification
2.7 Training as Optimization
2.7.1 Loss Functions
2.7.2 Regularization
2.8 Gradient-based Optimization
2.8.1 Stochastic Gradient Descent
2.8.2 Worked-out Example
2.8.3 Beyond SGD
3 From Linear Models to Multi-layer Perceptrons
3.1 Limitations of Linear Models: The XOR Problem
3.2 Nonlinear Input Transformations
3.3 Kernel Methods
3.4 Trainable Mapping Functions
4 Feed-forward Neural Networks
4.1 A Brain-inspired Metaphor
4.2 In Mathematical Notation
4.3 Representation Power
4.4 Common Nonlinearities
4.5 Loss Functions
4.6 Regularization and Dropout
4.7 Similarity and Distance Layers
4.8 Embedding Layers
5 Neural Network Training
5.1 The Computation Graph Abstraction
5.1.1 Forward Computation
5.1.2 Backward Computation (Derivatives, Backprop)
5.1.3 Software
5.1.4 Implementation Recipe
5.1.5 Network Composition
5.2 Practicalities
5.2.1 Choice of Optimization Algorithm
5.2.2 Initialization
5.2.3 Restarts and Ensembles
5.2.4 Vanishing and Exploding Gradients
5.2.5 Saturation and Dead Neurons
5.2.6 Shuffling
5.2.7 Learning Rate
5.2.8 Minibatches
Part II Working with Natural Language Data
6 Features for Textual Data
6.1 Typology of NLP Classification Problems
6.2 Features for NLP Problems
6.2.1 Directly Observable Properties
6.2.2 Inferred Linguistic Properties
6.2.3 Core Features vs. Combination Features
6.2.4 Ngram Features
6.2.5 Distributional Features
7 Case Studies of NLP Features
7.1 Document Classification: Language Identification
7.2 Document Classification: Topic Classification
7.3 Document Classification: Authorship Attribution
7.4 Word-in-context: Part of Speech Tagging
7.5 Word-in-context: Named Entity Recognition
7.6 Word in Context, Linguistic Features: Preposition Sense Disambiguation
7.7 Relation Between Words in Context: Arc-Factored Parsing
8 From Textual Features to Inputs
8.1 Encoding Categorical Features
8.1.1 One-hot Encodings
8.1.2 Dense Encodings (Feature Embeddings)
8.1.3 Dense Vectors vs. One-hot Representations
8.2 Combining Dense Vectors
8.2.1 Window-based Features
8.2.2 Variable Number of Features: Continuous Bag of Words
8.3 Relation Between One-hot and Dense Vectors
8.4 Odds and Ends
8.4.1 Distance and Position Features
8.4.2 Padding, Unknown Words, and Word Dropout
8.4.3 Feature Combinations
8.4.4 Vector Sharing
8.4.5 Dimensionality
8.4.6 Embeddings Vocabulary
8.4.7 Network’s Output
8.5 Example: Part-of-Speech Tagging
8.6 Example: Arc-factored Parsing
9 Language Modeling
9.1 The Language Modeling Task
9.2 Evaluating Language Models: Perplexity
9.3 Traditional Approaches to Language Modeling
9.3.1 Further Reading
9.3.2 Limitations of Traditional Language Models
9.4 Neural Language Models
9.5 Using Language Models for Generation
9.6 Byproduct: Word Representations
10 Pre-trained Word Representations
10.1 Random Initialization
10.2 Supervised Task-specific Pre-training
10.3 Unsupervised Pre-training
10.3.1 Using Pre-trained Embeddings
10.4 Word Embedding Algorithms
10.4.1 Distributional Hypothesis and Word Representations
10.4.2 From Neural Language Models to Distributed Representations
10.4.3 Connecting the Worlds
10.4.4 Other Algorithms
10.5 The Choice of Contexts
10.5.1 Window Approach
10.5.2 Sentences, Paragraphs, or Documents
10.5.3 Syntactic Window
10.5.4 Multilingual
10.5.5 Character-based and Sub-word Representations
10.6 Dealing with Multi-word Units and Word Inflections
10.7 Limitations of Distributional Methods
11 Using Word Embeddings
11.1 Obtaining Word Vectors
11.2 Word Similarity
11.3 Word Clustering
11.4 Finding Similar Words
11.4.1 Similarity to a Group of Words
11.5 Odd-one Out
11.6 Short Document Similarity
11.7 Word Analogies
11.8 Retrofitting and Projections
11.9 Practicalities and Pitfalls
12 Case Study: A Feed-forward Architecture for Sentence Meaning Inference
12.1 Natural Language Inference and the SNLI Dataset
12.2 A Textual Similarity Network
Part III Specialized Architectures
13 Ngram Detectors: Convolutional Neural Networks
13.1 Basic Convolution + Pooling
13.1.1 1D Convolutions Over Text
13.1.2 Vector Pooling
13.1.3 Variations
13.2 Alternative: Feature Hashing
13.3 Hierarchical Convolutions
14 Recurrent Neural Networks: Modeling Sequences and Stacks
14.1 The RNN Abstraction
14.2 RNN Training
14.3 Common RNN Usage-patterns
14.3.1 Acceptor
14.3.2 Encoder
14.3.3 Transducer
14.4 Bidirectional RNNs (biRNN)
14.5 Multi-layer (stacked) RNNs
14.6 RNNs for Representing Stacks
14.7 A Note on Reading the Literature
15 Concrete Recurrent Neural Network Architectures
15.1 CBOW as an RNN
15.2 Simple RNN
15.3 Gated Architectures
15.3.1 LSTM
15.3.2 GRU
15.4 Other Variants
15.5 Dropout in RNNs
16 Modeling with Recurrent Networks
16.1 Acceptors
16.1.1 Sentiment Classification
16.1.2 Subject-verb Agreement Grammaticality Detection
16.2 RNNs as Feature Extractors
16.2.1 Part-of-speech Tagging
16.2.2 RNN–CNN Document Classification
16.2.3 Arc-factored Dependency Parsing
17 Conditioned Generation
17.1 RNN Generators
17.1.1 Training Generators
17.2 Conditioned Generation (Encoder-Decoder)
17.2.1 Sequence to Sequence Models
17.2.2 Applications
17.2.3 Other Conditioning Contexts
17.3 Unsupervised Sentence Similarity
17.4 Conditioned Generation with Attention
17.4.1 Computational Complexity
17.4.2 Interpretability
17.5 Attention-based Models in NLP
17.5.1 Machine Translation
17.5.2 Morphological Inflection
17.5.3 Syntactic Parsing
Part IV Additional Topics
18 Modeling Trees with Recursive Neural Networks
18.1 Formal Definition
18.2 Extensions and Variations
18.3 Training Recursive Neural Networks
18.4 A Simple Alternative–Linearized Trees
18.5 Outlook
19 Structured Output Prediction
19.1 Search-based Structured Prediction
19.1.1 Structured Prediction with Linear Models
19.1.2 Nonlinear Structured Prediction
19.1.3 Probabilistic Objective (CRF)
19.1.4 Approximate Search
19.1.5 Reranking
19.1.6 See Also
19.2 Greedy Structured Prediction
19.3 Conditional Generation as Structured Output Prediction
19.4 Examples
19.4.1 Search-based Structured Prediction: First-order Dependency Parsing
19.4.2 Neural-CRF for Named Entity Recognition
19.4.3 Approximate NER-CRF With Beam-Search
20 Cascaded, Multi-task and Semi-supervised Learning
20.1 Model Cascading
20.2 Multi-task Learning
20.2.1 Training in a Multi-task Setup
20.2.2 Selective Sharing
20.2.3 Word-embeddings Pre-training as Multi-task Learning
20.2.4 Multi-task Learning in Conditioned Generation
20.2.5 Multi-task Learning as Regularization
20.2.6 Caveats
20.3 Semi-supervised Learning
20.4 Examples
20.4.1 Gaze-prediction and Sentence Compression
20.4.2 Arc Labeling and Syntactic Parsing
20.4.3 Preposition Sense Disambiguation and Preposition Translation Prediction
20.4.4 Conditioned Generation: Multilingual Machine Translation, Parsing, and Image Captioning
20.5 Outlook
21 Conclusion
21.1 What Have We Seen?
21.2 The Challenges Ahead
Bibliography
Author’s Biography