Imperial Library

Index
Preface
Computational Challenges of Natural Language
Linguistic Data: Tokens and Words
Enter Machine Learning
Tools for Text Analysis
What to Expect from This Book
Who This Book Is For
Code Examples and GitHub Repository
Conventions Used in This Book
Using Code Examples
O’Reilly Safari
How to Contact Us
Acknowledgments
1. Language and Computation
The Data Science Paradigm
Language-Aware Data Products
The Data Product Pipeline
The model selection triple
Language as Data
A Computational Model of Language
Language Features
Contextual Features
Structural Features
Conclusion
2. Building a Custom Corpus
What Is a Corpus?
Domain-Specific Corpora
The Baleen Ingestion Engine
Corpus Data Management
Corpus Disk Structure
The Baleen disk structure
Corpus Readers
Streaming Data Access with NLTK
Reading an HTML Corpus
Corpus monitoring
Reading a Corpus from a Database
Conclusion
3. Corpus Preprocessing and Wrangling
Breaking Down Documents
Identifying and Extracting Core Content
Deconstructing Documents into Paragraphs
Segmentation: Breaking Out Sentences
Tokenization: Identifying Individual Tokens
Part-of-Speech Tagging
Intermediate Corpus Analytics
Corpus Transformation
Intermediate Preprocessing and Storage
Writing to pickle
Reading the Processed Corpus
Conclusion
4. Text Vectorization and Transformation Pipelines
Words in Space
Frequency Vectors
With NLTK
In Scikit-Learn
The Gensim way
One-Hot Encoding
With NLTK
In Scikit-Learn
The Gensim way
Term Frequency–Inverse Document Frequency
With NLTK
In Scikit-Learn
The Gensim way
Distributed Representation
The Gensim way
The Scikit-Learn API
The BaseEstimator Interface
Extending TransformerMixin
Creating a custom Gensim vectorization transformer
Creating a custom text normalization transformer
Pipelines
Pipeline Basics
Grid Search for Hyperparameter Optimization
Enriching Feature Extraction with Feature Unions
Conclusion
5. Classification for Text Analysis
Text Classification
Identifying Classification Problems
Classifier Models
Building a Text Classification Application
Cross-Validation
Streaming access to k splits
Model Construction
Model Evaluation
Model Operationalization
Conclusion
6. Clustering for Text Similarity
Unsupervised Learning on Text
Clustering by Document Similarity
Distance Metrics
Partitive Clustering
k-means clustering
Optimizing k-means
Handling uneven geometries
Hierarchical Clustering
Agglomerative clustering
Modeling Document Topics
Latent Dirichlet Allocation
In Scikit-Learn
The Gensim way
Visualizing topics
Latent Semantic Analysis
In Scikit-Learn
The Gensim way
Non-Negative Matrix Factorization
In Scikit-Learn
Conclusion
7. Context-Aware Text Analysis
Grammar-Based Feature Extraction
Context-Free Grammars
Syntactic Parsers
Extracting Keyphrases
Extracting Entities
n-Gram Feature Extraction
An n-Gram-Aware CorpusReader
Choosing the Right n-Gram Window
Significant Collocations
n-Gram Language Models
Frequency and Conditional Frequency
Estimating Maximum Likelihood
Unknown Words: Back-off and Smoothing
Language Generation
Conclusion
8. Text Visualization
Visualizing Feature Space
Visual Feature Analysis
n-gram viewer
Network visualization
Co-occurrence plots
Text x-rays and dispersion plots
Guided Feature Engineering
Part-of-speech tagging
Most informative features
Model Diagnostics
Visualizing Clusters
Visualizing Classes
Diagnosing Classification Error
Classification report heatmaps
Confusion matrices
Visual Steering
Silhouette Scores and Elbow Curves
Silhouette scores
Elbow curves
Conclusion
9. Graph Analysis of Text
Graph Computation and Analysis
Creating a Graph-Based Thesaurus
Analyzing Graph Structure
Visual Analysis of Graphs
Extracting Graphs from Text
Creating a Social Graph
Finding entity pairs
Property graphs
Implementing the graph extraction
Insights from the Social Graph
Centrality
Structural analysis
Entity Resolution
Entity Resolution on a Graph
Blocking with Structure
Fuzzy Blocking
Conclusion
10. Chatbots
Fundamentals of Conversation
Dialog: A Brief Exchange
Maintaining a Conversation
Rules for Polite Conversation
Greetings and Salutations
Handling Miscommunication
Entertaining Questions
Dependency Parsing
Constituency Parsing
Question Detection
From Tablespoons to Grams
Learning to Help
Being Neighborly
Offering Recommendations
Conclusion
11. Scaling Text Analytics with Multiprocessing and Spark
Python Multiprocessing
Running Tasks in Parallel
Process Pools and Queues
Parallel Corpus Preprocessing
Cluster Computing with Spark
Anatomy of a Spark Job
Distributing the Corpus
RDD Operations
NLP with Spark
From Scikit-Learn to MLLib
Feature extraction
Text clustering with MLLib
Text classification with MLLib
Local fit, global evaluation
Conclusion
12. Deep Learning and Beyond
Applied Neural Networks
Neural Language Models
Artificial Neural Networks
Training a multilayer perceptron
Deep Learning Architectures
TensorFlow: A framework for deep learning
Keras: An API for deep learning
Sentiment Analysis
Deep Structure Analysis
Predicting sentiment with a bag-of-keyphrases
The Future Is (Almost) Here
Glossary
Index