Index
1. Text Ingestion and Wrangling
Acquiring a Domain-Specific Corpus
Data Ingestion of Text
Scraping and Crawling
Ingestion using RSS Feeds and Feedparser
APIs: Twitter and Search
Corpus Data Management
Corpus Disk Structure
Corpus Readers
Streaming Data Access with NLTK
Reading an HTML Corpus
Preprocessing and Wrangling
Readability for Accessing Core Content
Documents, Discourse, and Paragraphs
Segmentation: Breaking out Sentences
Tokenization: Identifying Individual Tokens
Part-of-Speech Tagging
Transformation
Writing to Pickle
Pickle Corpus Reader
Corpus Monitoring
Corpus Meta Information
Conclusion
2. Machine Learning on Text
The Model Selection Triple
Model Selection as Search
The Scikit-Learn API
Vectorization
Frequency Vectors
One-Hot Encoding
Term Frequency-Inverse Document Frequency
Distributed Representation
Benefits and Limitations of Vector Encoding
Feature Extraction
Pipeline Basics
Corpus Loader
Text Normalization
Vectorization with Gensim
Document Level Features
Building Models
Conclusion