Imperial Library

Index
1. Text Ingestion and Wrangling
  Acquiring a Domain-Specific Corpus
  Data Ingestion of Text
    Scraping and Crawling
    Ingestion using RSS Feeds and Feedparser
    APIs: Twitter and Search
  Corpus Data Management
    Corpus Disk Structure
    Corpus Readers
      Streaming Data Access with NLTK
      Reading an HTML Corpus
  Preprocessing and Wrangling
    Readability for Accessing Core Content
    Documents, Discourse, and Paragraphs
    Segmentation: Breaking out Sentences
    Tokenization: Identifying Individual Tokens
    Part-of-Speech Tagging
    Transformation
      Writing to Pickle
      Pickle Corpus Reader
  Corpus Monitoring
    Corpus Meta Information
  Conclusion
2. Machine Learning on Text
  The Model Selection Triple
    Model Selection as Search
    The Scikit-Learn API
  Vectorization
    Frequency Vectors
    One-Hot Encoding
    Term Frequency-Inverse Document Frequency
    Distributed Representation
    Benefits and Limitations of Vector Encoding
  Feature Extraction
    Pipeline Basics
    Corpus Loader
    Text Normalization
    Vectorization with Gensim
    Document Level Features
    Building Models
  Conclusion

