Natural Language Processing With Java · Techniques for Building Machine Learning and Neural Network Models for NLP, 2nd Edition by Reese, Richard M.

Index

Title Page Copyright and Credits

Natural Language Processing with Java Second Edition

Dedication Packt Upsell

Why subscribe? PacktPub.com

Contributors

About the authors About the reviewers Packt is searching for authors like you

Preface

Who this book is for What this book covers To get the most out of this book

Download the example code files Download the color images Conventions used

Get in touch

Reviews

Introduction to NLP

What is NLP? Why use NLP? Why is NLP so hard? Survey of NLP tools

Apache OpenNLP Stanford NLP LingPipe GATE UIMA Apache Lucene Core

Deep learning for Java Overview of text-processing tasks

Finding parts of text Finding sentences Feature-engineering Finding people and things Detecting parts of speech Classifying text and documents Extracting relationships Using combined approaches

Understanding NLP models

Identifying the task Selecting a model Building and training the model Verifying the model Using the model

Preparing data Summary

Finding Parts of Text

Understanding the parts of text What is tokenization?

Uses of tokenizers

Simple Java tokenizers

Using the Scanner class

Specifying the delimiter

Using the split method Using the BreakIterator class Using the StreamTokenizer class Using the StringTokenizer class Performance considerations with Java core tokenization

NLP tokenizer APIs

Using the OpenNLPTokenizer class

Using the SimpleTokenizer class Using the WhitespaceTokenizer class Using the TokenizerME class

Using the Stanford tokenizer

Using the PTBTokenizer class Using the DocumentPreprocessor class Using a pipeline Using LingPipe tokenizers

Training a tokenizer to find parts of text Comparing tokenizers

Understanding normalization

Converting to lowercase Removing stopwords

Creating a StopWords class Using LingPipe to remove stopwords

Using stemming

Using the Porter Stemmer Stemming with LingPipe

Using lemmatization

Using the StanfordLemmatizer class Using lemmatization in OpenNLP

Normalizing using a pipeline

Summary

Finding Sentences

The SBD process What makes SBD difficult? Understanding the SBD rules of LingPipe's HeuristicSentenceModel class Simple Java SBDs

Using regular expressions Using the BreakIterator class

Using NLP APIs

Using OpenNLP

Using the SentenceDetectorME class Using the sentPosDetect method

Using the Stanford API

Using the PTBTokenizer class Using the DocumentPreprocessor class Using the StanfordCoreNLP class

Using LingPipe

Using the IndoEuropeanSentenceModel class Using the SentenceChunker class Using the MedlineSentenceModel class

Training a sentence-detector model

Using the Trained model Evaluating the model using the SentenceDetectorEvaluator class

Summary

Finding People and Things

Why is NER difficult? Techniques for name recognition

Lists and regular expressions Statistical classifiers

Using regular expressions for NER

Using Java's regular expressions to find entities Using the RegExChunker class of LingPipe

Using NLP APIs

Using OpenNLP for NER

Determining the accuracy of the entity Using other entity types Processing multiple entity types

Using the Stanford API for NER Using LingPipe for NER

Using LingPipe's named entity models Using the ExactDictionaryChunker class

Building a new dataset with the NER annotation tool Training a model

Evaluating a model

Summary

Detecting Part of Speech

The tagging process

The importance of POS taggers What makes POS difficult?

Using the NLP APIs

Using OpenNLP POS taggers

Using the OpenNLP POSTaggerME class for POS taggers Using OpenNLP chunking Using the POSDictionary class

Obtaining the tag dictionary for a tagger Determining a word's tags Changing a word's tags Adding a new tag dictionary Creating a dictionary from a file

Using Stanford POS taggers

Using Stanford MaxentTagger Using the MaxentTagger class to tag textese Using the Stanford pipeline to perform tagging

Using LingPipe POS taggers

Using the HmmDecoder class with Best_First tags Using the HmmDecoder class with NBest tags Determining tag confidence with the HmmDecoder class

Training the OpenNLP POSModel

Summary

Representing Text with Features

N-grams Word embedding GloVe Word2vec Dimensionality reduction Principal component analysis Distributed stochastic neighbor embedding Summary

Information Retrieval

Boolean retrieval Dictionaries and tolerant retrieval

Wildcard queries Spelling correction Soundex

Vector space model Scoring and term weighting Inverse document frequency TF-IDF weighting Evaluation of information retrieval systems Summary

Classifying Texts and Documents

How classification is used Understanding sentiment analysis Text-classifying techniques Using APIs to classify text

Using OpenNLP

Training an OpenNLP classification model Using DocumentCategorizerME to classify text

Using the Stanford API

Using the ColumnDataClassifier class for classification Using the Stanford pipeline to perform sentiment analysis

Using LingPipe to classify text

Training text using the Classified class Using other training categories Classifying text using LingPipe Sentiment analysis using LingPipe Language identification using LingPipe

Summary

Topic Modeling

What is topic modeling? The basics of LDA Topic modeling with MALLET

Training Evaluation

Summary

Using Parsers to Extract Relationships

Relationship types Understanding parse trees Using extracted relationships Extracting relationships Using NLP APIs

Using OpenNLP Using the Stanford API

Using the LexicalizedParser class Using the TreePrint class Finding word dependencies using the GrammaticalStructure class

Finding coreference resolution entities

Extracting relationships for a question-answer system

Finding the word dependencies Determining the question type Searching for the answer

Summary

Combined Pipeline

Preparing data Using boilerpipe to extract text from HTML Using POI to extract text from Word documents Using PDFBox to extract text from PDF documents Using Apache Tika for content analysis and extraction Pipelines Using the Stanford pipeline Using multiple cores with the Stanford pipeline Creating a pipeline to search text Summary

Creating a Chatbot

Chatbot architecture Artificial Linguistic Internet Computer Entity

Understanding AIML Developing a chatbot using ALICE and AIML

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
