NLP

Natural language processing (NLP) is the field that uses computer science and AI to process and analyze natural language data so that machines can interpret it as humans do. During the 1980s, when the concept started to attract attention, language processing systems were built by hand-coding sets of rules. Later, as computational power increased, a different approach, based mostly on statistical models, replaced the original one. A subsequent machine learning (ML) approach (supervised learning at first, and now also semi-supervised or unsupervised learning) brought advances in this field, such as voice recognition software and human language translation, and will probably lead to more complex scenarios, such as natural language understanding and generation.

Here is how NLP works. The first task, called the speech-to-text process, is to understand the natural language received. A built-in model performs speech recognition, which converts natural language into a form the machine can process. This happens by breaking speech down into very small units and then comparing them with units from speech that has been input previously. The output determines the words and sentences that were most probably said. The next task, called part-of-speech (POS) tagging (or word-category disambiguation in some literature), labels each word with its grammatical category (noun, adjective, verb, and so on) using a set of lexicon rules. At the end of these two phases, a machine should understand the meaning of the input speech. A possible third task of an NLP process is text-to-speech conversion: at the end, the machine's internal representation is converted into a textual or audible format understandable by humans. That is the ultimate goal of NLP: to build software that analyzes, understands, and generates human languages in a natural way, so that computers communicate as if they were humans.
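To make the POS tagging step more concrete, the following is a minimal sketch in Python of a tagger driven by lexicon rules. The lexicon, the tag names (DET, NOUN, VERB, ADJ, ADV), the suffix rules, and the tag function are all hypothetical and chosen only for illustration; production taggers rely on much larger lexicons and, usually, on statistical or ML models rather than a handful of rules.

```python
# Toy lexicon mapping words to grammatical categories.
# Hypothetical and deliberately tiny: for illustration only.
LEXICON = {
    "the": "DET",
    "a": "DET",
    "dog": "NOUN",
    "cat": "NOUN",
    "chases": "VERB",
    "sees": "VERB",
    "quick": "ADJ",
    "lazy": "ADJ",
}


def tag(sentence):
    """Return (word, tag) pairs using lexicon lookup plus simple suffix rules."""
    tagged = []
    for token in sentence.lower().split():
        # Drop punctuation attached to the token.
        word = token.strip(".,!?;:")
        if not word:
            continue
        if word in LEXICON:
            label = LEXICON[word]      # rule 1: the word is in the lexicon
        elif word.endswith("ly"):
            label = "ADV"              # rule 2: "-ly" suggests an adverb
        elif word.endswith("ing") or word.endswith("ed"):
            label = "VERB"             # rule 3: "-ing"/"-ed" suggests a verb
        else:
            label = "NOUN"             # fallback guess for unknown words
        tagged.append((word, label))
    return tagged


print(tag("The quick dog chases a lazy cat."))
```

Words missing from the lexicon fall through to the suffix rules and, finally, to a default guess, which is where a purely rule-based tagger tends to make mistakes and where statistical models usually do better.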

Given a piece of text, there are three things that need to be considered and understood when implementing NLP:

The following subsections will explain the main concepts of NLP supervised learning.