The computer has become the medium of choice through which much of our language use is channeled. Modern computer systems therefore spend a good part of their time working on human language. This is a positive development: not only does it give everyone on the internet access to a world of information well beyond the scope of even the best research libraries of the 1960s and 1970s, it also creates new capabilities for creation, exploitation, and management of information. These include tools that support nonfiction, creative writing, blogs and diaries, citizen journalism and social interactions, web search and online booking systems, smart library catalogs, knowledge discovery, spoken language dialogs, and foreign language learning.
This book takes you on a tour of different real-world tasks and applications where computers deal with language. During this tour, you will encounter essential concepts relating to language, representation, and processing, so that by the end of the book you will have a good grasp of key concepts in the field of computational linguistics. The only background you need to read this book is some curiosity about language and some everyday experience with computers.
This is indeed why the book is organized around real-world tasks and applications. We assume that most of you will be familiar with many of the applications and may wonder how they work or why they don’t work. What you may not realize is how similar the underlying processing is. For example, there is a great deal in common between how grammar checkers and automatic speech-recognition systems work. We hope that showing how the same concepts recur across applications (in this case, something called n-grams) will reinforce the importance of applying general techniques to new applications.
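To give a first impression of what an n-gram is, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method of any particular grammar checker or speech recognizer: the function names and the two-sentence toy corpus are invented for this example. An n-gram is a sequence of n adjacent words; the sketch counts the two-word sequences (bigrams) in the corpus and then lists the bigrams of a new sentence that never occurred there.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n adjacent tokens, each as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_counts(sentences):
    """Count every bigram in a list of whitespace-separated sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), 2))
    return counts


def unseen_bigrams(sentence, counts):
    """List the bigrams of a sentence that have a count of zero."""
    return [bg for bg in ngrams(sentence.lower().split(), 2) if counts[bg] == 0]


# Hypothetical toy corpus; a real system would use far more text.
toy_corpus = ["the cat sat on the mat", "the dog sat on the rug"]
counts = bigram_counts(toy_corpus)

# Word pairs never seen in the corpus hint that something may be wrong.
print(unseen_bigrams("the cat on sat the mat", counts))
```

In this toy setting an unseen word pair is treated as suspicious. With so little text, many perfectly good sentences would be flagged too, which is why a real application would need much more data and a more careful treatment of rare sequences.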
The book is designed to make you aware of how technology works and how language works. We focus on a few applications of language technology (LT), computational linguistics (CL), and natural language processing (NLP). LT, CL, and NLP are essentially names for the same thing, seen from the perspectives of industry, linguistics, and computer science, respectively. The tasks and applications were chosen because: (i) they are representative of techniques used throughout the field; (ii) they represent a significant body of work in and of themselves; (iii) they connect directly to linguistic modeling; and (iv) they are the ones the authors know best. We hope that you will be able to use these examples as an introduction to general concepts that you can apply to learning about other applications and areas of inquiry.
There are a number of features in this textbook that allow you to structure what you learn, explore more about the topics, and reinforce what you are learning. As a start, the relevant concepts being covered are typeset in bold and shown in the margins of each page. You can also look those up in the Concept Index at the end of the book.
The Under the Hood sections included in many of the chapters are intended to give you more detail on selected advanced topics. For those interested in learning more about language and computers, we hope that you find these sections enjoyable and enlightening, though the gist of each chapter can be understood without reading them.
At the end of each chapter there is a Checklist indicating what you should have learned. The Exercises, also found at the end of each chapter, review the material and give you opportunities to go beyond it. Our hope is that the checklist and exercises help you to get a good grasp of each of the topics and concepts involved. We recognize, however, that students from different backgrounds have different skills, so we have marked each question with an indication of who the question is for. There are four designations: most questions are appropriate for all students and thus are marked with ALL; LING questions assume some background and interest in linguistics; CS questions are appropriate for those with a background in computer science; and MATH questions are appropriate for those wanting to tackle more mathematical challenges. Of course, you should not feel limited by these markers, as a strong enough desire will generally allow you to tackle most questions.
If you enjoy the topic of a particular chapter, we encourage you to make use of the Further reading recommendations. You can also use the page numbers listed under each entry in the References at the end of the book to find the places where that work is discussed.
Finally, on the book’s companion website (http://purl.org/lang-and-comp), we have collected resources and links to other materials that could be of interest to you when exploring topics around language and computers.