The eight most common languages—Mandarin Chinese, English, Hindi, Spanish, Russian, Arabic, Bengali, and Portuguese—are spoken by half of the world’s population. Below are summaries of 16 of the world’s most commonly spoken languages. English is treated separately in Day 2.
♦Arabic Twenty-six countries around the world have adopted Arabic as an official language—more than any other language but English and French. There are approximately 280 million native speakers, and it is a second language for millions more who are familiar with Arabic through the recitation of the Koran. As with the Chinese language, spoken dialects are not mutually understandable, but Arabic speakers share a written language called Modern Standard Arabic. The main dialects are Levantine (spoken in Syria, Lebanon, Jordan, Palestine), Egyptian, Iraqi, and Moroccan. Arabic is an official language of the United Nations.
♦Bengali is the official language of Bangladesh and one of the official languages of India. More than 180 million people worldwide count Bengali as their mother tongue. An estimated 230 million people speak Bengali, making it the sixth most spoken language in the world.
♦Chinese is the native language of more than 1.2 billion people, principally in China, but also in ethnic Chinese communities worldwide. Major dialects include Mandarin, also known as guoyu (“national language”), which is the official language of China, Taiwan, and Singapore; Guangdong (Cantonese, widely spoken in southern and southeastern China); Min (Fujian Province, Taiwan, and in Southeast Asian overseas Chinese communities); Wu (Shanghai and nearby regions); and Hakka (southeastern China and Southeast Asian overseas Chinese communities). Although spoken dialects of Chinese are not mutually intelligible, they share the same writing system—Chinese characters, or hanzi, which represent words independently of their pronunciation. Chinese is also an official language of the United Nations.
♦French has some 150 million native speakers in France, Switzerland, Belgium, Canada, Africa, and in French overseas territories and possessions from Martinique to New Caledonia. It is widely used as a second language, especially in West Africa. French is an official language of the United Nations, and of 29 countries.
♦German is an official language of six European countries (Germany, Austria, Switzerland, Luxembourg, Liechtenstein, and Belgium) and the European Union. There are sizable German-speaking populations in eastern Europe, Brazil, and the United States. Native speakers total nearly 120 million.
♦Hindi has some 474 million native speakers in India alone, a number that rises significantly (by some 50 million) if combined with the similar Urdu, an official language of Pakistan. Collectively, these and other closely related languages are known as Hindustani, which is also spoken in Malaysia, Singapore, Trinidad, Guyana, South Africa, Mauritius, and other countries with large expatriate Indo-Pakistani communities.
♦Japanese is the official language of Japan, where it is spoken by more than 130 million people, and of the island nation of Palau. Significant Japanese-speaking populations are also found in Brazil and the United States. Modern Japanese employs four writing systems: kanji, hiragana, katakana, and romaji.
♦Korean is spoken primarily in North and South Korea, with significant Korean-speaking populations in Japan, Russia, China, and the United States. Native speakers total about 78 million people.
♦Malay Variants and dialects of Malay are used as an official language in Indonesia, Malaysia, Singapore, and Brunei. Although these four countries have a combined population of more than 270 million, only 40 million people claim Malay as their first language.
♦Persian includes Farsi, the official language of Iran, plus the closely related languages of Tajik (Tajikistan and parts of Afghanistan) and Dari (spoken by nearly half the population of Afghanistan). There are an estimated 60–70 million native speakers of various dialects of Persian.
♦Portuguese is spoken by 210 million people, the vast majority of whom live in Brazil, where it is the official language. Portuguese is an official language in only six other countries outside of Portugal, including Angola and four other countries in Africa; East Timor, an island nation in Southeast Asia; and the special administrative region of Macau, on the southeastern coast of China.
♦Russian is the native language of approximately 164 million people. It is the official language of Russia, Belarus, Kazakhstan, and Kyrgyzstan. There are still significant numbers of people claiming Russian as a first language in the former republics of the Soviet Union, and approximately 114 million speak it as a second language. Russian is an official language of the U.N. and the International Atomic Energy Agency (I.A.E.A.).
♦Spanish is the official language of 23 countries, including Spain, Mexico, Colombia, Argentina, and many other nations of Central and South America and the Caribbean. Native speakers total about 400 million, including 100 million in Mexico alone and about 43 million in the United States. Spanish is also an official language of the United Nations.
♦Tamil is the principal language of the state of Tamil Nadu, in southeastern India, and is also an official language of Sri Lanka and Singapore. There are approximately 66 million native speakers of Tamil, including significant minority communities in the United States and Canada.
♦Turkish Standard (Anatolian) Turkish, plus closely related dialects and languages including Azeri, Kyrgyz, Kazakh, Türkmen, Tatar, Uighur, and Uzbek, have approximately 120 million native speakers. Turkic languages are spoken across a huge span of central Eurasia, including Turkey, Azerbaijan, Kazakhstan, Kyrgyzstan, Turkmenistan, Uzbekistan, and the region of Xinjiang in northwestern China.
♦Vietnamese is spoken primarily in Vietnam, with sizable Vietnamese-speaking populations in the neighboring countries of Cambodia and Laos, as well as in the United States. There are approximately 86 million people speaking Vietnamese, including native speakers and those who use it as a second language.
♦Language Families Languages that are related by descent from a common ancestor are said to belong to the same language family. The study of language families began in the late 18th century, when an official of the British East India Company, Sir William Jones, noticed that Sanskrit, Greek, and Latin have many similarities in vocabulary and grammar, and proposed that all three languages were derived from an extinct ancestral language, now known as proto-Indo-European. By the early 19th century, scholars began to systematically explore the similarities in languages spoken in regions from Ireland to India and from Scandinavia to Greece, and many other language families were discovered.
Language families were formed as populations migrated and became geographically separated, so that their original language divided into new but related languages, which then evolved separately.
The physical distribution of languages (now supplemented by genetic DNA studies) can be used to trace the movement of populations over the past several thousand years.
Most languages can be classified as members of a language family or subfamily. A few (Basque is the most famous example) are linguistic isolates, and there are also a number of hybrid forms (pidgins and creoles) that cross linguistic boundaries.
Eurasia Indo-European embraces about 150 languages spoken by some 3 billion people worldwide. It is the most widely distributed language family—evidence of a persistent drive for territorial expansion. Indo-Europeans migrated from somewhere near the Black Sea to India, Europe, and beyond. They are believed to have originated in an area between eastern Europe and the Aral Sea, around the fifth millennium B.C.
Indo-European is made up of several subfamilies, including Indo-Iranian (Bengali, Farsi, Hindi, Pashto, Sanskrit, Sinhalese, and others); Italic or Romance (Catalan, French, Italian, Latin, Portuguese, Romanian, Spanish); Germanic (Dutch, English, German, Gothic, Icelandic, Swedish, and others); Celtic (Breton, Gaelic, Welsh, and others); Baltic (Latvian, Lithuanian, and others); Slavic (Czech, Polish, Russian, Serbo-Croatian, Slavonic, and others); and Albanian, Armenian, and Greek.
Other language families of Eurasia are:
Caucasian includes Georgian, Circassian, Chechen, and the several languages of the Kartvelian subfamily, spoken south of the chief range of the Caucasus Mountains.
Uralic encompasses Finnish, Estonian, Hungarian, and Saami.
Altaic includes Turkish, Uzbek, Mongolian, Manchu, and Uighur. Korean and Japanese are sometimes described as Altaic languages, but this identification is highly controversial. Despite obvious similarities, no organic link between Korean and Japanese has been proven, leading some experts to conclude that Korean is a linguistic isolate and that Japanese is part of the small Koguryoic family.
Sinitic encompasses all dialects of Chinese: Mandarin, Guangdong (Cantonese), Min, Wu, Hakka, and others.
Elamo-Dravidian includes Tamil, Malayalam, Kannada, and other languages of South India.
Austroasiatic contains 250 languages spoken in mainland Southeast Asia, including Vietnamese, Khmer (Cambodian), Mon, and Tai. Some linguists propose that Tai and closely related languages such as Lao belong instead in the Sino-Tibetan family.
Southeast Asia and the Pacific Islands The Austronesian family (also called Malayo-Polynesian) includes several hundred languages spoken in Southeast Asia and the islands of the Indian and Pacific Oceans, ranging from Malagasy (spoken in Madagascar, off the coast of Africa) to Hawai’ian and Maori in the Pacific. Austronesians moved from southeastern China to Indonesia and then eastward to the Pacific Islands and westward to Madagascar. The family is divided into Western Austronesian and Eastern Austronesian. Western Austronesian (including Malay, Indonesian, Javanese, Tagalog, and many others) languages are spoken by more than 300 million people. The Eastern Austronesian subfamily is further divided into Micronesian and Polynesian languages.
The Papuan family of New Guinea and the nearby island groups of the Moluccas and Melanesia includes several hundred languages divided into several subfamilies. The Australian languages of the aboriginal peoples of Australia, who were isolated from external contact for perhaps 40,000 years, are similarly divided.
Africa The languages of Africa, and the adjacent Middle East, are divided into four families:
Afro-Asiatic includes Berber, Coptic, Hausa, and the languages of the Semitic subfamily: Hebrew, Arabic, Aramaic, and Amharic.
The Nilo-Saharan family, found in northeastern, eastern, and central Africa, includes Turkana, Masai, Dinka, Mangbetu, Efe, and the numerous languages of the Eastern Sudanic (including Nubian) and Central Sudanic subfamilies.
The Niger-Kordofanic family is found in west, central, and southern Africa. It has two main branches, Kordofanic and Niger-Congo. The Benue-Congo subgroup includes hundreds of languages of the Bantu subfamily (Swahili, Zulu, Xhosa, Sotho, Setswana), reflecting a historic expansion of the Bantu peoples southward and eastward from an original homeland in the Congo basin.
Africa is also home to the Khoisan family, which includes ancient languages (San and others) spoken by peoples largely displaced by the Bantu expansion. These languages are known for their distinctive “click” sounds.
The Americas The language families of the Americas include Eskimo-Aleut; Na-Dene (Athabascan and Navajo); and, according to the controversial theory of the late Joseph Greenberg, Amerind, which includes all other Native American languages. Other authorities divide these languages into discrete families:
Macro-Algonquian (eastern and northern woodlands and Pacific Northwest) includes Algonquin, Delaware, Cheyenne, Cree, Salish, Nootka, and Kwakiutl.
Penutian (central and coastal California into Mexico and Central America) includes the widespread Mayan languages.
Hokan-Siouan is widely distributed geographically and includes Choctaw, Seminole, the Iroquois languages, Cherokee, Lakota, and many others.
Aztec-Tanoan (southwestern North America and Mexico) includes Paiute, Shoshone, Comanche, Hopi, Nahuatl, and many others.
The most prominent languages of Central America, the Caribbean, and South America are Mixtecan and Toltecan (Mexico and Central America); Cariban and Arawakan (Caribbean); and Chibchan, Ge, Quechua, Aymara, Araucanian, and Tupi-Guarani (South America).
♦Pidgins and Creoles Pidgins are simplified languages that are used for trading purposes between peoples with no common language (the word pidgin itself derives from the English word business). Pidgins can evolve into creoles, independent languages that combine features of two or more parent languages. Jamaican (with English and West African roots), Haitian (French and West African), and Hawai’ian Creole (combining Hawai’ian and English with Japanese, Tagalog, and other Asian languages) are the best-known examples.
Although English ranks only third among the world’s most spoken languages, it is on its way to becoming the first truly global language—used throughout the world as the language of commerce, diplomacy, and science. English is the mother tongue for some 450 million people, and a further 1.5 billion people use it as a second language to some degree. Beginning in the 17th century, the language spread throughout the British Empire to the Americas, Africa, India, and Oceania. Today 70 countries designate English as an official language (although it does not have that status in the United States), and it is an official language of the United Nations, the European Union, NAFTA, NATO, and the Organization of American States.
♦A Brief History of the English Language English belongs to the Germanic branch of the Indo-European family of languages, and is related to most languages spoken in Europe, Scandinavia, India, and western Asia. The story of English begins with the arrival of the Jutes, Angles, and Saxons, Germanic peoples who invaded Britain in the fifth century and divided the island nation among themselves. In Common Germanic, the Angles were called Angli, which later mutated to Engle, and Engla land soon became the name of the nation these tribes now occupied. Old English (Anglo-Saxon), which is dated from the first written documents of this period to 1066, was an amalgamation of Germanic dialects, Latin, which persisted from the time of the Roman occupation, and Norse, which was brought by the Viking invaders who arrived in force in 865. Literacy came to England when the Anglo-Saxons converted to Christianity in the seventh century.
Middle English emerged following the Norman Conquest in 1066, which brought a distinctive French influence to the language. The Normans not only introduced new vocabulary, but changed the style of writing, from a clear, easily readable hand to the more ornate Carolingian script that was used on the Continent. Norman scribes began to change the spelling of Old English as well, introducing many of the conventions that remain in the language today. For example, the Old English cw became qu, giving us queen, instead of cwen.
As Anglo-Saxon rulers were replaced by the Norman conquerors, Norman-French became the language of government and high society, and the Norman influence on the English language is still apparent, notably in the abundance of words related to the law and government: jury, court, and parliament are just some examples.
The Chancery Standard (CS) was developed during the reign of King Henry V (1413–1422), who ordered his government officials to use English instead of Norman-French in the course of business. CS was based on the dialects of London and the Midlands, which were the largest population centers at the time. By mid-century, CS was used in all official transactions, except by the church, which continued to use Latin, and its spread throughout the country was aided by the introduction of the printing press by William Caxton in the late 1470s.
The transition to Modern English began in the 15th century, with what is known as the “Great Vowel Shift”—a term coined in the 20th century by the Danish linguist Otto Jespersen. It refers to a major change in pronunciation that took place between 1450 and 1750. The exact cause of the shift remains a mystery, and some linguistic scholars are skeptical about its existence. One theory is that it was the result of a dramatic social change in the wake of the Black Death, which ravaged Europe from 1347 to 1351. In England, this caused a massive migration of people to the south of England, and an unprecedented mixing of social classes, who began to modify their regional vowel sounds into a standard pronunciation. Many of the peculiarities of English spelling persist from this period.
♦Codifying the Language In 1662, the Royal Society of London for the Promotion of Natural Knowledge appointed a committee of 22 men, including the poet John Dryden, to undertake the improvement of the English language. The committee failed in its attempt to find an appropriate arbiter of the language, and failed again in 1712, when Jonathan Swift attempted to enlist Robert Harley, the earl of Oxford, in the cause.
But finally in 1755, after eight years of work, Samuel Johnson, the great English critic and essayist, published A Dictionary of the English Language, which included 40,000 words, 114,000 illustrative quotations drawn from literary works dating back to the 1500s, and suggestions for proper usage. The dictionary reflects Johnson’s intelligence and idiosyncratic humor. He left out words he didn’t like—gambler, fuss, and shabby—but included words beyond the grasp of the average reader—deosculation (to kiss warmly), for example. His most famous definitions capture both his snobbish and self-deprecating nature:
Oats: a grain, which in England is generally given to horses, but in Scotland supports the people.
Lexicographer: a writer of dictionaries; a harmless drudge, that busies himself in tracing the original, and detailing the signification of words.
It was a source of British pride that Dr. Johnson completed the dictionary almost single-handedly (he had six assistants), while the Académie Française spent 40 years, employing 40 people to publish a similar work. Dr. Johnson’s dictionary dominated lexicography for almost two centuries.
In 1857, the Philological Society of London decided that existing dictionaries were inadequate and sought to reexamine the language going back to Anglo-Saxon times. In 1879, they reached an agreement with Oxford University Press and James Murray to create a new reference work that would document the language from the Early Middle English period (1150) forward. The original plan called for a four-volume work that would take approximately 10 years to complete. In the end, the final volume of what became a 10-volume work was published in 1928 under the title A New English Dictionary on Historical Principles, immediately becoming the definitive authority on the English language.
Because the English language is always changing, work began on updating the dictionary almost as soon as it was published. In 1933, the dictionary, renamed The Oxford English Dictionary, expanded to 12 volumes plus one Supplement, and after many revisions and additions, in 1989 a definitive new edition was published in 20 volumes, followed in 1992 with an electronic edition, on CD-ROM. Today, the Oxford English Dictionary Online adds new words on a regular basis, providing a continuous record of changes in our language and society.
♦English in America The playwright and public intellectual George Bernard Shaw famously observed at the beginning of the 20th century that the United States and the United Kingdom are “two countries divided by a common language.” American English came into existence as soon as the first settlers arrived on the continent, bringing with them traditional English speech, leavened with their distinctive regional dialects. To this, they quickly added words from the local Indian languages, particularly those describing plants and animals that they had never seen before (raccoon, moose, squash). Within 100 years, the original English settlers were joined by the Irish, Dutch, Scandinavians, and other European colonists who built settlements in the South and Midwest, while Spanish and Mexican settlers populated the Southwest and West. The American lexicon overflows with loanwords from these languages, including adobe, barbecue, and rodeo from the Spanish; cookie, freight, and dock from the Dutch.
With time and distance, the differences between British English and American English became more profound, and in 1806, Noah Webster set out to establish American English as a distinctive voice, quite apart from the British standard, with the publication of A Compendious Dictionary of the English Language, which defined 37,000 words. His more famous American Dictionary of the English Language, published in 1828, was a two-volume work that included 65,000 words. It was based on the principles that spelling should be practical and that grammar should reflect the way the language was actually spoken, not the rules set down by “experts.” The dictionary was not a commercial success, but it was highly influential, and the name “Webster” has become synonymous with American dictionaries.
Americans continued to coin new words to reflect their evolving economic, political, and social realities. Ballpark, supermarket, gerrymander, and gasoline all entered the language in the 19th and 20th centuries, and in the same period, a new influx of German and Jewish immigrants taught us to schmooze and to eat hamburgers.
In 1877, Henry Sweet, an English philologist, predicted that within 100 years, American, British, and Australian English would be mutually unintelligible, but in the 21st century, because of globalization and advances in communications technology, regional variations are understood around the world.
♦New Words Dictionary publishers are always on the lookout for new words in an effort to keep the English language corpus up to date. Yet there is no formal mechanism for adding words to the English language. Neologisms—newly coined words that have not yet entered mainstream language—become popular mainly through the mass media and word of mouth. Thousands of words are coined each year, while only a few hundred achieve any kind of permanency in our vocabulary. Lexicographers often follow the use of a word for years before determining whether it deserves inclusion in a dictionary, and publishers retain panels of experts to consider the words being created within their areas of expertise. Ultimately, there are as many opinions about the legitimacy of a word as there are publishers and lexicographers.
New words most often come from the worlds of science, popular culture, business, and now, from advances in the digital world. In 1955, Vladimir Nabokov coined the word nymphet, in the novel Lolita; in 1960, the word laser (from light amplification by stimulated emission of radiation) entered the language; in 1984, cyberspace appeared for the first time in William Gibson’s novel Neuromancer. New words that have made their way into dictionaries since 2000 include:
business: big-box; insourcing; rightsizing
food: chai; locavore; olestra; turducken
military: waterboarding; weaponize; WMD
popular culture/lifestyle: chill pill; emo; grunge; metrosexual; plus-one; puh-leeze; supersize; unibrow
technology: biodiesel; digerati; google (as a verb); podcast; ringtone; USB port
Since 1991, the American Dialect Society has designated a Word of the Year (WOTY), and the major English-language dictionary publishers have followed suit. The winning words are not necessarily new, but may have taken on a new meaning or usage beyond the original, such as the 1992 WOTY Not!, meaning “just kidding.” Some words are assimilated into the language so quickly that it seems they have always existed. The winning words in 2009 were unfriend, which beat out hashtag and sexting (New Oxford American Dictionary); tweet, which won over birther and shovel-ready (American Dialect Society); and distracted driving (Webster’s New World College Dictionary).
Some experts argue that it takes two generations to know whether a word will have durability.