Chapter 15. 200 Languages and Counting

So far we've concentrated on the English-language version of Wikipedia, but Wikipedias have been created in over 250 languages, each representing its own individual community and unique collection of content. A common assumption is that articles in the other Wikipedias are basically translated from English, but this couldn't be more misleading: These sites all create their own content with translations only playing a minor role. Taken as a whole, the Wikimedia projects count as one of the most comprehensively multilingual and global projects on the Internet today.[32]

The English-language Wikipedia is the largest site, but other Wikipedias are also impressively large: Fifteen of the other-language editions of Wikipedia have over 100,000 articles. These very active sites often have high growth rates and are technically innovative. If you visit http://wikipedia.org/ (Figure 15-1), you'll see that it serves as the gateway to the other language editions of Wikipedia.

In this chapter, we'll explore what being global means for Wikipedia, by now a truly international and connected project. What are other-language Wikipedias like, and how can you get involved in them? We'll also talk about language issues as they relate to the English-language Wikipedia, including displaying foreign-language characters, writing about topics from a global perspective, and adding links to other-language versions of Wikipedia.

A very early goal of the project was to make Wikipedia multilingual; Jimmy Wales first proposed a German-language version of Wikipedia in early 2001. By May 2001, within months of the English-language Wikipedia's founding, Wikipedias had been started in Catalan, Chinese, German, French, Hebrew, Italian, Spanish, Japanese, Russian, Portuguese, and Esperanto.

New language editions continue to be added, as described in "The Long Tail of Languages" below. As of mid-2008, the largest Wikipedias were in English (2.3 million articles), German (755,000 articles), French (665,000 articles), Polish (505,000 articles), and Japanese (494,000 articles). Size alone should not be taken as the only criterion of prominence, however. For instance, the Chinese-language Wikipedia has often attracted media attention, in part because the Chinese government continues to partially limit access to the site within China (as part of the so-called Great Firewall of China). Despite this, the Chinese-language Wikipedia has more than 178,000 articles, written in large part by the many Chinese editors in Taiwan, Hong Kong, and outside East Asia.

Wikipedia has at least 77 language editions with over 10,000 articles and 155 with over 1,000. By the time a Wikipedia reaches 1,000 articles, it usually has a consistent approach, a self-regenerating community, and a basic policy structure in place. The remaining sites are just getting started, with a handful of articles and active contributors, as the next section explains.

In the generally optimistic Wikipedia way, many language versions of Wikipedia have been started but, at this time, have only a few hundred articles. What function do these sites serve? No one could call them a comprehensive encyclopedic resource yet. The truth is that they are just beginning wikis—much like the English-language Wikipedia in 2001 or 2002. If you do speak one of these languages fluently enough to contribute, working on a smaller Wikipedia can be a great deal of fun. You'll find the culture of a small site with few users is very different from the giant English-language Wikipedia, which has so many customs, rules, and (obviously) so many more articles already written. Even on the small Wikipedias, articles are, for the most part, not translations but instead newly written pieces in that particular language.

Figure 15-1. Wikipedia.org portal page, showing all the languages


Sometimes other-language Wikipedias are small because they are very new or because not many people speak the language, and thus, the potential contributor base is small. Alternatively, Wikipedias may exist in widely spoken languages that do not have a strong presence on the Internet, such as Telugu, the third-most spoken language on the Indian subcontinent and one of the top fifteen spoken languages in the world (Figure 15-2 shows the front page of the Telugu Wikipedia). These languages may not have a strong written tradition, or perhaps Internet access is limited in the areas where most native speakers live. Some of these Wikipedias quite possibly constitute the largest online corpus for their language; in any case, they represent the language online in a place where others can easily find it.


The range of languages represented by Wikipedia is very large. Wikipedias exist in constructed languages (Esperanto [eo] and Volapük [vo] with their internationalist aim) and significant dead languages (Latin [la] and Old Church Slavonic [cu]), which have no native speakers.[33] (The two-letter codes are language identifier codes, explained in "Links Between Languages.") Language preservation, an issue when few native speakers of a smaller language are living, is not an explicit Wikimedia Foundation goal. Providing free information to all people in the world, regardless of their language, certainly is, however, and many of the smaller Wikipedias represent the only online reference works in that language. In some cases, Wikipedia may be the only encyclopedia in a particular language! Despite this diversity, the 250+ languages already supported by Wikimedia come nowhere close to representing all of the world's active languages. SIL, the creator of the Ethnologue (a standard reference work for languages) and one of the maintainers of the ISO standard for identifying languages, suggests the total count of world languages is closer to 5,000.
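As a small illustration of how these language identifier codes are used, the sketch below turns a code into the address of the matching Wikipedia edition, following the pattern of the Cherokee address (chr.wikipedia.org) cited later in this chapter. The function name `edition_url` and the `https` scheme are assumptions made for the example (the chapter writes its addresses with `http`); the codes themselves come from the text.

```python
# Codes quoted in this chapter; "chr" is the Cherokee edition mentioned later on.
LANGUAGE_CODES = {
    "eo": "Esperanto",
    "vo": "Volapük",
    "la": "Latin",
    "cu": "Old Church Slavonic",
    "chr": "Cherokee",
}


def edition_url(code: str) -> str:
    """Build the front-page address for a language edition (hypothetical helper).

    Assumes the <code>.wikipedia.org pattern seen in the chapter's own URLs.
    """
    cleaned = code.strip().lower()
    # Accept only plain ASCII letters, digits, and hyphens so the result is a valid host name.
    valid = (
        bool(cleaned)
        and cleaned.isascii()
        and all(ch.isalnum() or ch == "-" for ch in cleaned)
    )
    if not valid:
        raise ValueError(f"not a usable language code: {code!r}")
    return f"https://{cleaned}.wikipedia.org/"


if __name__ == "__main__":
    for code, name in LANGUAGE_CODES.items():
        print(f"{name}: {edition_url(code)}")
```

The validation step is deliberately strict: anything that is not a short ASCII identifier is rejected instead of being silently turned into a malformed address.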

Therefore, new language editions are still being proposed and started. How does this work? The key requirement is that you can provide evidence of a potential active user base. Active volunteers for the new Wikipedia will be needed to provide the content and watch the wiki for spam and vandalism. Wikipedia has a procedure for beginning a new language project, and all new requests must be approved by the site developers, who can create the project. Meta, the Wikimedia wiki discussed in Chapter 17, has a special page for making these requests. Once a request has been submitted, a committee on new language editions, or langcom, reviews the request. Someone fluent in the language must commit to translating the MediaWiki interface (including the text of tabs, buttons, and key pages) for the new project. You can see (and participate in) some new projects in the translation process at http://incubator.wikimedia.org/.

The challenge of editing on another language Wikipedia can be interesting and worthwhile, even if you only have a minimal knowledge of the language in question. All the Wikipedias use the same MediaWiki engine, so buttons, navigation links, and icons have familiar functions, regardless of the labels on them.

One way to help out is to watch content on a small wiki. Simply remember to check Recent Changes every so often on a slow-growing wiki, and you can help keep spam and poor contributions to a minimum. To adopt a wiki, you really need only be familiar enough with Wikipedia's standards to recognize definitely unhelpful changes. Seeing the fresh edits will help you to direct new editors to multilingual meta-pages and to identify good new editors. The wikipedia-L mailing list is for discussions of general Wikipedia-related issues in any language.
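If you would rather check Recent Changes from a script than in the browser, one possible approach is the MediaWiki web API, which this chapter does not cover. The sketch below only builds the request address and does not contact the site. The helper name `recent_changes_url`, the choice of fields, the `https` scheme, and the `/w/api.php` path are assumptions for illustration; check the API documentation of the wiki you are watching before relying on them.

```python
from urllib.parse import urlencode


def recent_changes_url(code: str, limit: int = 25) -> str:
    """Build a MediaWiki API address listing recent edits (hypothetical helper).

    `code` is the language identifier of the edition, for example "chr".
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    params = {
        "action": "query",          # ask the API for data
        "list": "recentchanges",    # the Recent Changes feed
        "rcprop": "title|user|comment|timestamp",  # fields to return per edit
        "rclimit": limit,           # how many edits to list
        "format": "json",
    }
    # urlencode percent-encodes the "|" separators in rcprop.
    return f"https://{code}.wikipedia.org/w/api.php?{urlencode(params)}"


if __name__ == "__main__":
    print(recent_changes_url("chr", limit=10))
```

Opening the printed address in a browser, or fetching it with any HTTP client, should return the latest edits as JSON, which is usually enough to spot obvious spam on a slow-growing wiki.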

At this time, you must create an account for each new language project you wish to work on. This is changing with the introduction in mid-2008 of single-user login, sometimes called unified login, which users can use to link their existing accounts across all Wikimedia projects (see "Project Accounts and Single-User Login" for more). All Wikipedias should allow anonymous editing, however, which may be easier if you just want to make a few changes. If you edit when not logged in, watch out for compulsory previews when you try to save: Click what you suspect must be the Show Preview button.

What about adding an edit summary in another language? Projects vary on this. On the Polish Wikipedia, for instance, an edit summary is compulsory if you're not logged in; without one, you won't be able to save. On the Portuguese Wikipedia, if you're not logged in you must fill in a CAPTCHA box before saving your edit. If you're prepared for these occasional extra formalities, editing Wikipedias in other languages is actually very easy.

Remember that policies, guidelines, and community practices may vary a great deal between different language communities. Although some basic principles—NPOV, civility, and the GFDL license, for instance—are fundamental to all Wikimedia projects, how procedures are carried out is decided by the project community. You'll often find that a smaller project has fewer rules and guidelines and debate tends to be more thoughtful than on larger projects that receive more outside attention.

With a full range of languages comes a full range of writing systems: Greek, Cyrillic, Arabic, Hebrew, ideograms, and other less-familiar ones. Even languages that use the basic Roman alphabet may use accents and other diacritics. Scripts of all kinds are also used and integrated into the English-language version of Wikipedia, for instance, to give original forms of proper names. Figure 15-3 shows an example in the article [[Mohandas Karamchand Gandhi]], which uses Gujarati and Sanskrit scripts in the lead section as well as IPA pronunciation symbols.


Language support for operating systems is certainly still driven by demand in the developed world, and this means that many of the less widely used scripts, such as those for some Indic languages, will not typically be supported natively by your browser or operating system. Character sets that usually need to be downloaded include those for native languages. To find these fonts, the Wikipedia edition in that language can be a good resource; Wikipedias that use non-Latin scripts often link from their main page to a help page explaining where to get the fonts needed to view them. For instance, to see the proper rendering of Cherokee in native script in the [[Cherokee]] article, you must download a special font; the help pages on the Cherokee-language Wikipedia at http://chr.wikipedia.org/ give details on how to find the appropriate fonts.

When composing articles, if you don't have a keyboard with the characters you need, you'll find that many types of scripts, for example, Cyrillic and Chinese characters, can be copied and saved successfully onto Wikipedia pages from other documents. (This works because Wikipedia pages use Unicode, stored in the [[UTF-8]] character encoding.) Most operating systems, including Windows, Mac OS X, and many Linux distributions, also allow you to change your keyboard layout virtually so you can type directly in another language. In Windows XP, for example, you can do this via the Control Panel under Regional and Language Options. The [[Help:Multilingual support (Indic)]] page gives complete directions for inputting characters in Indic languages for several operating systems; these directions are also appropriate for other character sets.
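To make the copy-and-paste point concrete, here is a minimal sketch of what UTF-8 does with text in different scripts: each character becomes one or more bytes, and decoding those bytes gives back exactly the same text. The sample words and the helper name `utf8_round_trip` are chosen for illustration only and are not taken from any Wikipedia page.

```python
# Sample strings in three scripts (illustrative choices, not from the chapter).
SAMPLES = {
    "Cyrillic": "Википедия",
    "Chinese": "维基百科",
    "Latin with accent": "caf\u00e9",  # "café" with a precomposed e-acute
}


def utf8_round_trip(text: str) -> tuple[bytes, str]:
    """Encode text as UTF-8 bytes, then decode the bytes back to text."""
    encoded = text.encode("utf-8")
    decoded = encoded.decode("utf-8")
    return encoded, decoded


if __name__ == "__main__":
    for label, text in SAMPLES.items():
        encoded, decoded = utf8_round_trip(text)
        survived = decoded == text
        print(f"{label}: {len(text)} characters, {len(encoded)} bytes, "
              f"round trip intact: {survived}")
```

The byte counts differ by script because UTF-8 uses a variable number of bytes per character, but the round trip is lossless in every case, which is why pasted text in any of these scripts can be saved and displayed again unchanged.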

Finally, the editing box below the main editing window (described in "Understanding the Edit Window") gives easy access to many characters with accents and diacritics, as well as the Greek, Cyrillic, and IPA alphabets. Just click one of these characters to insert it in an article.



[32] Byte Level Research publishes an annual globalization report card that regularly ranks Wikipedia second in the world after Google for "how successfully companies developed web sites for international markets." See http://bytelevel.com/news/reportcard2008.html.

[33] "Veni, Vidi, Wiki: Latin Isn't Dead On 'Vicipaedia'" (Wall Street Journal, September 29, 2007); see http://online.wsj.com/public/article/SB119103413731143589.html.