APPENDIX

Word Events

LANGUAGE DATA

Data about word structure were acquired from 18 languages: [Indo-European] (1) English, (2) German, (3) Spanish, (4) Bengali, (5) Bosnian; [Altaic] (6) Turkish; [American] (7) Inuktitut, (8) Taino, (9) Yucatec Maya; [African] (10) Lango, (11) Somali, (12) Wolof, (13) Zulu, (14) Haya; [Austronesian] (15) Fijian, (16) Malagasy; [Dravidian] (17) Tamil; [East Asian] (18) Japanese. In each case we acquired a sample of common words (an average of 937 words per language, standard error = 134); our analysis was confined to words having three or fewer non-sonorants (an average of 775 words per language, standard error = 103). In many cases, data were obtained from transliterated dictionaries, and the phonological interpretation of the transliteration (for which we cared only about whether phonemes were plosives, fricatives, or sonorants) was obtained from a variety of sources (some included in Table 1).
Each word in the sample was coded by converting each plosive to a ‘b’, each fricative to an ‘s’, and each run of adjacent sonorants to a single ‘a’. Sonorants included vowels, as well as sonorant consonants (like y, w, l, r, m, n, and ng). Words written with an initial vowel typically begin with a glottal consonant, which was treated as a plosive; such words were therefore coded as starting with a ‘b’ before the ‘a’ of the vowel. Affricates (like “ch” and “j”) were coded as ‘bs’. For words beginning with a sonorant, only those having two or fewer non-sonorants were included; this is because, as discussed in the main text, these sonorant-start words are predicted to be cases where a ring was initiated with an inaudible hit. Table 2 shows the counts for each structure type within each sampled language.
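The coding rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the letter classes below are a hypothetical, simplified orthography invented for the example (the actual classes were set per language from the sources in Table 1), and the names `CODES`, `DIGRAPHS`, `structure_type`, and `included` are assumptions, not part of the original analysis.

```python
# Hypothetical, simplified letter classes for illustration only.
# The real plosive/fricative/sonorant assignments were language-specific.
VOWELS = set("aeiou")
CODES = {
    **{c: "b" for c in "pbtdkg"},      # plosives -> 'b'
    **{c: "s" for c in "fvszh"},       # fricatives -> 's'
    **{c: "a" for c in "aeiouywlrmn"}, # vowels and sonorant consonants -> 'a'
    "j": "bs",                         # affricate -> 'bs'
}
DIGRAPHS = {"ch": "bs", "sh": "s", "th": "s", "ng": "a"}


def tokenize(word):
    """Split a word into orthographic units, preferring known digraphs."""
    tokens, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            tokens.append(word[i:i + 2])
            i += 2
        else:
            tokens.append(word[i])
            i += 1
    return tokens


def structure_type(word):
    """Return the b/s/a structure string for a word."""
    tokens = tokenize(word.lower())
    codes = []
    # A vowel-initial word is assumed to begin with a glottal plosive.
    if tokens and tokens[0] in VOWELS:
        codes.append("b")
    for tok in tokens:
        code = DIGRAPHS.get(tok) or CODES.get(tok)
        if code is None:
            raise ValueError(f"no class assigned to {tok!r}")
        if code == "a":
            # Collapse any run of adjacent sonorants into a single 'a'.
            if not codes or codes[-1] != "a":
                codes.append("a")
        else:
            codes.extend(code)
    return "".join(codes)


def included(code):
    """Apply the inclusion rule: at most three non-sonorants, or at most
    two when the word begins with a sonorant."""
    non_sonorants = sum(1 for c in code if c != "a")
    limit = 2 if code.startswith("a") else 3
    return non_sonorants <= limit


print(structure_type("bat"))   # bab
print(structure_type("ant"))   # bab (glottal onset coded as a plosive)
print(structure_type("chip"))  # bsab
print(structure_type("mast"))  # asb
```

Under these toy classes, "mast" is sonorant-initial with two non-sonorants and so passes `included`, whereas a sonorant-initial word coded `asbab` would be excluded.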
As a check on the reliability of this method for determining word structure types, a naïve observer was asked to code the 863 words with three or fewer non-sonorants in our German sample. When the resulting frequency counts of the structure types were plotted against those coded by the first author, the best-fit equation on a log-log plot was y = 0.95x^0.92, or nearly the identity (y = x), with R² = 0.88.
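A best-fit line on a log-log plot corresponds to a power law y = a·x^b, obtained by ordinary least squares on the logarithms of the counts. The following sketch shows one way such a fit and its R² could be computed; it assumes that structure types with a zero count for either coder are dropped beforehand (the logarithm is undefined there) and that R² is taken in log space, neither of which is stated in the text. The function name `power_law_fit` is hypothetical.

```python
import math


def power_law_fit(x, y):
    """Fit y = a * x**b by least squares on log10(x), log10(y).

    Returns (a, b, r2), with r2 computed in log space.
    All values must be strictly positive.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                  # slope of the log-log line = exponent
    a = 10 ** (my - b * mx)        # intercept of the log-log line = prefactor
    r2 = sxy ** 2 / (sxx * syy)    # squared Pearson correlation of the logs
    return a, b, r2


# Synthetic counts lying exactly on y = 0.95 * x**0.92 (not real data).
first_author = [1, 10, 100, 1000]
naive_coder = [0.95 * v ** 0.92 for v in first_author]
a, b, r2 = power_law_fit(first_author, naive_coder)
print(round(a, 2), round(b, 2))
```

An exponent b near 1 together with a prefactor a near 1 indicates that the two coders' counts are close to identical across structure types.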

Table 1. Languages from which samples of word structure types were acquired. Citations are given to the word list, and to at least one source for phonological information used in categorizing orthographic elements as plosives, fricatives, or sonorants.

	Family	Language	Source of common word list (first URL); source(s) of phonological information (remaining URLs, where given)
1	Indo-European	English	Am. Natl Corpus, http://americannationalcorpus.org/SecondRelease/data/ANC-spoken-count.txt
2	Indo-European	German	http://www.wortschatz.uni-leipzig.de/Papers/top1000de.txt http://en.wikipedia.org/wiki/German_orthography
3	Indo-European	Spanish	http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/Spanish1000 http://en.wikipedia.org/wiki/Spanish_alphabet
4	Indo-European	Bengali	http://www.websters-online-dictionary.org/translation/Bengali+%2528Transliterated%2529/ http://www.prabasi.org/Literary/ComposeArticle.html
5	Indo-European	Bosnian	http://www.websters-online-dictionary.org/translation/Bosnian/ http://en.wikipedia.org/wiki/Bosnian_language
6	Altaic	Turkish	http://www.turkishlanguage.co.uk/freqvocab.htm http://www.omniglot.com/writing/turkish.htm
7	American	Inuktitut	http://www.websters-online-dictionary.org/translation/Inuktitut+%2528Transliterated%2529/ http://en.wikipedia.org/wiki/Inuit_phonology, http://www.rrsss17.gouv.qc.ca/en/nunavik/langue.aspx
8	American	Taino	http://www.websters-online-dictionary.org/translation/Taino/ http://en.wikipedia.org/wiki/Ta%C3%ADno
9	American	Yucatec Maya	http://www.websters-online-dictionary.org/translation/Yucatec/ http://en.wikipedia.org/wiki/Yucatec_Maya
10	African	Lango	http://www.websters-online-dictionary.org/definition/lango-english/ http://sumale.vjf.cnrs.fr/phono/AfficheTableauOrtho2N.php?choixLangue=dholuo
11	African	Somali	http://www.websters-online-dictionary.org/translation/Somali/ http://en.wikipedia.org/wiki/Somali_alphabet, http://en.wikipedia.org/wiki/Somali_phonology
12	African	Wolof	http://www.websters-online-dictionary.org/translation/Wolof/ http://www.omniglot.com/writing/wolof.htm, http://en.wikipedia.org/wiki/Wolof_language
13	African	Zulu	http://www.websters-online-dictionary.org/definition/Zulu-english/ http://isizulu.net/p11n/
14	African	Haya	http://www.websters-online-dictionary.org/translation/Haya/ http://en.wikipedia.org/wiki/Haya_language
15	Austronesian	Fijian	http://www.websters-online-dictionary.org/translation/Fijian/ http://en.wikipedia.org/wiki/Fijian_language
16	Austronesian	Malagasy	http://www.websters-online-dictionary.org/definition/Malagasy-english/ http://en.wikipedia.org/wiki/Malagasy_language
17	Dravidian	Tamil	http://www.websters-online-dictionary.org/translation/Tamil+%2528Transliterated%2529/+ http://www.omniglot.com/writing/tamil.htm, http://portal.unesco.org/culture/en/files/38245/12265762813tamil_en.pdf/tamil_en.pdf
18	East Asian	Japanese	http://www.jpf.org.uk/language/download/VocListAAug07.pdf http://en.wikipedia.org/wiki/Japanese_phonology

VIDEO DATA

Our hypothesis is that it is the physical events among macroscopic solid objects that principally drive the competencies of our auditory system, and thus coders were trained to measure sequences of hits and slides in the physical events found in videos. To avoid any potential auditory bias toward hearing speech-like patterns among natural event sounds, measurements were made visually (i.e., with the video’s audio muted). Measurements were made from several categories of video, each chosen because of the likelihood of finding “typical” kinds of solid-object physical events. The categories are listed below, each followed by links to the videos (and their lengths).

Cooking (23 minutes)
http://www.youtube.com/watch?v=6s__hRrQZ3E (9:29)
http://www.youtube.com/watch?v=Y36zINLldyQ (3:49)
http://www.youtube.com/watch?v=Enytl9Epfcs&feature=related (9:50)
Assembly instructions (17 minutes)
http://www.youtube.com/watch?v=fOofJFyu9s8 (1:37)
http://www.youtube.com/watch?v=Y-oPmSCIQPw (0:48)
http://www.youtube.com/watch?v=Z_8otugkqxM (2:31)
http://www.youtube.com/watch?v=hsd7vne65nA (4:55)
http://www.youtube.com/watch?v=Dd8Y5prcCos (7:39)
Children playing with toys (7 minutes)
http://www.youtube.com/watch?v=yRPoBXZcx_o (1:56)
http://www.youtube.com/watch?v=_1-TbrU8W0M (1:17)
http://www.youtube.com/watch?v=4gYMerbfYpM (1:10)
http://www.youtube.com/watch?v=O28i03T82EE&NR=1 (0:46)
http://www.youtube.com/watch?v=BSbV4U62Mg0&feature=related (1:45)
Acrobatics (8 minutes)
http://www.youtube.com/watch?v=RKoKtHzrTEw (2:22)
http://www.youtube.com/watch?v=KXpbCQ6kIVQ&feature=related (1:59)
http://www.youtube.com/watch?v=VY9g7koP8yQ (3:41)
Family gatherings (11 minutes)
http://www.youtube.com/watch?v=H11dO6tr3v4 (2:44)
http://www.youtube.com/watch?v=m_q6QRD4hLU (8:17)

These amount to 67 minutes of video in total. The average (across the three coders) total number of events with three or fewer physical interactions (i.e., hits or slides) among these videos was 504.7. The pairwise correlations between the relative frequency distributions of the three coders were R² = 0.51, R² = 0.63, and R² = 0.48. These three coders also coded the same videos a second time, this time with the sound present; the average distribution for vision only was highly correlated with the average distribution for audition-and-vision (R² = 0.857). Also, as part of the training for coding, a “ground truth” audio file containing sample physical event types was created by the first author; the two coders measured its distribution via audition only, and their distributions had correlations of R² = 0.63 and R² = 0.64 with the ground truth source.
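The pairwise inter-coder comparison described above can be illustrated as follows. This is a sketch under stated assumptions: R² is taken to be the squared Pearson correlation between two coders' relative frequency distributions over the same set of event structure types, and the counts, coder labels, and function names are all made up for the example rather than drawn from the study.

```python
from itertools import combinations


def relative_frequencies(counts, types):
    """Convert raw counts into relative frequencies over a fixed type order."""
    total = sum(counts.get(t, 0) for t in types)
    return [counts.get(t, 0) / total for t in types]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxy ** 2 / (sxx * syy)


def pairwise_r2(coder_counts):
    """Return {(coder_a, coder_b): R^2} for every pair of coders."""
    types = sorted(set().union(*(c.keys() for c in coder_counts.values())))
    freqs = {name: relative_frequencies(c, types)
             for name, c in coder_counts.items()}
    return {(a, b): r_squared(freqs[a], freqs[b])
            for a, b in combinations(sorted(freqs), 2)}


# Made-up counts of event structure types for three hypothetical coders.
example = {
    "A": {"b": 10, "ba": 5, "bs": 2},
    "B": {"b": 20, "ba": 10, "bs": 4},
    "C": {"b": 3, "ba": 9, "bs": 1},
}
for pair, value in pairwise_r2(example).items():
    print(pair, round(value, 3))
```

Three coders yield three pairs, matching the three R² values reported; comparing relative rather than raw frequencies keeps the result independent of how many events each coder recorded in total.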