String Search
Find a table of color names and their corresponding RGB values. Create an interaction in which a square becomes colored when a user types a known color name. (Is your program case-insensitive?)
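One possible starting point for the lookup step, as a minimal sketch: the four-entry `COLORS` table and the `lookup_color` helper are illustrative assumptions, not part of the exercise, and a real program would load a much larger table from a file. Normalizing the typed name before the lookup is what makes the match case-insensitive.

```python
# Hypothetical color table; a fuller one would be loaded from a file.
COLORS = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "rebeccapurple": (102, 51, 153),
}

def lookup_color(name):
    """Return the RGB tuple for a typed color name, or None if unknown.

    Case and surrounding whitespace are ignored.
    """
    return COLORS.get(name.strip().lower())

print(lookup_color("  ReD "))  # (255, 0, 0)
print(lookup_color("mauve"))   # None
```

The drawing code for the square is left out; it would simply fill with the returned tuple whenever the result is not `None`.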
Nonsense Words
Find some lists of common English prefixes, word roots, and suffixes. Select a random item from each list and combine them in a simple syntax (prefix+root+suffix) to generate plausible nonsense words. What might these words mean?
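A minimal sketch of the prefix+root+suffix syntax follows. The three short lists are placeholders chosen for illustration (an assumption); the exercise expects you to find fuller ones. The optional `rng` parameter is also an assumption, added so that results can be made repeatable.

```python
import random

# Placeholder word parts; substitute longer lists from a real source.
PREFIXES = ["anti", "pre", "sub", "trans", "un"]
ROOTS = ["chron", "graph", "luc", "morph", "port"]
SUFFIXES = ["able", "ism", "ity", "ology", "ous"]

def nonsense_word(rng=random):
    """Join one random prefix, root, and suffix into a single word."""
    return rng.choice(PREFIXES) + rng.choice(ROOTS) + rng.choice(SUFFIXES)

for _ in range(5):
    print(nonsense_word())
```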
Letter Frequency
Write a program to calculate the frequencies of the letters in a provided text. (Be careful to make your program “case-insensitive.”) Write code to generate a visualization (such as a bar chart or pie chart) of the letters’ frequencies.
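The counting step might be sketched as below. This version assumes that only the unaccented letters a to z matter, and it draws a plain-text bar chart in place of a graphical one; both choices are simplifications for illustration.

```python
from collections import Counter

def letter_frequencies(text):
    """Count letters a-z, treating upper and lower case as the same letter."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    return Counter(letters)

def print_bar_chart(freqs, width=40):
    """Print one row per letter, scaled so the most common letter fills `width`."""
    if not freqs:
        return
    peak = max(freqs.values())
    for letter, count in sorted(freqs.items()):
        bar = "#" * max(1, round(width * count / peak))
        print(f"{letter} {bar} {count}")

print_bar_chart(letter_frequencies("Hello, World!"))
```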
Letter-Pair Frequency
Write a program to calculate the frequencies of letter pairs (character 2-grams, such as “aa,” “ab,” “ac”) in a large text source. Plot the frequencies in a 26×26 matrix.
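A sketch of the tallying step, before any plotting: rows stand for the first letter of a pair and columns for the second. It assumes that a pair only counts when both characters are letters a to z, so pairs that span a space or punctuation mark are skipped; that rule is a design choice, not something the exercise prescribes.

```python
import string

def pair_matrix(text):
    """Return a 26x26 list of lists; matrix[i][j] counts letter i followed by letter j."""
    index = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
    matrix = [[0] * 26 for _ in range(26)]
    lowered = text.lower()
    for first, second in zip(lowered, lowered[1:]):
        # Skip pairs that include a space, digit, or punctuation mark.
        if first in index and second in index:
            matrix[index[first]][index[second]] += 1
    return matrix

m = pair_matrix("Banana")
print(m[0][13])  # count of "an": 2
```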
Average Word Length
Write a program that calculates the average word length of a provided text. Average word length can serve as a rough proxy for a text's “reading level.” Run your program on several different source materials and compare the results.
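A minimal sketch of the calculation, assuming a "word" is a run of ASCII letters that may contain apostrophes, and that only the letters count toward its length. Other definitions of a word are equally defensible and will shift the result slightly.

```python
import re

def average_word_length(text):
    """Mean number of letters per word; returns 0.0 for text with no words."""
    words = re.findall(r"[A-Za-z']+", text)
    # Discard tokens made only of apostrophes.
    words = [w for w in words if any(ch.isalpha() for ch in w)]
    if not words:
        return 0.0
    total_letters = sum(sum(ch.isalpha() for ch in w) for w in words)
    return total_letters / len(words)

print(average_word_length("The cat sat"))  # 3.0
```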
Sorting Words
Load a document and display its words (a) sorted alphabetically, (b) sorted by their length, and (c) sorted according to their frequency in the text.
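The three orderings can be sketched with Python's built-in `sorted` and a `key` function. This version makes two assumptions: the text is lowercased, and each distinct word is listed once. Ties are broken alphabetically so the output is stable.

```python
import re
from collections import Counter

def sorted_views(text):
    """Return the distinct words sorted (a) alphabetically, (b) by length, (c) by frequency."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    alphabetical = sorted(counts)
    by_length = sorted(alphabetical, key=lambda w: (len(w), w))
    # Negative count puts the most frequent words first.
    by_frequency = sorted(alphabetical, key=lambda w: (-counts[w], w))
    return alphabetical, by_length, by_frequency

for view in sorted_views("To be or not to be that is"):
    print(view)
```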
Cut-Up Machine
In the Dada Manifesto, Tristan Tzara describes using a newspaper, scissors, and some gentle shaking to generate irrational poetry. Do the same with code. Write a program that randomizes the lines or sentences of a newspaper article to make a Dada-style poem.
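One way to sketch the scissors-and-shaking step in code: split the article into sentences, then shuffle them. The sentence splitter here is a deliberately naive assumption (it breaks after ".", "!" or "?" followed by whitespace) and will stumble on abbreviations such as "Dr."

```python
import random
import re

def cut_up(article, rng=random):
    """Shuffle the sentences of an article and return them one per line."""
    pieces = re.split(r"(?<=[.!?])\s+", article)
    sentences = [s.strip() for s in pieces if s.strip()]
    rng.shuffle(sentences)
    return "\n".join(sentences)

print(cut_up("The council met on Tuesday. Rain is expected. Who will pay?"))
```

Splitting on words or on fixed-width strips of characters, rather than on sentences, gives results closer to physically cut paper.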
Bigram Calculator
Write a program to calculate the frequency of all bigrams (word-pairs) in a document. Advanced students: develop a program that judges the similarity of two text files based on how many bigrams they have in common.
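A sketch of both parts of the exercise follows. Counting uses `collections.Counter` over adjacent word pairs. For the similarity judgment, this example uses the Jaccard index (shared bigrams divided by all distinct bigrams in either text); that measure is one reasonable choice among several and is an assumption, not something the exercise specifies.

```python
import re
from collections import Counter

def word_bigrams(text):
    """Count each pair of adjacent words, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))

def bigram_similarity(text_a, text_b):
    """Jaccard index of the two texts' bigram sets, from 0.0 to 1.0."""
    a = set(word_bigrams(text_a))
    b = set(word_bigrams(text_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

print(word_bigrams("the cat sat on the cat").most_common(1))
print(bigram_similarity("a b c", "a b c"))  # 1.0
```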
Dammit Jim
Find a list of occupations. Use these in a generative grammar that produces sentences in this format: “Dammit, Jim, I'm an X, not a Y!” (popularized by the sci-fi TV show Star Trek). Be sure to write “an X” if X begins with a vowel and “a X” if it begins with a consonant.1
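A minimal sketch of the sentence template and the article rule. The `OCCUPATIONS` list is a short placeholder (an assumption). The rule checks the first letter only, exactly as the exercise states it, so it will be wrong for words whose sound and spelling disagree, such as "hour" or "university".

```python
import random

# Placeholder list; replace with a longer list of occupations.
OCCUPATIONS = ["doctor", "engineer", "bricklayer", "astronomer", "umpire", "physicist"]

def with_article(noun):
    """Prefix 'an' if the noun starts with a vowel letter, otherwise 'a'."""
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"

def dammit_jim(rng=random):
    """Build one sentence from two different randomly chosen occupations."""
    x, y = rng.sample(OCCUPATIONS, 2)
    return f"Dammit, Jim, I'm {with_article(x)}, not {with_article(y)}!"

print(dammit_jim())
```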
Knock-Knock Joke Generator
Create a program that generates knock-knock jokes. At a minimum, your program should select a random word as a response to “Who's there?” and add to this word to create the final line of the joke. Generate ten jokes.
Pig Latin Translator
Create a program that translates provided text into Pig Latin. In this playful scheme, the initial consonant (or consonant cluster) of each word is transferred to the end of that word, after which the syllable “ay” is added.
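The rule described above might be sketched as follows. Several details are assumptions, since the exercise leaves them open: words that begin with a vowel get "way" appended (a common convention), "y" is always treated as a consonant, "qu" gets no special handling, and the output is lowercased.

```python
import re

def pig_latin_word(word):
    """Translate one lowercase word into Pig Latin."""
    cluster = re.match(r"[^aeiou]*", word).group(0)
    if not cluster:
        return word + "way"  # vowel-initial word (assumed convention)
    if cluster == word:
        return word + "ay"   # no vowels at all, e.g. "my"
    return word[len(cluster):] + cluster + "ay"

def pig_latin(text):
    """Translate every run of letters in the text, leaving punctuation in place."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: pig_latin_word(m.group(0).lower()),
        text,
    )

print(pig_latin("Hello, world!"))  # ellohay, orldway!
```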
Argots and Language Games
Investigate argots, secret languages, and word games like Ubbi Dubbi, Tutnese, Pirate English, and Dizzouble Dizzutch. Select one and write a program that translates a provided text into it.
Noun Swizzler
Load a text, and replace each noun with a randomly selected noun taken from a second text. You may need to use a “part-of-speech tagger” to identify the nouns. In your substitutions, try to match the use of plural and proper nouns in the first text.
Rhyming Couplets
Select and load a large expository text. Create a program that finds rhyming couplets within it, in order to make new poems. You may need to use an additional library (such as RiTa) to tell you what the words sound like.2
Haiku Finder
Write a program that automatically discovers “inadvertent haikus”: sentences in a chosen text whose words happen to fall in groups of five, seven, and five syllables. A basic solution will discover haikus with awkward breaks; add heuristics to improve the quality of your results.
Markov Text Generator
In a Markov chain, a dataset of letter-pair, bigram, or n-gram frequencies is used as a “probability transition matrix” to synthesize new text that statistically resembles the dataset's source. Build a Markov generator using the data you collected earlier.3
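A word-level sketch of the idea, built directly from a list of words. Rather than storing an explicit matrix of probabilities, this version keeps a list of every word observed after each word; picking uniformly from that list is equivalent to sampling in proportion to the bigram frequencies. The function names and the `rng` parameter are illustrative assumptions.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to the list of words that follow it (repeats included)."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=20, rng=random):
    """Walk the chain from `start`, stopping early at a word with no followers."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)

source = "the cat sat on the mat and the dog sat on the cat".split()
print(generate(build_chain(source), "the", length=12))
```

Keying the chain on pairs or triples of words instead of single words produces output that stays closer to the source text.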
Keyword Extraction with TF-IDF
Obtain a collection of related documents, such as poems, recipes, or obituaries. Write a program that uses the TF-IDF (“Term Frequency - Inverse Document Frequency”) algorithm to determine the keywords that best characterize each document.4
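A compact sketch of one common TF-IDF variant: term frequency is a word's count divided by the document's length, and inverse document frequency is the natural log of (number of documents / number of documents containing the word). Other weightings and smoothing schemes exist, so treat this formula as an assumption. A word that appears in every document scores zero.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def tf_idf_keywords(documents, top_n=3):
    """Return the top_n highest-scoring terms for each document."""
    token_lists = [tokenize(d) for d in documents]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    n_docs = len(documents)
    results = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results

docs = ["apple pie apple tart", "cherry pie cherry cake", "plum pie plum jam"]
print(tf_idf_keywords(docs, top_n=1))  # [['apple'], ['cherry'], ['plum']]
```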
Limerick Generator
Limerick poems have five lines in an AABBA rhyming pattern. The building block of these lines is the anapest: a foot of verse consisting of three syllables, the third of which is accented: da-da-DA. Lines 1, 2, and 5 consist of three anapests; they end with a similar phoneme in order to create the rhyme. Lines 3 and 4 also rhyme with each other, but are shorter, consisting of two anapests each. Write a program to generate limericks. Use a code library such as RiTa to help evaluate your words’ rhymes, syllables, and stress patterns.5