Find a table of color names and their corresponding RGB values. Create an interaction in which a square becomes colored when a user types a known color name. (Is your program case-insensitive?)
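As a starting point, the lookup itself might be sketched in Python as follows. The five-entry table is only an illustrative stand-in for a fuller list (the values follow the CSS named colors), and `lookup_color` is a hypothetical helper name; drawing the square is left to whatever graphics toolkit you use.

```python
# A tiny illustrative table; a real project would load a much fuller list.
COLOR_TABLE = {
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "white": (255, 255, 255),
    "black": (0, 0, 0),
}

def lookup_color(name):
    """Return the RGB tuple for a color name, or None if it is unknown."""
    # Normalizing case and surrounding whitespace makes the lookup case-insensitive.
    return COLOR_TABLE.get(name.strip().lower())
```

Returning `None` for unknown names lets the calling code decide whether to keep the square's previous color or reset it.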
Find some lists of common English prefixes, word roots, and suffixes. Select a random item from each list and combine them in a simple syntax (prefix+root+suffix) to generate plausible nonsense words. What might these words mean?
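A minimal Python sketch of the prefix+root+suffix syntax is shown here. The three short lists are placeholder examples rather than a researched vocabulary, and `nonsense_word` is a hypothetical name.

```python
import random

# Placeholder lists; substitute the longer lists you find.
PREFIXES = ["un", "re", "pre", "anti"]
ROOTS = ["port", "scrib", "spect", "struct"]
SUFFIXES = ["able", "tion", "ment", "ous"]

def nonsense_word(rng=random):
    """Join one random prefix, root, and suffix into a single word."""
    return rng.choice(PREFIXES) + rng.choice(ROOTS) + rng.choice(SUFFIXES)
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which helps when you want to revisit a word you liked.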
Write a program to calculate the frequencies of the letters in a provided text. (Be careful to make your program “case-insensitive.”) Write code to generate a visualization (such as a bar chart or pie chart) of the letters’ frequencies.
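One way to approach the counting step, sketched in Python: the text is lowercased first so that "A" and "a" are tallied together, and only the 26 unaccented English letters are counted (an assumption you may want to relax for other languages). The text-mode `bar_chart` helper is a hypothetical stand-in for a real plotting library.

```python
import string
from collections import Counter

def letter_frequencies(text):
    """Count each English letter in the text, ignoring case."""
    return Counter(ch for ch in text.lower() if ch in string.ascii_lowercase)

def bar_chart(counts, width=40):
    """Return one text bar per letter, scaled so the most frequent letter fills `width`."""
    if not counts:
        return []
    peak = max(counts.values())
    return [
        f"{letter} {'#' * max(1, round(width * n / peak))} {n}"
        for letter, n in sorted(counts.items())
    ]
```

Calling `print("\n".join(bar_chart(letter_frequencies(text))))` gives a quick look at the distribution before you invest in a graphical chart.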
Write a program to calculate the frequencies of letter pairs (character 2-grams, such as “aa,” “ab,” “ac”) in a large text source. Plot the frequencies in a 26x26 matrix.
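The matrix can be built before any plotting is attempted. In this Python sketch, row index is the first letter and column index is the second; pairs that span a space, digit, or punctuation mark are skipped, which is a design assumption rather than a requirement of the exercise.

```python
import string

def letter_pair_matrix(text):
    """Return a 26x26 list of lists counting adjacent letter pairs, ignoring case."""
    index = {ch: i for i, ch in enumerate(string.ascii_lowercase)}
    matrix = [[0] * 26 for _ in range(26)]
    lowered = text.lower()
    # Walk over every pair of neighboring characters in the text.
    for first, second in zip(lowered, lowered[1:]):
        if first in index and second in index:
            matrix[index[first]][index[second]] += 1
    return matrix
```

The resulting grid can be handed to a heatmap function or drawn as a grid of shaded squares.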
Write a program that calculates the average word length of a provided text. Average word length is a rough proxy for a text's “reading level.” Run your program on several different source materials and compare the results.
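A possible Python sketch follows. It assumes that a "word" is a run of English letters, optionally containing internal apostrophes, and that only the letters count toward its length; other definitions are equally defensible.

```python
import re

def average_word_length(text):
    """Return the mean number of letters per word, or 0.0 for text with no words."""
    # A word is a run of letters that may contain internal apostrophes ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not words:
        return 0.0
    total_letters = sum(sum(ch.isalpha() for ch in word) for word in words)
    return total_letters / len(words)
```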
Load a document and display its words (a) sorted alphabetically, (b) sorted by their length, and (c) sorted according to their frequency in the text.
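The three orderings can all be derived from one word count. This Python sketch assumes the document is already loaded into a string, treats words as runs of English letters, and breaks ties alphabetically so the output is stable; `sorted_views` is a hypothetical name.

```python
import re
from collections import Counter

def sorted_views(text):
    """Return the distinct words of a text in three different orders."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    unique = sorted(counts)
    return {
        "alphabetical": unique,
        # Shortest first; ties broken alphabetically.
        "by_length": sorted(unique, key=lambda w: (len(w), w)),
        # Most frequent first; ties broken alphabetically.
        "by_frequency": sorted(unique, key=lambda w: (-counts[w], w)),
    }
```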
In the Dada Manifesto, Tristan Tzara describes using a newspaper, scissors, and some gentle shaking to generate irrational poetry. Do the same with code. Write a program that randomizes the lines or sentences of a newspaper article to make a Dada-style poem.
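A minimal Python sketch of the scissors-and-shaking step is given here. It assumes sentences end with a period, exclamation mark, or question mark followed by whitespace, which will mis-split abbreviations such as "Dr." in real newspaper copy.

```python
import random
import re

def dada_poem(article, seed=None):
    """Shuffle the sentences of an article and return them one per line."""
    # Split after sentence-ending punctuation; a crude rule, not a real sentence parser.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    rng = random.Random(seed)
    rng.shuffle(sentences)
    return "\n".join(sentences)
```

Supplying a `seed` reproduces a particular poem; leaving it as `None` gives a different arrangement on each run.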
Write a program to calculate the frequency of all bigrams (word-pairs) in a document. Advanced students: develop a program that judges the similarity of two text files based on how many bigrams they have in common.
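Both parts of this exercise can be sketched briefly in Python. The similarity measure used here is the Jaccard index (shared distinct bigrams divided by all distinct bigrams), which is one reasonable choice among several and is not prescribed by the exercise; the function names are hypothetical.

```python
import re
from collections import Counter

def word_bigrams(text):
    """Count each pair of adjacent words in the text, ignoring case."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))

def bigram_similarity(text_a, text_b):
    """Return the Jaccard index of the two texts' distinct bigrams (0.0 to 1.0)."""
    a = set(word_bigrams(text_a))
    b = set(word_bigrams(text_b))
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 suggests the texts share most of their phrasing, while a score near 0.0 suggests little overlap.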
Find a list of occupations. Use these in a generative grammar that produces sentences in this format: “Dammit, Jim, I'm an X, not a Y!” (popularized by the sci-fi TV show Star Trek). Be sure to write “an X” if X begins with a vowel and “a X” if it begins with a consonant.[1]
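The grammar and its article rule might be sketched in Python like this. The rule checks only the first letter, as the exercise describes, so it will be wrong for words whose spelling and sound disagree (such as "hour" or "unicorn"). The function names are hypothetical.

```python
import random

def indefinite_article(word):
    """Return "an" if the word starts with a vowel letter, otherwise "a"."""
    return "an" if word[0].lower() in "aeiou" else "a"

def dammit_jim(occupations, rng=random):
    """Fill the sentence template with two different random occupations."""
    x, y = rng.sample(occupations, 2)
    return (
        f"Dammit, Jim, I'm {indefinite_article(x)} {x}, "
        f"not {indefinite_article(y)} {y}!"
    )
```

Using `sample` rather than two separate calls to `choice` guarantees that X and Y differ.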
Create a program that generates knock-knock jokes. At a minimum, your program should select a random word as a response to “Who's there?” and add to this word to create the final line of the joke. Generate ten jokes.
Create a program that translates provided text into Pig Latin. In this playful scheme, the initial consonant (or consonant cluster) of each word is transferred to the end of that word, after which the syllable “ay” is added.
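The rule as stated can be sketched in Python with a regular expression. This version treats "y" as a consonant, simply appends "ay" to words that begin with a vowel (some variants append "way" or "yay" instead), and makes no attempt to repair capitalization; all of these are simplifying assumptions.

```python
import re

def pig_latin_word(word):
    """Move the leading consonant cluster to the end of the word and add "ay"."""
    # Group 1 is the (possibly empty) run of leading consonants; group 2 is the rest.
    cluster, rest = re.match(r"([^aeiouAEIOU]*)(.*)", word).groups()
    return rest + cluster + "ay"

def pig_latin(text):
    """Translate every run of letters in the text, leaving punctuation in place."""
    return re.sub(r"[A-Za-z]+", lambda m: pig_latin_word(m.group()), text)
```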
Investigate argots, secret languages, and word games like Ubbi Dubbi, Tutnese, Pirate English, and Dizzouble Dizzutch. Select one and write a program that translates a provided text into it.
Load a text, and replace each noun with a randomly selected noun taken from a second text. You may need to use a “part-of-speech tagger” to identify the nouns. In your substitutions, try to match the use of plural and proper nouns in the first text.
Select and load a large expository text. Create a program that finds rhyming couplets within it, in order to make new poems. You may need to use an additional library (such as RiTa) to tell you what the words sound like.[2]
Write a program that automatically discovers “inadvertent haikus”: sentences in a chosen text whose words happen to fall in groups of five, seven, and five syllables. A basic solution will discover haikus with awkward breaks; add heuristics to improve the quality of your results.
In a Markov chain, a dataset of letter-pair, bigram, or n-gram frequencies is used as a “probability transition matrix” to synthesize new text that statistically resembles the dataset's source. Build a Markov generator using the data you collected earlier.[3]
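A word-level, first-order version of this idea can be sketched in Python. Instead of an explicit matrix, this sketch stores for each word the list of words that followed it in the source; because repeated followers appear repeatedly in that list, picking uniformly from it is equivalent to sampling by frequency. The function names are hypothetical.

```python
import random
from collections import defaultdict

def build_chain(words):
    """Map each word to the list of words that follow it in the source."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length, rng=random):
    """Walk the chain from `start`, emitting at most `length` words."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # Dead end: this word was never followed by anything.
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)
```

The same structure works for letters or longer n-grams if the keys are changed from single words to characters or tuples of words.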
Keyword Extraction with TF-IDF
Obtain a collection of related documents, such as poems, recipes, or obituaries. Write a program that uses the TF-IDF (“Term Frequency - Inverse Document Frequency”) algorithm to determine the keywords that best characterize each document.[4]
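TF-IDF has several common variants. The Python sketch below assumes one of the simplest: term frequency is a word's count divided by the document's length, and inverse document frequency is the natural logarithm of the number of documents divided by the number of documents containing the word. The tokenizer and function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase runs of English letters."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_keywords(documents, top_n=3):
    """Return the top_n highest-scoring terms for each document."""
    token_lists = [tokenize(doc) for doc in documents]
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n_docs = len(documents)
    results = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_n])
    return results
```

Words that appear in every document score zero under this variant, which is why common words such as "the" drop out without a separate stop-word list.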
Limerick poems have five lines in an AABBA rhyming pattern. The building block of these lines is the anapest: a foot of verse consisting of three syllables, the third of which is accented: da-da-DA. Lines 1, 2, and 5 consist of three anapests; they end with a similar phoneme in order to create the rhyme. Lines 3 and 4 also rhyme with each other, but are shorter, consisting of two anapests each. Write a program to generate limericks. Use a code library such as RiTa to help evaluate your words’ rhymes, syllables, and stress patterns.[5]