14.1 (Try It: Watson Speech to Text) Use the microphone on your computer to record yourself speaking a paragraph of text. Upload that audio to the Watson Speech to Text demo at
https://speech-to-text-demo.ng.bluemix.net/
Check the transcription results to see whether there are any words Watson has trouble understanding.
14.2 (Try It: Watson Speech to Text—Detecting Separate Speakers) With a friend’s permission and using the microphone on your computer, record a conversation between you and a friend, then upload that audio file to the Watson Speech to Text demo at
https://speech-to-text-demo.ng.bluemix.net/
Enable the option to detect multiple speakers. As the demo transcribes your voices to text, check whether Watson accurately distinguishes between your voices and transcribes the text accordingly.
14.3 (Visual Object Recognition) Investigate the Visual Recognition service and use its demo to locate various items in your photos and your friends’ photos.
14.4 (Language Translator App Enhancement) In our Traveler’s Assistant Translator app’s Steps 1 and 6, we displayed only English text prompting the user to press Enter and record. Display the instructions in both English and Spanish.
14.5 (Language Translator App Enhancement) The Text to Speech service supports multiple voices for some languages. For example, there are four English voices and four Spanish voices. Experiment with the different voices. For the names of the voices, see
https://www.ibm.com/watson/developercloud/text-to-speech/api/v1/python.html?python#get-voice
14.6 (Language Translator App Enhancement) Our Traveler’s Assistant Translator app supports only English and Spanish. Investigate which languages Watson currently supports in common across the Speech to Text, Language Translator and Text to Speech services. Pick one and convert our app to use that language rather than Spanish.
14.7 (United Nations Dilemma: Inter-Language Translation) Inter-language translation is one of the most challenging artificial intelligence and natural language processing problems. Literally hundreds of languages are spoken at the United Nations. As of this writing, the Watson Language Translator service will allow you to translate an English sentence to Spanish, then the Spanish to French, then the French to German, then the German to Italian, then the Italian back to English. You may be surprised by how the final result differs from the original. For a list of all the inter-language translations that Watson allows, see
https://console.bluemix.net/catalog/services/language-translator
Use the Watson Language Translator service to build a Python application that performs the preceding series of translations, showing the text in each language along the way and the final English result. This will help you appreciate the challenge of having people from many countries understand one another.
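A minimal sketch of the translation chain follows, assuming a hypothetical translate(text, model_id) helper that you would implement with the Language Translator service; the helper name and the non-English model IDs ('es-fr', 'fr-de', 'de-it', 'it-en') are assumptions, so check the catalog page above for the models Watson actually offers.

# Sketch of the round-trip translation chain for Exercise 14.7.
# translate() is a hypothetical placeholder; replace its body with a call
# to the Watson Language Translator service.
def translate(text, model_id):
    """Hypothetical wrapper around the Language Translator service."""
    return text  # placeholder: returns the text unchanged

chain = ['en-es', 'es-fr', 'fr-de', 'de-it', 'it-en']  # assumed model IDs
text = 'The quick brown fox jumps over the lazy dog.'
print(f'Original English: {text}')

for model_id in chain:
    text = translate(text, model_id)
    print(f'After {model_id}: {text}')

print(f'Final English: {text}')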
14.8 (Python Pizza Parlor) Use Watson Text to Speech and Speech to Text services to communicate verbally with a person ordering a pizza. Your app should welcome the person and ask them what size pizza they’d like (small or large). Then ask the person if they’d like pepperoni (yes or no). Then ask if they’d like mushrooms (yes or no). The user responds by speaking each answer. After processing the user’s responses, the app should summarize the order verbally and thank the customer for their order. For an extra challenge, consider researching and using the Watson Assistant service to build a chatbot to solve this problem.
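One possible dialog structure is sketched below, assuming hypothetical speak(text) and listen() helpers that you would wire to the Text to Speech and Speech to Text services; the placeholder bodies use print and input so the flow can be tested without Watson.

# Dialog-flow sketch for Exercise 14.8. speak() and listen() are
# hypothetical helpers wrapping Watson Text to Speech / Speech to Text.
def speak(text):
    """Hypothetical helper: synthesize text and play the audio."""
    print(f'APP: {text}')  # placeholder so the sketch runs without Watson

def listen():
    """Hypothetical helper: record the user and return the transcription."""
    return input('YOU: ').strip().lower()  # placeholder

speak('Welcome to Python Pizza Parlor!')
speak('What size pizza would you like, small or large?')
size = listen()
speak('Would you like pepperoni, yes or no?')
pepperoni = listen() == 'yes'
speak('Would you like mushrooms, yes or no?')
mushrooms = listen() == 'yes'

toppings = [name for name, wanted in
            [('pepperoni', pepperoni), ('mushrooms', mushrooms)] if wanted]
summary = f"You ordered a {size} pizza with {' and '.join(toppings) or 'no toppings'}."
speak(summary)
speak('Thank you for your order!')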
14.9 (Language Translator: Language Identification) Investigate the Language Translator service’s ability to detect the language of text. Then write an app that will send text strings in a variety of languages to the Language Translator service and see if it identifies the source languages correctly. See
https://console.bluemix.net/catalog/services/language-translator
for a list of the dozens of supported languages.
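A sketch of such a test harness follows, assuming a hypothetical identify_language(text) helper that you would implement with the Language Translator service’s language-identification feature.

# Sketch for Exercise 14.9. identify_language() is a hypothetical helper;
# replace its body with a call to the Language Translator service.
def identify_language(text):
    """Hypothetical wrapper around the service's identify feature."""
    return '??'  # placeholder: replace with a real service call

samples = {
    'en': 'Where is the nearest train station?',
    'es': '¿Dónde está la estación de tren más cercana?',
    'fr': 'Où se trouve la gare la plus proche ?',
    'de': 'Wo ist der nächste Bahnhof?'
}

for expected, text in samples.items():
    detected = identify_language(text)
    result = 'correct' if detected == expected else f'got {detected}'
    print(f'{expected}: {result}')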
14.10 (Watson Internet of Things Platform) Watson also provides the Watson Internet of Things (IoT) Platform for analyzing live data streams from devices in the Internet of Things, such as temperature sensors, motion sensors and more. To get a sense of a live data stream, you can follow the instructions at
https://discover-iot.eu-gb.mybluemix.net/#/play
to connect your smartphone to the demo, then watch on your computer and phone screens as live sensor data displays. On your computer screen, a phone image moves dynamically to show your phone’s orientation as you move and rotate it in your hand.
14.11 (Pig Latin Translator App) Research the rules for translating English-language words into pig Latin. Read a sentence from the user. Then, encode the sentence into pig Latin, display the pig Latin text and use speech synthesis to speak 'The sentence insertOriginalSentenceHere in pig Latin is insertPigLatinSentenceHere' (replace insertOriginalSentenceHere with the user’s original sentence and insertPigLatinSentenceHere with the pig Latin translation). For simplicity, assume that the English sentence consists of words separated by blanks, there are no punctuation marks and all words have two or more letters.
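Here is a minimal sketch of the translation step, assuming one common pig Latin convention (move a word’s leading consonants to the end and append 'ay'; append 'way' to words that begin with a vowel); your research may turn up other variants, and the speech-synthesis step is left to you.

# Pig Latin sketch for Exercise 14.11, using one common convention.
VOWELS = 'aeiou'

def to_pig_latin(word):
    """Translate a single lowercase word into pig Latin."""
    if word[0] in VOWELS:
        return word + 'way'
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:] + word[:i] + 'ay'
    return word + 'ay'  # word contains no vowels

sentence = input('Enter a sentence: ').lower()
pig_latin = ' '.join(to_pig_latin(word) for word in sentence.split())
print(f'The sentence "{sentence}" in pig Latin is "{pig_latin}"')
# Speaking this string with Text to Speech is part of the exercise.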
14.12 (Random Story Writer App) Write a script that uses random-number generation to create, display and speak sentences. Use four arrays of strings called article, noun, verb and preposition. Create a sentence by selecting a word at random from each array in the following order: article, noun, verb, preposition, article, noun. As each word is picked, concatenate it to the previous words in the sentence. Spaces should separate the words. When a sentence is displayed, it should start with a capital letter and end with a period. Allow the script to produce a short story consisting of several sentences. Use Text to Speech to read the story aloud to the user.
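A minimal sketch of the sentence generator follows; the word lists are illustrative, and the Text to Speech step is left to you.

# Sketch for Exercise 14.12: build random sentences from word lists.
import random

article = ['the', 'a', 'one', 'some', 'any']
noun = ['boy', 'girl', 'dog', 'town', 'car']
verb = ['drove', 'jumped', 'ran', 'walked', 'skipped']
preposition = ['to', 'from', 'over', 'under', 'on']

def random_sentence():
    """Pick words in the order article, noun, verb, preposition, article, noun."""
    words = [random.choice(article), random.choice(noun), random.choice(verb),
             random.choice(preposition), random.choice(article), random.choice(noun)]
    return ' '.join(words).capitalize() + '.'

story = ' '.join(random_sentence() for _ in range(5))  # a five-sentence story
print(story)
# Pass the story to the Text to Speech service to read it aloud.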
14.13 (Eyesight Tester App) You’ve probably had your eyesight tested. In an eye exam, you’re asked to cover one eye, then read out loud the letters from an eyesight chart called a Snellen chart. The letters are arranged in 11 rows and include only the letters C, D, E, F, L, N, O, P, T, Z. The first row has one letter in a huge font. As you move down the page, the number of letters in each row increases and the font size decreases, ending with a row of 11 letters in a tiny font. Your ability to read the letters accurately measures your visual acuity. Create an eyesight testing chart similar to the Snellen chart used by medical professionals (see
https://en.wikipedia.org/wiki/Snellen_chart
). The app should prompt the user to say each letter. Then use speech recognition to determine whether the user said the correct letter. At the end of the test, display—and speak—'Your vision is 20/20' or whatever the appropriate value is for the user’s visual acuity.
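One way to structure the chart and the scoring loop is sketched below, assuming hypothetical say() and hear_letter() helpers for the Watson services and a simple rule that the test ends at the first misread row; mapping the last correct row to an acuity value such as 20/20 is left to you.

# Sketch for Exercise 14.13: build Snellen-style rows of random letters
# and score the user's answers. say() and hear_letter() are hypothetical
# helpers wrapping Text to Speech and Speech to Text.
import random

LETTERS = 'CDEFLNOPTZ'  # the only letters that appear on a Snellen chart

def say(text):
    """Hypothetical helper: speak the text via Text to Speech."""
    print(f'APP: {text}')  # placeholder

def hear_letter():
    """Hypothetical helper: return the letter the user spoke."""
    return input('Say a letter: ').strip().upper()  # placeholder

# Row n contains n letters (row 1 has one letter, row 11 has eleven).
chart = [[random.choice(LETTERS) for _ in range(row)] for row in range(1, 12)]

last_correct_row = 0
for row_number, row in enumerate(chart, start=1):
    say(f'Read row {row_number}.')
    if all(hear_letter() == letter for letter in row):
        last_correct_row = row_number
    else:
        break

say(f'You read {last_correct_row} rows correctly.')
# Map last_correct_row to an acuity value such as 20/20 to finish the exercise.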
14.14 (Project: Speech Synthesis Markup Language) Investigate SSML (Speech Synthesis Markup Language), then use it to mark up a paragraph of text to see how the SSML you specify affects Watson’s voices. Experiment with inflection, cadence, pitch and more. Try out your text with various voices in the Watson Text to Speech demo at:
https://text-to-speech-demo.ng.bluemix.net/
You can learn more about SSML at https://www.w3.org/TR/speech-synthesis/.
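As a starting point, here is a small SSML fragment held in a Python string; the elements shown (speak, prosody, break) are standard SSML, and the attribute values are just examples to experiment with in the demo.

# Sample SSML markup for Exercise 14.14. Adjust the prosody and break
# values and listen to how the voices change.
ssml_text = """<speak>
  <prosody rate="slow" pitch="high">
    Welcome to the Watson Text to Speech demo.
  </prosody>
  <break time="500ms"/>
  <prosody rate="fast">
    Now the same voice, speaking much more quickly.
  </prosody>
</speak>"""

print(ssml_text)  # paste this into the demo, or pass it to the service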
14.15 (Project: Text to Speech and SSML—Singing Happy Birthday) Use Watson Text to Speech and SSML to have Watson sing Happy Birthday. Let users enter their names.
14.16 (Enhanced Tortoise and Hare) Add speech-synthesis capabilities to your solution to the simulation of the tortoise-and-hare race in Exercise 4.12. Use speech to call the race as it proceeds, dropping in phrases like 'On your mark. Get set. Go!', "And they're off!", 'The tortoise takes the lead!', 'The hare is taking a snooze', etc. At the end of the race, announce the winner.
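A sketch of one way to trigger commentary from the race loop follows, assuming a hypothetical say() helper for Text to Speech and the tortoise and hare position variables from your Exercise 4.12 solution.

# Commentary sketch for Exercise 14.16. say() is a hypothetical Text to
# Speech helper; tortoise and hare are positions from your race loop.
def say(text):
    """Hypothetical helper: speak the text via Text to Speech."""
    print(f'ANNOUNCER: {text}')  # placeholder

def call_the_race(tortoise, hare, previous_leader):
    """Return the new leader and speak a phrase when the lead changes."""
    if tortoise > hare and previous_leader != 'tortoise':
        say('The tortoise takes the lead!')
        return 'tortoise'
    if hare > tortoise and previous_leader != 'hare':
        say('The hare takes the lead!')
        return 'hare'
    return previous_leader

# In your race loop from Exercise 4.12:
# leader = call_the_race(tortoise_position, hare_position, leader)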
14.17 (Project: Enhanced Tortoise and Hare with SSML) Use SSML in your solution to Exercise 14.16 to make the speech sound like a sportscaster announcing a race on TV.
14.18 (Challenge Project: Language Translator App—Supporting Any Length Audio) Our Traveler’s Assistant Translator app allows each speaker to record for five seconds. Investigate how to use PyAudio (which is not a Watson capability) to detect when someone starts speaking and stops speaking so you can record audio of any length. Caution: The code for doing this is complex.
14.19 (Project: Building a Chatbot with Watson Assistant) Investigate the Watson Assistant service. Next, go to
https://console.bluemix.net/developer/watson/dashboard
and try Build a chatbot. After you click Create, follow the steps provided to build your chatbot. Be sure to follow the Getting started tutorial at
https://console.bluemix.net/docs/services/conversation/getting-started.html#getting-started-tutorial
14.20 (For the Entrepreneur: Bot Applications) Research common bot applications. Indicate how they can improve things like call center operations. For example, a bot can eliminate time spent on the phone waiting for a human to become available. The bot can ask if the caller is satisfied with the answer; then the caller can hang up, or the bot can route the caller to a human. Bots can accumulate massive expertise over time. If you’re entrepreneurial, you could develop sophisticated bots for organizations to purchase. Opportunities abound in areas such as health care, answering Social Security and Medicare questions, helping travelers plan itineraries, and many more.
14.21 (Project: Metric Conversion App) Write an app that uses speech recognition and speech synthesis to assist users with metric conversions. The app should allow the user to specify the names of the units (e.g., centimeters, liters and grams for the metric system; inches, quarts and pounds for the English system) and should respond to simple questions, such as
'How many inches are in 2 meters?'
'How many liters are in 10 quarts?'
Your program should recognize invalid conversions. For example, the question
'How many feet are in 5 kilograms?'
is not a meaningful question because 'feet' is a unit of length, whereas 'kilograms' is a unit of mass. Use speech synthesis to speak the result and display the result in text. If the question is invalid, the app should speak 'That is an invalid conversion.' and display the same message as text.
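A sketch of the conversion logic follows; the regular expression, the small unit table and the conversion factors are illustrative assumptions, and the speech recognition and synthesis wiring is left to you.

# Conversion sketch for Exercise 14.21: parse a question of the form
# 'How many X are in N Y?' and convert only when both units measure the
# same kind of quantity.
import re

# conversion factors to a base unit, grouped by the quantity measured
UNITS = {
    'inches': ('length', 0.0254), 'centimeters': ('length', 0.01),
    'feet': ('length', 0.3048), 'meters': ('length', 1.0),
    'quarts': ('volume', 0.946353), 'liters': ('volume', 1.0),
    'pounds': ('mass', 0.453592), 'grams': ('mass', 0.001),
    'kilograms': ('mass', 1.0)
}

def answer(question):
    """Return the spoken/displayed answer to a conversion question."""
    match = re.match(r'how many (\w+) are in ([\d.]+) (\w+)\??', question.lower())
    if not match or match.group(1) not in UNITS or match.group(3) not in UNITS:
        return 'That is an invalid conversion.'
    target, amount, source = match.group(1), float(match.group(2)), match.group(3)
    target_kind, target_factor = UNITS[target]
    source_kind, source_factor = UNITS[source]
    if target_kind != source_kind:  # e.g., feet (length) vs. kilograms (mass)
        return 'That is an invalid conversion.'
    result = amount * source_factor / target_factor
    return f'There are {result:.2f} {target} in {amount:g} {source}.'

print(answer('How many inches are in 2 meters?'))
print(answer('How many liters are in 10 quarts?'))
print(answer('How many feet are in 5 kilograms?'))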
14.22 (Accessibility Challenge Project: Voice-Driven Text Editor) The speech synthesis and recognition technologies you learned in this chapter are particularly useful for implementing apps for people who cannot use their hands. Create a simple text editor app that allows the user to speak some text, then edit the text verbally. Provide basic editing features such as insert and delete via voice commands.
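One simple way to organize the editor is a command-dispatch loop, sketched below with a hypothetical listen() helper standing in for Speech to Text; the command vocabulary shown is just an example.

# Command-dispatch sketch for Exercise 14.22: map spoken commands onto
# simple edit operations. listen() is a hypothetical Speech to Text helper.
def listen():
    """Hypothetical helper: return the user's spoken words as text."""
    return input('Command: ').strip().lower()  # placeholder

words = []  # the document, stored as a list of words

while True:
    command = listen()
    if command.startswith('insert '):
        words.extend(command[len('insert '):].split())
    elif command == 'delete last':
        if words:
            words.pop()
    elif command == 'read':
        print(' '.join(words))  # speak this with Text to Speech
    elif command == 'stop':
        break
    else:
        print('Commands: insert <text>, delete last, read, stop')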
14.23 (Project: Watson Sentiment Analysis) In the “Natural Language Processing” chapter and the “Data Mining Twitter” chapter, we performed sentiment analysis. Run some of the tweets whose sentiment you analyzed in the Twitter chapter through the Watson Natural Language Understanding service’s sentiment analysis capability. Compare the analysis results.