One of AI’s most exciting areas is deep learning, a powerful subset of machine learning that has produced impressive results in computer vision and many other areas over the last few years. The availability of big data, significant processor power, faster Internet speeds and advancements in parallel computing hardware and software are making it possible for more organizations and individuals to pursue resource-intensive deep-learning solutions.
In the previous chapter, Scikit-learn enabled you to define machine-learning models conveniently with one statement. Deep-learning models require more sophisticated setups, typically connecting multiple objects, called layers. We’ll build our deep-learning models with Keras, which offers a friendly interface to Google’s TensorFlow, the most widely used deep-learning library.1 François Chollet of Google developed Keras to make deep-learning capabilities more accessible. His book Deep Learning with Python is a must-read.2 Google has thousands of TensorFlow and Keras projects underway internally, and that number is growing quickly.3,4
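To give you a feel for what a “layer” computes before we meet Keras’s own layer classes, here’s a pure-NumPy sketch (not Keras itself) of two fully connected layers wired together. The weight and bias values are hypothetical stand-ins; in a real model, Keras would learn them during training.

```python
import numpy as np

def dense_layer(inputs, weights, biases, activation):
    """A fully connected ("dense") layer: activation(inputs @ weights + biases)."""
    return activation(inputs @ weights + biases)

def relu(x):
    """Rectified linear unit, a common activation function."""
    return np.maximum(0, x)

rng = np.random.default_rng(seed=0)
x = rng.random((1, 4))       # one sample with four features
W1 = rng.random((4, 3))      # layer 1: 4 inputs -> 3 outputs (hypothetical weights)
b1 = np.zeros(3)
W2 = rng.random((3, 2))      # layer 2: 3 inputs -> 2 outputs (hypothetical weights)
b2 = np.zeros(2)

# "Connecting layers" means feeding one layer's outputs into the next
hidden = dense_layer(x, W1, b1, relu)
output = dense_layer(hidden, W2, b2, relu)
print(output.shape)  # (1, 2)
```

In Keras, you won’t write this arithmetic yourself; you’ll stack pre-built layer objects and let the library handle the math, which is exactly the object-based programming style this chapter emphasizes.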
Deep learning models are complex and require an extensive mathematical background to understand their inner workings. As we’ve done throughout the book, we’ll avoid heavy mathematics here, preferring English explanations.
Keras is to deep learning as Scikit-learn is to machine learning. Each encapsulates the sophisticated mathematics, so developers need only define, parameterize and manipulate objects. With Keras, you build your models from pre-existing components and quickly parameterize those components to your unique requirements. This is what we’ve been referring to as object-based programming throughout the book.
Machine learning and deep learning are empirical rather than theoretical fields. You’ll experiment with many models, tweaking them in various ways until you find the models that perform best for your applications. Keras facilitates such experimentation.
Deep learning works well when you have lots of data, but it also can be effective for smaller datasets when combined with techniques like transfer learning5,6 and data augmentation7,8. Transfer learning uses existing knowledge from a previously trained model as the foundation for a new model. Data augmentation adds data to a dataset by deriving new data from existing data. For example, in an image dataset, you might rotate the images left and right so the model can learn about objects in different orientations. In general, though, the more data you have, the better you’ll be able to train a deep learning model.
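To make the data-augmentation idea concrete, here’s a minimal NumPy sketch that derives new training samples from one existing “image” by rotating and mirroring it. Real pipelines typically use library tools for this; the tiny 3×3 array below is just a stand-in for a real photo.

```python
import numpy as np

# A tiny 3x3 grayscale "image" standing in for a real photo
image = np.array([[0, 1, 0],
                  [0, 1, 0],
                  [0, 1, 1]])

# Derive additional training samples from the original
augmented = [
    image,                 # original
    np.rot90(image, k=1),  # rotated 90 degrees counterclockwise
    np.rot90(image, k=3),  # rotated 90 degrees clockwise
    np.fliplr(image),      # mirrored left-to-right
]

print(len(augmented))  # 4 samples derived from 1 image
```

Each derived array is a plausible new view of the same object, so the model sees more variety without anyone collecting more data.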
Deep learning can require significant processing power. Complex models trained on big-data datasets can take hours, days or even longer to train. The models we present in this chapter can be trained in minutes to just under an hour on computers with conventional CPUs. You’ll need only a reasonably current personal computer. We’ll discuss NVIDIA’s GPUs (Graphics Processing Units) and Google’s TPUs (Tensor Processing Units), special high-performance hardware developed to meet the extraordinary processing demands of leading-edge deep-learning applications.
Keras comes packaged with some popular datasets. You’ll work with two of these datasets in the chapter’s examples and several more in the exercises. You can find many Keras studies online for each of these datasets, including ones that take different approaches.
In the “Machine Learning” chapter, you worked with Scikit-learn’s Digits dataset, which contains 1797 handwritten-digit images selected from the much larger MNIST dataset (60,000 training images and 10,000 test images).9 In this chapter you’ll work with the full MNIST dataset. You’ll build a Keras convolutional neural network (CNN or convnet) model that will achieve high performance recognizing digit images in the test set. Convnets are especially appropriate for computer-vision tasks, such as recognizing handwritten digits and characters or recognizing objects (including faces) in images and videos. You’ll also work with a Keras recurrent neural network. In that example, you’ll perform sentiment analysis using the IMDb movie reviews dataset, in which the reviews in the training and testing sets are labeled as positive or negative.
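To hint at why convnets suit computer vision, here’s a pure-NumPy sketch (not Keras) of their core operation: a 2D convolution slides a small filter over an image, producing a “feature map” that responds strongly where a local pattern appears. The filter values below are hypothetical; a convnet learns its filters from the training data.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution: slide the kernel over the image, summing products."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

# A 5x5 "image" containing a vertical stroke, like part of a handwritten "1"
image = np.zeros((5, 5))
image[:, 2] = 1.0

# A hypothetical 3x3 filter that responds to vertical strokes
kernel = np.array([[-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0],
                   [-1.0, 2.0, -1.0]])

feature_map = convolve2d(image, kernel)
print(feature_map)  # strongest response where the stroke aligns with the filter
```

The center column of the feature map lights up because the stroke lines up with the filter there; a convnet stacks many such learned filters, layer upon layer, to recognize progressively more complex shapes.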
Newer automated deep learning capabilities are making it even easier to build deep-learning solutions. These include Auto-Keras10 from Texas A&M University’s DATA Lab, Baidu’s EZDL11 and Google’s AutoML12. You’ll explore Auto-Keras in the exercises.
(Fill-In) _______ was developed by François Chollet of Google as a friendly interface to Google’s TensorFlow.
Answer: Keras.
(Fill-In) _______ are appropriate for computer vision tasks, such as recognizing handwritten digits and characters or recognizing objects (including faces) in images and video.
Answer: Convnets.
Deep learning is being used in a wide range of applications, such as:
Game playing
Computer vision: Object recognition, pattern recognition, facial recognition
Self-driving cars
Robotics
Improving customer experiences
Chatbots
Diagnosing medical conditions
Google Search
Facial recognition
Automated image captioning and video closed captioning
Enhancing image resolution
Speech recognition
Language translation
Predicting election results
Predicting earthquakes and weather
Google’s Project Sunroof, which determines whether you can put solar panels on your roof
Generative applications—Generating original images, processing existing images to look like a specified artist’s style, adding color to black-and-white images and video, creating music, creating text (books, poetry) and much more.
Check out these four deep-learning demos and search online for lots more, including practical applications like those we mentioned in the preceding section:
DeepArt.io—Turn a photo into artwork by applying an art style to the photo. https:/.
DeepWarp Demo—Analyzes a person’s photo and makes the person’s eyes move in different directions. https:/.
Image-to-Image Demo—Translates a line drawing into a picture. https:/.
Google Translate Mobile App (download from an app store to your smartphone)—Translate text in a photo to another language (e.g., take a photo of a sign or a restaurant menu in Spanish and translate the text to English).
Here are some resources you might find valuable as you study deep learning:
To get your questions answered, go to the Keras team’s slack channel at https:/.
For articles and tutorials, visit https:/.
The Keras documentation is at http:/.
If you’re looking for term projects, directed study projects, capstone course projects or thesis topics, visit arXiv (pronounced “archive,” where the X represents the Greek letter “chi”) at https:/. People post their research papers here in parallel with going through peer review for formal publication, hoping for fast feedback. So, this site gives you access to extremely current research.