Deep Learning for Computer Vision · Expert Techniques to Train Advanced Neural Networks Using TensorFlow and Keras by Shanmugamani, Rajalingappaa

The AlexNet model
The VGG-16 model
The Google Inception-V3 model
The Microsoft ResNet-50 model
The SqueezeNet model
Spatial transformer networks
The DenseNet model

Training a model for cats versus dogs

Preparing the data
Benchmarking with simple CNN
Augmenting the dataset

Augmentation techniques 

Transfer learning or fine-tuning of a model

Training on bottleneck features

Fine-tuning several layers in deep learning

Developing real-world applications

Choosing the right model
Tackling the underfitting and overfitting scenarios
Gender and age detection from face
Fine-tuning apparel models
Brand safety

Summary

Image Retrieval

Understanding visual features

Visualizing activation of deep learning models
Embedding visualization

Guided backpropagation

The DeepDream
Adversarial examples

Model inference

Exporting a model
Serving the trained model

Content-based image retrieval

Building the retrieval pipeline

Extracting bottleneck features for an image
Computing similarity between query image and target database

Efficient retrieval

Matching faster using approximate nearest neighbour

Advantages of ANNOY

Autoencoders of raw images

Denoising using autoencoders

Summary

Object Detection

Detecting objects in an image
Exploring the datasets

ImageNet dataset
PASCAL VOC challenge
COCO object detection challenge
Evaluating datasets using metrics

Intersection over Union
The mean average precision

Localizing algorithms 

Localizing objects using sliding windows

The scale-space concept
Training a fully connected layer as a convolution layer
Convolution implementation of sliding window

Thinking about localization as a regression problem

Applying regression to other problems
Combining regression with the sliding window

Detecting objects

Regions of the convolutional neural network (R-CNN)
Fast R-CNN
Faster R-CNN
Single shot multi-box detector

Object detection API

Installation and setup
Pre-trained models
Re-training object detection models

Data preparation for the Pet dataset
Object detection training pipeline
Training the model
Monitoring loss and accuracy using TensorBoard

Training a pedestrian detection for a self-driving car

The YOLO object detection algorithm
Summary

Semantic Segmentation

Predicting pixels

Diagnosing medical images
Understanding the earth from satellite imagery
Enabling robots to see

Datasets
Algorithms for semantic segmentation

The Fully Convolutional Network
The SegNet architecture

Upsampling the layers by pooling
Sampling the layers by convolution
Skipping connections for better training

Dilated convolutions
DeepLab
RefineNet
PSPNet
Large kernel matters
DeepLab v3

Ultra-nerve segmentation
Segmenting satellite images

Modeling FCN for segmentation

Segmenting instances
Summary

Similarity Learning

Algorithms for similarity learning

Siamese networks

Contrastive loss

FaceNet

Triplet loss

The DeepNet model
DeepRank
Visual recommendation systems

Human face analysis

Face detection
Face landmarks and attributes

The Multi-Task Facial Landmark (MTFL) dataset
The Kaggle keypoint dataset
The Multi-Attribute Facial Landmark (MAFL) dataset
Learning the facial key points

Face recognition

The labeled faces in the wild (LFW) dataset
The YouTube faces dataset
The CelebFaces Attributes dataset (CelebA)
CASIA web face database
The VGGFace2 dataset
Computing the similarity between faces
Finding the optimum threshold

Face clustering 

Summary

Image Captioning

Understanding the problem and datasets
Understanding natural language processing for image captioning

Expressing words in vector form
Converting words to vectors
Training an embedding

Approaches for image captioning and related problems

Using a conditional random field for linking image and text
Using RNN on CNN features to generate captions
Creating captions using image ranking
Retrieving captions from images and images from captions
Dense captioning
Using RNN for captioning
Using multimodal metric space
Using attention network for captioning
Knowing when to look

Implementing attention-based image captioning
Summary

Generative Models

Applications of generative models

Artistic style transfer
Predicting the next frame in a video
Super-resolution of images
Interactive image generation
Image to image translation
Text to image generation
Inpainting
Blending
Transforming attributes
Creating training data
Creating new animation characters
3D models from photos

Neural artistic style transfer

Content loss
Style loss using the Gram matrix
Style transfer

Generative Adversarial Networks

Vanilla GAN
Conditional GAN
Adversarial loss
Image translation
InfoGAN
Drawbacks of GAN

Visual dialogue model

Algorithm for VDM

Generator
Discriminator

Summary

Video Classification

Understanding and classifying videos 

Exploring video classification datasets

UCF101
YouTube-8M
Other datasets

Splitting videos into frames
Approaches for classifying videos

Fusing parallel CNN for video classification
Classifying videos over long periods
Streaming two CNNs for action recognition
Using 3D convolution for temporal learning
Using trajectory for classification
Multi-modal fusion
Attending regions for classification

Extending image-based approaches to videos

Regressing the human pose

Tracking facial landmarks

Segmenting videos
Captioning videos
Generating videos

Summary

Deployment

Performance of models

Quantizing the models
MobileNets

Deployment in the cloud

AWS
Google Cloud Platform

Deployment of models in devices

Jetson TX2
Android
iPhone

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think
