Rather than synthesizing an image from scratch, the neuron activations at some layer of the network can be amplified in an existing image. This technique of amplifying the original image to see the effect of the learned features is called DeepDream. The steps for creating a DeepDream image are:
- Take an image and pick a layer from the CNN.
- Take the activations at that layer.
- Set the gradient at that layer equal to the activations, so that whatever the layer already detects is amplified.
- Backpropagate to compute the gradient with respect to the image.
- Jitter the image and normalize the gradient, as a form of regularization.
- Clip the pixel values.
- Process the image at multiple scales to produce the fractal effect.
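The steps above can be sketched without a network. The following is a minimal illustration, assuming a stand-in objective (the mean of the squared pixel values) in place of a real layer activation. The names `toy_objective_gradient` and `dream_step`, the jitter range, and the random test image are hypothetical and are not part of the example that follows.

```python
import numpy as np

def toy_objective_gradient(image):
    """Gradient of the stand-in objective mean(image ** 2) with respect to the image."""
    return 2.0 * image / image.size

def dream_step(image, rng, step=1.5, max_jitter=8):
    """One gradient-ascent step with jitter, gradient normalization, and clipping."""
    # Jitter: roll the image by a random offset along the width and height.
    sx, sy = rng.integers(0, max_jitter, size=2)
    shifted = np.roll(np.roll(image, sx, axis=1), sy, axis=0)
    grad = toy_objective_gradient(shifted)
    # Undo the jitter so the gradient lines up with the original pixels.
    grad = np.roll(np.roll(grad, -sx, axis=1), -sy, axis=0)
    # Normalize by the mean absolute gradient so the step size stays comparable.
    image = image + grad * (step / (np.abs(grad).mean() + 1e-7))
    # Clip to the valid pixel range.
    return np.clip(image, 0.0, 255.0)

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(32, 32, 3)).astype(np.float32)
score_before = float(np.mean(image ** 2))
for _ in range(10):
    image = dream_step(image, rng)
score_after = float(np.mean(image ** 2))
```

With a real network, the toy gradient would be replaced by the gradient of a layer's mean activation, as done later in this section.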
Let's start by importing the relevant packages:
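import os
import zipfile
import urllib.request
import numpy as np
import PIL.Image
# The code in this section uses the TensorFlow 1.x API (sessions and placeholders).
import tensorflow as tf
from tensorflow.python.platform import gfile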
The Inception model is pre-trained on the ImageNet dataset, and the model files are provided by Google. We can download that model and use it for this example. The ZIP archive of the model files is downloaded and extracted into a folder, as shown here:
# work_dir is assumed to be an existing directory path defined beforehand.
model_url = 'https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip'
file_name = model_url.split('/')[-1]
file_path = os.path.join(work_dir, file_name)
# Download the archive only if it is not already present.
if not os.path.exists(file_path):
    file_path, _ = urllib.request.urlretrieve(model_url, file_path)
zip_handle = zipfile.ZipFile(file_path, 'r')
zip_handle.extractall(work_dir)
zip_handle.close()
These commands should have created three new files in the working directory. This pre-trained model can be loaded into the session, as shown here:
graph = tf.Graph()
session = tf.InteractiveSession(graph=graph)
model_path = os.path.join(work_dir, 'tensorflow_inception_graph.pb')
with gfile.FastGFile(model_path, 'rb') as f:
    graph_defnition = tf.GraphDef()
    graph_defnition.ParseFromString(f.read())
A session is started with a new graph, and the graph definition of the downloaded model is then loaded into memory. As a preprocessing step, the ImageNet mean has to be subtracted from the input. The preprocessed image is then fed to the graph, as shown:
input_placeholder = tf.placeholder(np.float32, name='input')
imagenet_mean_value = 117.0
preprocessed_input = tf.expand_dims(input_placeholder-imagenet_mean_value, 0)
tf.import_graph_def(graph_defnition, {'input': preprocessed_input})
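To make the preprocessing concrete, here is a small NumPy sketch of what the mean subtraction and the added batch axis do to the shape and values of the input. The 4 x 4 image filled with the value 127 is a hypothetical stand-in, chosen only so that the result is easy to check.

```python
import numpy as np

imagenet_mean_value = 117.0  # same constant as in the listing above

# Hypothetical 4x4 RGB image with every pixel set to 127.
image = np.full((4, 4, 3), 127.0, dtype=np.float32)

# Subtract the mean and add a leading batch axis, mirroring tf.expand_dims(..., 0).
preprocessed = np.expand_dims(image - imagenet_mean_value, 0)
```

The result has shape (1, 4, 4, 3), which is the batch-first layout the imported graph receives.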
Now the session and graph are ready for inference. A resize_image function with bilinear interpolation will be required. It can be implemented by running the TensorFlow resize operation in the session, as shown here:
def resize_image(image, size):
    resize_placeholder = tf.placeholder(tf.float32)
    resize_placeholder_expanded = tf.expand_dims(resize_placeholder, 0)
    resized_image = tf.image.resize_bilinear(resize_placeholder_expanded, size)[0, :, :, :]
    return session.run(resized_image, feed_dict={resize_placeholder: image})
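For readers who want to see what bilinear interpolation does, the following is a rough NumPy sketch. The function name `resize_bilinear_np` is hypothetical, and the sketch assumes a corner-aligned sampling grid, so its output may differ slightly from what `tf.image.resize_bilinear` produces. It is meant as an illustration, not as a drop-in replacement.

```python
import numpy as np

def resize_bilinear_np(image, size):
    """Resize an (H, W, C) array to size = (new_h, new_w) by bilinear interpolation."""
    in_h, in_w = image.shape[:2]
    out_h, out_w = int(size[0]), int(size[1])
    # Sample positions in input coordinates (corner-aligned mapping, an assumption).
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    # Fractional distances to the upper-left neighbour, shaped for broadcasting.
    wy = (ys - y0)[:, None, None]
    wx = (xs - x0)[None, :, None]
    # Interpolate along the width first, then along the height.
    top = image[y0][:, x0] * (1 - wx) + image[y0][:, x1] * wx
    bottom = image[y1][:, x0] * (1 - wx) + image[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

constant = np.full((6, 8, 3), 42.0, dtype=np.float32)
upscaled = resize_bilinear_np(constant, (12, 16))

ramp = np.arange(5 * 7 * 3, dtype=np.float32).reshape(5, 7, 3)
same_size = resize_bilinear_np(ramp, (5, 7))
```

A constant image stays constant at any size, and resizing to the original size returns the input, which are two quick sanity checks for any interpolation routine.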
An image from the working directory can be loaded into memory and converted to floating-point values, as shown here:
image_name = 'mountain.jpg'
image = PIL.Image.open(image_name)
image = np.float32(image)
The image that is loaded is shown here, for your reference:
The number of octaves, size, and scale of the scale space are defined here:
no_octave = 4
scale = 1.4
window_size = 51
These values work well for the example shown here, but they may require tuning for other images based on their size. A layer can be selected for dreaming, and the mean activation of that layer will be the objective function, as shown here:
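# objective_fn must be the activation tensor of the chosen layer and has to be
# defined first, for example with graph.get_tensor_by_name('import/<layer>:0');
# the layer to use is not specified here.
score = tf.reduce_mean(objective_fn)
gradients = tf.gradients(score, input_placeholder)[0]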
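The gradient of the objective with respect to the input image is computed for the optimization. The octave images can be computed by repeatedly downscaling the image and storing the difference between each scale and the upscaled version of the next smaller one, as shown: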
octave_images = []
for i in range(no_octave - 1):
    image_height_width = image.shape[:2]
    scaled_image = resize_image(image, np.int32(np.float32(image_height_width) / scale))
    image_difference = image - resize_image(scaled_image, image_height_width)
    image = scaled_image
    octave_images.append(image_difference)
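The octave decomposition loses no information: upscaling the coarsest image and adding the stored differences back, finest last, recovers the original. The following sketch checks this with NumPy only. It assumes a nearest-neighbour `resize_nearest` helper (hypothetical, used in place of the TensorFlow-based resize_image) and a random test image.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of an (H, W, C) array; a stand-in for resize_image."""
    in_h, in_w = image.shape[:2]
    out_h, out_w = int(size[0]), int(size[1])
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return image[rows][:, cols]

no_octave = 4
scale = 1.4

rng = np.random.default_rng(0)
original = rng.uniform(0.0, 255.0, size=(64, 48, 3)).astype(np.float32)

# Decompose: keep the detail that is lost at each downscaling step.
image = original
octave_images = []
for _ in range(no_octave - 1):
    height_width = image.shape[:2]
    scaled = resize_nearest(image, np.int32(np.float32(height_width) / scale))
    octave_images.append(image - resize_nearest(scaled, height_width))
    image = scaled

# Reconstruct: upscale the coarsest image and add the details back, finest last.
reconstructed = image
for detail in reversed(octave_images):
    reconstructed = resize_nearest(reconstructed, detail.shape[:2]) + detail
```

In the optimization that follows, the same reconstruction happens one octave at a time, with gradient-ascent steps applied in between, so the dreamed patterns appear at every scale.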
Now the optimization can be run using all the octave images. A window is slid across the image, and the gradient is computed for each window to create the dream, as shown here:
for octave_idx in range(no_octave):
    if octave_idx > 0:
        # Upscale the current result and add back the detail stored for this octave.
        image_difference = octave_images[-octave_idx]
        image = resize_image(image, image_difference.shape[:2]) + image_difference
    for i in range(10):
        image_height, image_width = image.shape[:2]
        # Jitter: roll the image by a random offset before tiling.
        sx, sy = np.random.randint(window_size, size=2)
        shifted_image = np.roll(np.roll(image, sx, 1), sy, 0)
        gradient_values = np.zeros_like(image)
        for y in range(0, max(image_height - window_size // 2, window_size), window_size):
            for x in range(0, max(image_width - window_size // 2, window_size), window_size):
                sub = shifted_image[y:y + window_size, x:x + window_size]
                gradient_windows = session.run(gradients, {input_placeholder: sub})
                gradient_values[y:y + window_size, x:x + window_size] = gradient_windows
        # Undo the jitter, then take a normalized gradient-ascent step.
        gradient_windows = np.roll(np.roll(gradient_values, -sx, 1), -sy, 0)
        image += gradient_windows * (1.5 / (np.abs(gradient_windows).mean() + 1e-7))
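The roll-and-unroll bookkeeping in the loop above is easy to get wrong, so it can help to test it in isolation. The sketch below wraps the tiling in a hypothetical `tiled_gradient` function and passes in a gradient function that returns its input unchanged. If the shifts cancel correctly and the tiles cover the image, the output equals the input. The image size and window size are arbitrary choices for this check.

```python
import numpy as np

def tiled_gradient(image, gradient_fn, window_size, rng):
    """Evaluate gradient_fn tile by tile on a randomly rolled copy of image."""
    height, width = image.shape[:2]
    sx, sy = rng.integers(0, window_size, size=2)
    shifted = np.roll(np.roll(image, sx, axis=1), sy, axis=0)
    gradient = np.zeros_like(image)
    for y in range(0, max(height - window_size // 2, window_size), window_size):
        for x in range(0, max(width - window_size // 2, window_size), window_size):
            tile = shifted[y:y + window_size, x:x + window_size]
            gradient[y:y + window_size, x:x + window_size] = gradient_fn(tile)
    # Roll back so each gradient value sits on the pixel it was computed for.
    return np.roll(np.roll(gradient, -sx, axis=1), -sy, axis=0)

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 255.0, size=(64, 64, 3)).astype(np.float32)

# Identity "gradient", used only to check that the roll and un-roll cancel out.
recovered = tiled_gradient(image, lambda tile: tile, window_size=32, rng=rng)
```

Note that when the image size is not a multiple of the window size, a strip narrower than half a window can be left without a gradient in a given iteration. Because the shift is random, a different strip is skipped each time.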
Now the optimization to create the DeepDream is complete, and the result can be saved after clipping the pixel values, as shown:
image /= 255.0
image = np.uint8(np.clip(image, 0, 1) * 255)
PIL.Image.fromarray(image).save('dream_' + image_name, 'jpeg')
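The clipping matters because gradient ascent can push pixel values outside the 0 to 255 range, and such values do not convert cleanly to 8-bit integers. A small sketch of the conversion on a few hand-picked values follows; the helper name `to_uint8` and the sample values are hypothetical.

```python
import numpy as np

def to_uint8(image):
    """Map a float image in the nominal 0-255 range to uint8, clipping overshoot."""
    return np.uint8(np.clip(image / 255.0, 0, 1) * 255)

values = np.array([-20.0, 0.0, 127.5, 255.0, 300.0], dtype=np.float32)
converted = to_uint8(values)
```

Values below 0 are mapped to 0, values above 255 are mapped to 255, and fractional values are truncated.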
In this section, we have seen the procedure to create the DeepDream. The result is shown here:
As we can see, dog slugs are activated everywhere. You can try various other layers, which are activated by different features and therefore produce different artifacts. These results can be used for artistic purposes. In the next section, we will see some adversarial examples that can fool deep learning models.