Preparing the data

For the purpose of classification, we will download the data from Kaggle and store it in an appropriate folder structure. Sign up and log in to www.kaggle.com, then go to https://www.kaggle.com/c/dogs-vs-cats/data. Download the train.zip and test1.zip files from that page. The train.zip file contains 25,000 images of cats and dogs. Extract it so that the images, named in the form dog.0.jpg and cat.0.jpg, sit in a train folder inside your working directory. We will use only a portion of the data to train a model. Readers with more computing power, such as a Graphics Processing Unit (GPU), can use more data than suggested. Run the following script to rearrange the images and create the necessary folders:

import os
import shutil

work_dir = ''  # set this to the directory that contains the extracted 'train' folder


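def copy_files(prefix_str, range_start, range_end, target_dir):
    """Copy <prefix_str>.<i>.jpg for i in [range_start, range_end) to data/<target_dir>/<prefix_str>."""
    image_paths = [os.path.join(work_dir, 'train', prefix_str + '.' + str(i) + '.jpg')
                   for i in range(range_start, range_end)]
    dest_dir = os.path.join(work_dir, 'data', target_dir, prefix_str)
    # exist_ok=True lets the script be rerun without failing on existing folders
    os.makedirs(dest_dir, exist_ok=True)
    for image_path in image_paths:
        shutil.copy(image_path, dest_dir)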


# The end of each range is exclusive: 0-999 for training, 1000-1399 for testing
copy_files('dog', 0, 1000, 'train')
copy_files('cat', 0, 1000, 'train')
copy_files('dog', 1000, 1400, 'test')
copy_files('cat', 1000, 1400, 'test')

For our experiments, we will use only 1,000 images each of cats and dogs for training. The script above copies images 0–999 of each class from the downloaded train folder to data/train/cat and data/train/dog, and images 1,000–1,399 of each class to data/test/cat and data/test/dog, so that we have 1,000 training examples and 400 validation examples for each class.
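
Before moving on, it can be worth confirming that the folders hold the expected number of images. The following is a minimal sketch of such a check, not part of the original script: the helper name count_images is hypothetical, and it assumes the data/<split>/<class> layout created above, with every image carrying a .jpg extension.

```python
import os


def count_images(data_dir, splits=('train', 'test'), classes=('dog', 'cat')):
    """Return {(split, class_name): number of .jpg files} under data_dir.

    Hypothetical helper; assumes the layout data_dir/<split>/<class_name>.
    A missing folder is reported as 0 rather than raising an error.
    """
    counts = {}
    for split in splits:
        for class_name in classes:
            class_dir = os.path.join(data_dir, split, class_name)
            if os.path.isdir(class_dir):
                counts[(split, class_name)] = sum(
                    1 for name in os.listdir(class_dir)
                    if name.lower().endswith('.jpg'))
            else:
                counts[(split, class_name)] = 0
    return counts


if __name__ == '__main__':
    work_dir = ''  # same working directory as in the copy script
    counts = count_images(os.path.join(work_dir, 'data'))
    for (split, class_name), n in sorted(counts.items()):
        print(split, class_name, n)
```

If the copy script ran to completion, each train folder should report 1,000 images and each test folder 400. A count of 0 usually means that work_dir does not point to the directory containing the data folder.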