Adding Custom-Made Images Using Data Loaders for Transfer Learning in Five Steps

Tsaku Nelson
3 min read · Jul 7, 2018

Hello everyone, I am a data science enthusiast, striving to learn every method for gaining insights from data in any sector and producing quality results, for the betterment of society.

I started learning PyTorch a week ago and thought it was worth writing this post for people who are having a tough time loading custom-made data (images) into a pretrained neural network. This post solves the problem using the DataLoader API from PyTorch in five main steps.

1. Setting up your directory

- Save your images into two separate folders, one for training (../traindata) and the other for testing (../validationdata).

- In each directory, create additional folders, one per image class, each named after its class. If you have 3 classes, say cars, airplanes and boats, then you should have 3 folders named after these classes in both the training and validation parent folders.

- Populate each class folder with its respective images.

This gives you a directory structure with cars, airplanes and boats as the class folders under both the training and validation directories.
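The layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name make_image_folders is hypothetical, and the folder names are the ones used in this post.

```python
from pathlib import Path


def make_image_folders(root, classes=("cars", "airplanes", "boats")):
    """Create the folder layout expected in step 1.

    root/
      traindata/       cars/  airplanes/  boats/
      validationdata/  cars/  airplanes/  boats/
    """
    root = Path(root)
    created = []
    for split in ("traindata", "validationdata"):
        for cls in classes:
            folder = root / split / cls
            # parents=True also creates the split folder on first use
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling make_image_folders("..") would create ../traindata and ../validationdata with one empty folder per class, ready to be filled with images.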

2. Import required Libraries

Two main imports are necessary here:

  • “DataLoader” from “torch.utils.data”
  • “transforms” and “datasets” from “torchvision”

3. Transforming the sample data classes using transforms

The transforms module provides the Compose class, which takes a list of transformations and applies them to your images in order. Such transformations could be rotations, translations, center cropping, or normalizing the images. The most important one is “transforms.ToTensor()”, which every pipeline needs, because PyTorch models work on tensors rather than on image files. Below is a sample transformation for training our classes with a pretrained model in PyTorch.

Setting up the image transformation in Python

4. Creating the dataset with the applied transformations

This is quite straightforward. Create a dataset by instantiating the ImageFolder class from the datasets module, passing the above transformations as the second parameter (transform). The first parameter should be:

- The path to the directory containing the data

train_data = datasets.ImageFolder("../traindata", transform=train_transform)

5. Creating each data loader from its respective data set

In this last step, create a DataLoader instance for each dataset and, depending on how you want your data to be loaded for training and testing, define the following parameters:

- The datasets created above, of course (both train and validation data)

- batch_size: Specifies the number of image samples to be collected per iteration.

- shuffle: A boolean stating whether the data should be shuffled or not. This is usually set to True for training, so that the batches differ from epoch to epoch, which generally helps the model generalize:

train_loader = torch.utils.data.DataLoader(train_data, batch_size = 30, shuffle = True)

The above code shuffles the dataset and collects 30 images at a time for training.

The validation_loader is very similar, except that it has its own dataset and shuffle is set to False:

validation_loader = torch.utils.data.DataLoader(validation_data, batch_size = 30, shuffle = False)
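To get a feel for what the loaders return, the sketch below iterates over them. It uses a small stand-in dataset of random tensors (an assumption made so the snippet runs without any image files); with real data you would pass the ImageFolder datasets from step 4 instead. The helper make_loaders is hypothetical.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loaders(train_data, validation_data, batch_size=30):
    """Build the two loaders described in step 5."""
    train_loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
    validation_loader = DataLoader(validation_data, batch_size=batch_size, shuffle=False)
    return train_loader, validation_loader


# Stand-in for an image dataset: 70 tiny fake "images" with integer labels
fake_images = torch.rand(70, 3, 8, 8)
fake_labels = torch.randint(0, 3, (70,))
fake_data = TensorDataset(fake_images, fake_labels)

train_loader, validation_loader = make_loaders(fake_data, fake_data)

# Each iteration yields one batch of images and the matching labels.
for images, labels in train_loader:
    print(images.shape, labels.shape)
```

With 70 samples and a batch size of 30, the last batch holds only the remaining 10 samples.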

Carefully applying the above steps should yield a result similar to the one below:

Complete set up of an Image directory for training/test with Pytorch

NB: Almost the same process is applied to both the training and the validation data, with some slight modifications in the transformations. This is logical: we want our model to produce good metrics while staying general enough to avoid overfitting. The latter is best addressed in the training process, which is why random augmentations are typically applied to the training images only.

That said, we are now ready to train and validate a pretrained neural network on our new custom-made images with PyTorch. Stay tuned for the next blog post, on transfer learning with PyTorch.

Good luck!
