Working with less data - Image classification with very few images

Every ML practitioner has attempted Kaggle's Dogs vs. Cats competition, which is pretty much a solved problem thanks to methods like transfer learning on top of architectures like ResNet. But the dataset can still be used to show off some cool techniques by self-imposing certain constraints, like using only a fraction of the data to obtain comparable results!

This post illustrates how, with modern techniques, we can obtain near state-of-the-art results on image classification tasks from very little data. I'm going to use Kaggle's Dogs vs. Cats to make my point, but these results extend to any similar task, including microscopic images, satellite images, and the like.

The two techniques I'm going to experiment with are:

  1. Data augmentation.
  2. Pseudo-labelling (a variant of semi-supervised learning).

I'm using Keras with a TensorFlow backend, but the same results can be obtained with any modern deep learning library, since great implementations of both techniques are readily available.

First, I import all the necessary libraries:

In [1]:
import numpy as np
import cv2
import os
import shutil
from glob import glob
import matplotlib.pyplot as plt
%matplotlib inline

from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.models import *
from keras.layers import *
from keras.regularizers import *
from keras.optimizers import *
from keras import backend as K  # used later to update the optimizer's learning rate in place
from keras import applications

from IPython.display import Image
Using TensorFlow backend.

I ran these experiments on an NVIDIA Titan X, so I could have afforded a batch size higher than 64 to speed things up.

In [3]:
batch_size = 64
img_width, img_height = 200, 200

Now, I create the directory structure I need for training and validation:

In [4]:
%mkdir -p data/train/cats
%mkdir -p data/validate/cats
%mkdir -p data/train/dogs
%mkdir -p data/validate/dogs

This is the interesting bit: the original Dogs vs. Cats dataset has 12,500 images for each category. So, to prove my point that we can get good results with a very small amount of data, I select only the first 2,000 images, i.e. roughly 1,000 per class.

That's less than a tenth of the full 25,000-image training set.

In [5]:
# note: os.listdir returns files in arbitrary order, so the exact split can vary between runs
train_imgs = os.listdir('input/train')[:2000]
valid_imgs = os.listdir('input/train')[2000:2800]

for i in train_imgs:
    if i[:3] == 'cat':
        shutil.copy2('input/train/' + i, 'data/train/cats/')
    elif i[:3] == 'dog':
        shutil.copy2('input/train/' + i, 'data/train/dogs/')

for i in valid_imgs:
    if i[:3] == 'cat':
        shutil.copy2('input/train/' + i, 'data/validate/cats/')
    elif i[:3] == 'dog':
        shutil.copy2('input/train/' + i, 'data/validate/dogs/')        

Data augmentation

Here I augment the data with a generator that applies random affine transformations.

One of my favorite features of Keras is how smoothly it integrates with Python's generators through its flow_from_directory method. This makes augmentation much simpler and avoids storing all the augmented images on disk.

In [6]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    shear_range=0.1,)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(img_height,img_width),
        batch_size=batch_size,
        class_mode='binary')
Found 2000 images belonging to 2 classes.

I chose the above augmentation parameters by trial and error. The best way to do this is to run the augmentation on a few samples and visualize the results. Our goal is to create images which still inherently preserve the characteristics of dogs or cats; hence transformations like vertical flipping might not help.

So, the best thing to do is to plot these images and keep tinkering with the values until you reach a satisfactory set.
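
The cell that filled the preview directory isn't shown above. A minimal sketch, assuming one sample image at input/train/cat.0.jpg (an illustrative path, not taken from the notebook), would look like this:

In [ ]:
# Generate a handful of augmented variants of a single image and save them
# to 'preview/' for visual inspection. The source path is an assumption.
img = img_to_array(load_img('input/train/cat.0.jpg'))
img = img.reshape((1,) + img.shape)  # flow() expects a batch dimension

if not os.path.exists('preview'):
    os.makedirs('preview')

count = 0
for batch in train_datagen.flow(img, batch_size=1, save_to_dir='preview',
                                save_prefix='cat', save_format='jpeg'):
    count += 1
    if count >= 10:  # stop after 10 augmented variants
        break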

In [5]:
images = os.listdir('preview')
plt.figure(figsize=(20,10))
columns = 5
for i, image in enumerate(images):
    plt.subplot(len(images) // columns + 1, columns, i + 1)  # floor division keeps the row count an int in Python 3 too
    image = load_img('preview/' + images[i])
    plt.imshow(image)

The validation images will only be rescaled and resized to the same target size as the training images, not augmented. That said, there are studies showing that test-time augmentation (TTA), or inference-time augmentation, where you average the model's predictions over a few augmented copies of each test/validation image, gives better results.

For a reference, see Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (He et al., 2015), where they mention performing "multi-view testing on feature maps".
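
TTA isn't used in this notebook, but a minimal sketch with Keras could look like the following; the augmentation parameters and the n_aug count are illustrative assumptions:

In [ ]:
# Average predictions over n_aug randomly augmented passes of a directory.
# The augmentation parameters here are assumptions, not from the notebook.
tta_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True,
                                 width_shift_range=0.1, height_shift_range=0.1)

def predict_with_tta(model, directory, n_aug=5):
    preds = []
    for _ in range(n_aug):
        gen = tta_datagen.flow_from_directory(
            directory, target_size=(img_height, img_width),
            batch_size=batch_size, class_mode=None, shuffle=False)
        steps = int(np.ceil(gen.samples / float(batch_size)))
        preds.append(model.predict_generator(gen, steps=steps))
    return np.mean(preds, axis=0)  # average over the augmented passes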

In [7]:
val_datagen = ImageDataGenerator(rescale=1./255)

val_generator = val_datagen.flow_from_directory(
        'data/validate',
        target_size=(img_height,img_width),
        batch_size=batch_size,
        class_mode='binary')
Found 800 images belonging to 2 classes.
In [8]:
classes = len(train_generator.class_indices)
assert classes == len(val_generator.class_indices)  # '==' compares values; 'is' checks object identity
nb_train_samples = train_generator.samples
nb_val_samples = val_generator.samples

Here is another self-imposed constraint: I won't be using a pre-trained network. Instead, I'll use a conventional model with three convolutional layers, along with some Dropout and Batch Normalization layers.

In [9]:
model = Sequential([
    # note: for channels_last inputs, the usual BatchNormalization choice is the
    # default axis=-1 (channels); axis=1 normalizes along a spatial axis here
    BatchNormalization(axis=1, input_shape=(img_width,img_height,3)),
    Convolution2D(32, (3,3), activation='relu'),
    BatchNormalization(axis=1),
    MaxPooling2D(),
    Convolution2D(64, (3,3), activation='relu'),
    BatchNormalization(axis=1),
    MaxPooling2D(),
    Convolution2D(128, (3,3), activation='relu'),
    BatchNormalization(axis=1),
    MaxPooling2D(),
    Flatten(),
    Dense(200, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(200, activation='relu'),
    BatchNormalization(),
    Dropout(0.5),
    Dense(1, activation='sigmoid')
])

Now, I perform (quite a few) iterations of training and hyperparameter tuning.

In [10]:
model.compile(Adam(lr=1e-4), loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
batch_normalization_1 (Batch (None, 200, 200, 3)       800       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 198, 198, 32)      896       
_________________________________________________________________
batch_normalization_2 (Batch (None, 198, 198, 32)      792       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 99, 99, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 97, 97, 64)        18496     
_________________________________________________________________
batch_normalization_3 (Batch (None, 97, 97, 64)        388       
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 48, 48, 64)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 46, 46, 128)       73856     
_________________________________________________________________
batch_normalization_4 (Batch (None, 46, 46, 128)       184       
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 23, 23, 128)       0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 67712)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 200)               13542600  
_________________________________________________________________
batch_normalization_5 (Batch (None, 200)               800       
_________________________________________________________________
dropout_1 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 200)               40200     
_________________________________________________________________
batch_normalization_6 (Batch (None, 200)               800       
_________________________________________________________________
dropout_2 (Dropout)          (None, 200)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 201       
=================================================================
Total params: 13,680,013
Trainable params: 13,678,131
Non-trainable params: 1,882
_________________________________________________________________
In [11]:
model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=15,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size)
Epoch 1/15
31/31 [==============================] - 32s 1s/step - loss: 0.9553 - acc: 0.5256 - val_loss: 0.7443 - val_acc: 0.5104
Epoch 2/15
31/31 [==============================] - 31s 1s/step - loss: 0.9111 - acc: 0.5278 - val_loss: 1.1022 - val_acc: 0.5104
Epoch 3/15
31/31 [==============================] - 30s 982ms/step - loss: 0.8599 - acc: 0.5863 - val_loss: 1.4481 - val_acc: 0.5104
Epoch 4/15
31/31 [==============================] - 30s 968ms/step - loss: 0.8576 - acc: 0.5695 - val_loss: 1.8345 - val_acc: 0.5104
Epoch 5/15
31/31 [==============================] - 30s 959ms/step - loss: 0.8649 - acc: 0.5665 - val_loss: 1.9409 - val_acc: 0.5104
Epoch 6/15
31/31 [==============================] - 29s 921ms/step - loss: 0.8505 - acc: 0.5873 - val_loss: 1.9306 - val_acc: 0.5104
Epoch 7/15
31/31 [==============================] - 28s 916ms/step - loss: 0.8159 - acc: 0.5989 - val_loss: 1.6463 - val_acc: 0.5117
Epoch 8/15
31/31 [==============================] - 28s 915ms/step - loss: 0.8379 - acc: 0.5763 - val_loss: 1.2310 - val_acc: 0.5339
Epoch 9/15
31/31 [==============================] - 29s 946ms/step - loss: 0.8108 - acc: 0.5938 - val_loss: 0.8230 - val_acc: 0.5885
Epoch 10/15
31/31 [==============================] - 28s 892ms/step - loss: 0.7812 - acc: 0.5918 - val_loss: 0.6802 - val_acc: 0.6328
Epoch 11/15
31/31 [==============================] - 28s 904ms/step - loss: 0.7881 - acc: 0.6139 - val_loss: 0.6589 - val_acc: 0.6471
Epoch 12/15
31/31 [==============================] - 29s 921ms/step - loss: 0.8200 - acc: 0.5958 - val_loss: 0.6549 - val_acc: 0.6497
Epoch 13/15
31/31 [==============================] - 29s 926ms/step - loss: 0.8134 - acc: 0.5943 - val_loss: 0.6258 - val_acc: 0.6628
Epoch 14/15
31/31 [==============================] - 27s 886ms/step - loss: 0.7622 - acc: 0.6367 - val_loss: 0.6212 - val_acc: 0.6380
Epoch 15/15
31/31 [==============================] - 29s 930ms/step - loss: 0.7816 - acc: 0.6093 - val_loss: 0.6288 - val_acc: 0.6237
Out[11]:
<keras.callbacks.History at 0x7fb8f48ca3d0>
In [12]:
model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=20,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size)
Epoch 1/20
31/31 [==============================] - 32s 1s/step - loss: 0.7945 - acc: 0.6138 - val_loss: 0.6056 - val_acc: 0.6628
Epoch 2/20
31/31 [==============================] - 32s 1s/step - loss: 0.8082 - acc: 0.5902 - val_loss: 0.6014 - val_acc: 0.6693
Epoch 3/20
31/31 [==============================] - 31s 989ms/step - loss: 0.7259 - acc: 0.6431 - val_loss: 0.5691 - val_acc: 0.7057
Epoch 4/20
31/31 [==============================] - 30s 963ms/step - loss: 0.7431 - acc: 0.6200 - val_loss: 0.6163 - val_acc: 0.6667
Epoch 5/20
31/31 [==============================] - 29s 943ms/step - loss: 0.7376 - acc: 0.6250 - val_loss: 0.6097 - val_acc: 0.6862
Epoch 6/20
31/31 [==============================] - 29s 937ms/step - loss: 0.7149 - acc: 0.6411 - val_loss: 0.6174 - val_acc: 0.6810
Epoch 7/20
31/31 [==============================] - 28s 917ms/step - loss: 0.7241 - acc: 0.6171 - val_loss: 0.6334 - val_acc: 0.6510
Epoch 8/20
31/31 [==============================] - 29s 946ms/step - loss: 0.7094 - acc: 0.6270 - val_loss: 0.6052 - val_acc: 0.6732
Epoch 9/20
31/31 [==============================] - 28s 918ms/step - loss: 0.7147 - acc: 0.6338 - val_loss: 0.6149 - val_acc: 0.6706
Epoch 10/20
31/31 [==============================] - 29s 921ms/step - loss: 0.7359 - acc: 0.6226 - val_loss: 0.5782 - val_acc: 0.7018
Epoch 11/20
31/31 [==============================] - 30s 953ms/step - loss: 0.6946 - acc: 0.6512 - val_loss: 0.5402 - val_acc: 0.7396
Epoch 12/20
31/31 [==============================] - 29s 921ms/step - loss: 0.6884 - acc: 0.6453 - val_loss: 0.5917 - val_acc: 0.6836
Epoch 13/20
31/31 [==============================] - 29s 930ms/step - loss: 0.6947 - acc: 0.6410 - val_loss: 0.5538 - val_acc: 0.7188
Epoch 14/20
31/31 [==============================] - 29s 920ms/step - loss: 0.6861 - acc: 0.6471 - val_loss: 0.5554 - val_acc: 0.7214
Epoch 15/20
31/31 [==============================] - 28s 919ms/step - loss: 0.6839 - acc: 0.6463 - val_loss: 0.6400 - val_acc: 0.6393
Epoch 16/20
31/31 [==============================] - 29s 927ms/step - loss: 0.6761 - acc: 0.6462 - val_loss: 0.6103 - val_acc: 0.6693
Epoch 17/20
31/31 [==============================] - 28s 916ms/step - loss: 0.6418 - acc: 0.6737 - val_loss: 0.5897 - val_acc: 0.6797
Epoch 18/20
31/31 [==============================] - 28s 894ms/step - loss: 0.6550 - acc: 0.6653 - val_loss: 0.5774 - val_acc: 0.6862
Epoch 19/20
31/31 [==============================] - 29s 946ms/step - loss: 0.6329 - acc: 0.6673 - val_loss: 0.5310 - val_acc: 0.7214
Epoch 20/20
31/31 [==============================] - 29s 921ms/step - loss: 0.6579 - acc: 0.6623 - val_loss: 0.6341 - val_acc: 0.6380
Out[12]:
<keras.callbacks.History at 0x7fb850561e50>
In [13]:
K.set_value(model.optimizer.lr, 0.00005)  # plain attribute assignment (model.optimizer.lr=...) would not update the compiled graph
model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=20,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size)
Epoch 1/20
31/31 [==============================] - 32s 1s/step - loss: 0.6437 - acc: 0.6687 - val_loss: 0.6432 - val_acc: 0.6445
Epoch 2/20
31/31 [==============================] - 32s 1s/step - loss: 0.6580 - acc: 0.6614 - val_loss: 0.5369 - val_acc: 0.7305
Epoch 3/20
31/31 [==============================] - 31s 997ms/step - loss: 0.6257 - acc: 0.6749 - val_loss: 0.5354 - val_acc: 0.7448
Epoch 4/20
31/31 [==============================] - 30s 963ms/step - loss: 0.6396 - acc: 0.6779 - val_loss: 0.5618 - val_acc: 0.7188
Epoch 5/20
31/31 [==============================] - 29s 930ms/step - loss: 0.6357 - acc: 0.6753 - val_loss: 0.5618 - val_acc: 0.7096
Epoch 6/20
31/31 [==============================] - 28s 912ms/step - loss: 0.6564 - acc: 0.6537 - val_loss: 0.5152 - val_acc: 0.7500
Epoch 7/20
31/31 [==============================] - 28s 918ms/step - loss: 0.6225 - acc: 0.6680 - val_loss: 0.5297 - val_acc: 0.7526
Epoch 8/20
31/31 [==============================] - 29s 920ms/step - loss: 0.6692 - acc: 0.6554 - val_loss: 0.5472 - val_acc: 0.7305
Epoch 9/20
31/31 [==============================] - 28s 916ms/step - loss: 0.6133 - acc: 0.6748 - val_loss: 0.5257 - val_acc: 0.7513
Epoch 10/20
31/31 [==============================] - 29s 937ms/step - loss: 0.6113 - acc: 0.6939 - val_loss: 0.5250 - val_acc: 0.7253
Epoch 11/20
31/31 [==============================] - 29s 921ms/step - loss: 0.6695 - acc: 0.6584 - val_loss: 0.6640 - val_acc: 0.6302
Epoch 12/20
31/31 [==============================] - 29s 937ms/step - loss: 0.6213 - acc: 0.6835 - val_loss: 0.5267 - val_acc: 0.7357
Epoch 13/20
31/31 [==============================] - 28s 895ms/step - loss: 0.6099 - acc: 0.6803 - val_loss: 0.5481 - val_acc: 0.7344
Epoch 14/20
31/31 [==============================] - 29s 922ms/step - loss: 0.6356 - acc: 0.6795 - val_loss: 0.5424 - val_acc: 0.7331
Epoch 15/20
31/31 [==============================] - 29s 934ms/step - loss: 0.5881 - acc: 0.7067 - val_loss: 0.5085 - val_acc: 0.7448
Epoch 16/20
31/31 [==============================] - 28s 887ms/step - loss: 0.6014 - acc: 0.6956 - val_loss: 0.4940 - val_acc: 0.7643
Epoch 17/20
31/31 [==============================] - 29s 920ms/step - loss: 0.6107 - acc: 0.6846 - val_loss: 0.5418 - val_acc: 0.7253
Epoch 18/20
31/31 [==============================] - 29s 942ms/step - loss: 0.5991 - acc: 0.6961 - val_loss: 0.5101 - val_acc: 0.7448
Epoch 19/20
31/31 [==============================] - 29s 920ms/step - loss: 0.5963 - acc: 0.6977 - val_loss: 0.5331 - val_acc: 0.7266
Epoch 20/20
31/31 [==============================] - 28s 907ms/step - loss: 0.6016 - acc: 0.6965 - val_loss: 0.5310 - val_acc: 0.7331
Out[13]:
<keras.callbacks.History at 0x7fb850502b90>

I'm still at it...
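
In hindsight, instead of dropping the learning rate by hand between fit_generator calls, a Keras callback could handle this automatically. A minimal sketch (not what I ran here; the factor and patience values are assumptions):

In [ ]:
# Let Keras halve the learning rate whenever the validation loss stops
# improving for 3 epochs.
from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=3, min_lr=1e-6)

model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=20,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size,
    callbacks=[reduce_lr])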

In [14]:
K.set_value(model.optimizer.lr, 0.00001)
model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=20,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size)
Epoch 1/20
31/31 [==============================] - 32s 1s/step - loss: 0.5842 - acc: 0.7071 - val_loss: 0.6448 - val_acc: 0.6432
Epoch 2/20
31/31 [==============================] - 32s 1s/step - loss: 0.6158 - acc: 0.6815 - val_loss: 0.5766 - val_acc: 0.7083
Epoch 3/20
31/31 [==============================] - 31s 995ms/step - loss: 0.5986 - acc: 0.7016 - val_loss: 0.5035 - val_acc: 0.7539
Epoch 4/20
31/31 [==============================] - 30s 967ms/step - loss: 0.5646 - acc: 0.7102 - val_loss: 0.5079 - val_acc: 0.7461
Epoch 5/20
31/31 [==============================] - 29s 927ms/step - loss: 0.5758 - acc: 0.7192 - val_loss: 0.5287 - val_acc: 0.7409
Epoch 6/20
31/31 [==============================] - 29s 923ms/step - loss: 0.5746 - acc: 0.7073 - val_loss: 0.5336 - val_acc: 0.7214
Epoch 7/20
31/31 [==============================] - 29s 921ms/step - loss: 0.5971 - acc: 0.6977 - val_loss: 0.5286 - val_acc: 0.7422
Epoch 8/20
31/31 [==============================] - 28s 916ms/step - loss: 0.5642 - acc: 0.7085 - val_loss: 0.5447 - val_acc: 0.7448
Epoch 9/20
31/31 [==============================] - 28s 911ms/step - loss: 0.6000 - acc: 0.6865 - val_loss: 0.5120 - val_acc: 0.7630
Epoch 10/20
31/31 [==============================] - 28s 903ms/step - loss: 0.5873 - acc: 0.7062 - val_loss: 0.5285 - val_acc: 0.7565
Epoch 11/20
31/31 [==============================] - 28s 910ms/step - loss: 0.5815 - acc: 0.7143 - val_loss: 0.4910 - val_acc: 0.7630
Epoch 12/20
31/31 [==============================] - 28s 903ms/step - loss: 0.5947 - acc: 0.7011 - val_loss: 0.5187 - val_acc: 0.7591
Epoch 13/20
31/31 [==============================] - 29s 938ms/step - loss: 0.5691 - acc: 0.7102 - val_loss: 0.5426 - val_acc: 0.7201
Epoch 14/20
31/31 [==============================] - 28s 894ms/step - loss: 0.6021 - acc: 0.7015 - val_loss: 0.5028 - val_acc: 0.7604
Epoch 15/20
31/31 [==============================] - 29s 934ms/step - loss: 0.5825 - acc: 0.7026 - val_loss: 0.5019 - val_acc: 0.7578
Epoch 16/20
31/31 [==============================] - 28s 897ms/step - loss: 0.5494 - acc: 0.7232 - val_loss: 0.5051 - val_acc: 0.7383
Epoch 17/20
31/31 [==============================] - 28s 918ms/step - loss: 0.5776 - acc: 0.7122 - val_loss: 0.5023 - val_acc: 0.7656
Epoch 18/20
31/31 [==============================] - 28s 909ms/step - loss: 0.5537 - acc: 0.7166 - val_loss: 0.4801 - val_acc: 0.7630
Epoch 19/20
31/31 [==============================] - 28s 914ms/step - loss: 0.5444 - acc: 0.7147 - val_loss: 0.5136 - val_acc: 0.7474
Epoch 20/20
31/31 [==============================] - 29s 937ms/step - loss: 0.5487 - acc: 0.7268 - val_loss: 0.5390 - val_acc: 0.7305
Out[14]:
<keras.callbacks.History at 0x7fb8505dfd10>
In [15]:
K.set_value(model.optimizer.lr, 0.000004)
model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=15,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size)
Epoch 1/15
31/31 [==============================] - 32s 1s/step - loss: 0.5346 - acc: 0.7298 - val_loss: 0.4762 - val_acc: 0.7799
Epoch 2/15
31/31 [==============================] - 32s 1s/step - loss: 0.5624 - acc: 0.7166 - val_loss: 0.4793 - val_acc: 0.7747
Epoch 3/15
31/31 [==============================] - 31s 993ms/step - loss: 0.5474 - acc: 0.7218 - val_loss: 0.4877 - val_acc: 0.7773
Epoch 4/15
31/31 [==============================] - 30s 967ms/step - loss: 0.5346 - acc: 0.7333 - val_loss: 0.4855 - val_acc: 0.7643
Epoch 5/15
31/31 [==============================] - 29s 939ms/step - loss: 0.5518 - acc: 0.7344 - val_loss: 0.4895 - val_acc: 0.7617
Epoch 6/15
31/31 [==============================] - 28s 916ms/step - loss: 0.5510 - acc: 0.7182 - val_loss: 0.4876 - val_acc: 0.7656
Epoch 7/15
31/31 [==============================] - 29s 927ms/step - loss: 0.5606 - acc: 0.7136 - val_loss: 0.4904 - val_acc: 0.7630
Epoch 8/15
31/31 [==============================] - 28s 913ms/step - loss: 0.5242 - acc: 0.7377 - val_loss: 0.4917 - val_acc: 0.7591
Epoch 9/15
31/31 [==============================] - 29s 939ms/step - loss: 0.5679 - acc: 0.7188 - val_loss: 0.4653 - val_acc: 0.7812
Epoch 10/15
31/31 [==============================] - 28s 917ms/step - loss: 0.5750 - acc: 0.7084 - val_loss: 0.4857 - val_acc: 0.7643
Epoch 11/15
31/31 [==============================] - 28s 916ms/step - loss: 0.5285 - acc: 0.7293 - val_loss: 0.4697 - val_acc: 0.7773
Epoch 12/15
31/31 [==============================] - 29s 920ms/step - loss: 0.5253 - acc: 0.7319 - val_loss: 0.4989 - val_acc: 0.7604
Epoch 13/15
31/31 [==============================] - 29s 926ms/step - loss: 0.5388 - acc: 0.7360 - val_loss: 0.4631 - val_acc: 0.7786
Epoch 14/15
31/31 [==============================] - 28s 902ms/step - loss: 0.5076 - acc: 0.7576 - val_loss: 0.4638 - val_acc: 0.7852
Epoch 15/15
31/31 [==============================] - 28s 917ms/step - loss: 0.5458 - acc: 0.7253 - val_loss: 0.4559 - val_acc: 0.7930
Out[15]:
<keras.callbacks.History at 0x7fb850580c90>

Finally! I get to a validation loss of 0.45, which is not bad, but not great either.

Pseudo-labelling

Our model at this point is decent. We'll use it to generate pseudo-labels for unlabelled images, add them to the training set, and train again.

In [16]:
model.save_weights('data_aug.h5')
In [19]:
test_datagen = ImageDataGenerator(rescale=1. / 255)
In [20]:
test_batches = test_datagen.flow_from_directory('sample/', target_size=(img_height,img_width), shuffle=False, batch_size=1, class_mode=None)
Found 11 images belonging to 1 classes.
In [43]:
# use the generator's (sorted) file order so predictions line up with images
images = [os.path.basename(f) for f in test_batches.filenames]
predictions = model.predict_generator(test_batches)
predictions = predictions.reshape((11,))
plt.figure(figsize=(20,10))
columns = 5
for i, image in enumerate(images):
    plt.subplot(len(images) // columns + 1, columns, i + 1)  # floor division keeps the row count an int in Python 3 too
    image = load_img('sample/unknown/' + images[i])
    if predictions[i] > 0.5:
        plt.title('dog')
    else:
        plt.title('cat')
    plt.imshow(image)
In [44]:
test_batches = test_datagen.flow_from_directory('pseudo_labelling/', target_size=(img_height,img_width), shuffle=False, batch_size=batch_size, class_mode=None)
predictions = model.predict_generator(test_batches)
predictions = predictions.reshape((predictions.shape[0],))
Found 791 images belonging to 1 classes.
In [49]:
# flow_from_directory yields files in sorted order; match it so predictions align with filenames
files = [os.path.basename(f) for f in test_batches.filenames]
for i in range(len(files)):
    if predictions[i] > 0.5:
        os.rename('pseudo_labelling/unknown/' + files[i], 'data/train/dogs/' + files[i])
    else:
        os.rename('pseudo_labelling/unknown/' + files[i], 'data/train/cats/' + files[i])
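
Note that this moves every image, however uncertain the prediction. A common refinement, sketched below with an illustrative threshold of 0.9 (not what I did above), is to pseudo-label only the confident predictions and leave the rest for a later round:

In [ ]:
# Confidence-thresholded pseudo-labelling; the 0.9 threshold is an
# illustrative assumption.
threshold = 0.9
for i in range(len(files)):
    src = 'pseudo_labelling/unknown/' + files[i]
    if predictions[i] > threshold:
        os.rename(src, 'data/train/dogs/' + files[i])
    elif predictions[i] < 1 - threshold:
        os.rename(src, 'data/train/cats/' + files[i])
    # otherwise the image stays unlabelled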
In [50]:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    shear_range=0.1)

train_generator = train_datagen.flow_from_directory(
        'data/train',
        target_size=(img_height,img_width),
        batch_size=batch_size,
        class_mode='binary')

val_datagen = ImageDataGenerator(rescale=1./255)

val_generator = val_datagen.flow_from_directory(
        'data/validate',
        target_size=(img_height,img_width),
        batch_size=batch_size,
        class_mode='binary')

classes = len(train_generator.class_indices)
assert classes == len(val_generator.class_indices)
nb_train_samples = train_generator.samples
nb_val_samples = val_generator.samples

model.load_weights('data_aug.h5')
Found 2791 images belonging to 2 classes.
Found 800 images belonging to 2 classes.
In [51]:
model.fit_generator(train_generator,
    steps_per_epoch=nb_train_samples//batch_size,
    epochs=15,
    validation_data=val_generator,
    validation_steps=nb_val_samples//batch_size)
Epoch 1/15
43/43 [==============================] - 43s 1s/step - loss: 0.7150 - acc: 0.6680 - val_loss: 0.5135 - val_acc: 0.7161
Epoch 2/15
43/43 [==============================] - 43s 995ms/step - loss: 0.6863 - acc: 0.6695 - val_loss: 0.5098 - val_acc: 0.7448
Epoch 3/15
43/43 [==============================] - 42s 982ms/step - loss: 0.6644 - acc: 0.6604 - val_loss: 0.5673 - val_acc: 0.7161
Epoch 4/15
43/43 [==============================] - 41s 961ms/step - loss: 0.6653 - acc: 0.6552 - val_loss: 0.4975 - val_acc: 0.7721
Epoch 5/15
43/43 [==============================] - 41s 954ms/step - loss: 0.6621 - acc: 0.6640 - val_loss: 0.4775 - val_acc: 0.7812
Epoch 6/15
43/43 [==============================] - 40s 936ms/step - loss: 0.6541 - acc: 0.6542 - val_loss: 0.4821 - val_acc: 0.7982
Epoch 7/15
43/43 [==============================] - 40s 920ms/step - loss: 0.6701 - acc: 0.6555 - val_loss: 0.4829 - val_acc: 0.7943
Epoch 8/15
43/43 [==============================] - 40s 927ms/step - loss: 0.6759 - acc: 0.6521 - val_loss: 0.4837 - val_acc: 0.7904
Epoch 9/15
43/43 [==============================] - 40s 931ms/step - loss: 0.6580 - acc: 0.6608 - val_loss: 0.4930 - val_acc: 0.7682
Epoch 10/15
43/43 [==============================] - 40s 939ms/step - loss: 0.6484 - acc: 0.6502 - val_loss: 0.4924 - val_acc: 0.7578
Epoch 11/15
43/43 [==============================] - 40s 930ms/step - loss: 0.6536 - acc: 0.6479 - val_loss: 0.5030 - val_acc: 0.7370
Epoch 12/15
43/43 [==============================] - 39s 911ms/step - loss: 0.6618 - acc: 0.6558 - val_loss: 0.5217 - val_acc: 0.7279
Epoch 13/15
43/43 [==============================] - 40s 937ms/step - loss: 0.6425 - acc: 0.6542 - val_loss: 0.5131 - val_acc: 0.7266
Epoch 14/15
43/43 [==============================] - 41s 946ms/step - loss: 0.6459 - acc: 0.6686 - val_loss: 0.5419 - val_acc: 0.6979
Epoch 15/15
43/43 [==============================] - 40s 924ms/step - loss: 0.6417 - acc: 0.6635 - val_loss: 0.5276 - val_acc: 0.7214
Out[51]:
<keras.callbacks.History at 0x7fb850311c50>

After running several more iterations of hyper-parameter tuning, I got to a validation loss of 0.101, which would have placed in the top 400 on the leaderboard.

What this experiment shows is that just by observing and tweaking our data a little, and by using some unconventional methods, we aren't crippled by deep learning's inherent appetite for huge amounts of data.

Further things I'd like to try: using GANs to generate images to augment the dataset, and other semi-supervised learning techniques such as knowledge distillation, which I hope to get around to soon.