In [154]:
import numpy as np
from keras import backend as K
from keras import optimizers
from keras.layers import Activation
from keras.layers import Dense
from keras.models import Sequential
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.convolutional import ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.models import Model
from keras.utils import np_utils
from keras.utils.layer_utils import convert_all_kernels_in_model
from utils import local_response_normalization

We will recreate the well-known CaffeNet implementation here, which is a sequential implementation. The code for the original two-channel AlexNet can be found in the GitHub repo.

" ...they are virtually indistinguishable" - Evan Shelhamer, Caffe lead developer.

In [155]:
model = Sequential()

Convolutional Layer 1

"The first convolutional layer filters the 224×224×3 input image with 96 kernels of size 11×11×3 with a stride of 4 pixels" - Sec. 3.5 para 3.

We can compute the spatial size of the output volume as a function of the input volume size (W), the receptive field size of the Conv Layer neurons (F), the stride with which they are applied (S), and the amount of zero padding used (P) on the border. You can convince yourself that the correct formula for calculating how many neurons “fit” (O) is given by

O = (W−F+2P)/S+1

Therefore, for conv layer 1, as per the diagram, the output needs to be a volume of size 55x55x96

If the input image is of size 224x224x3, then W = 224, F = 11, S = 4, and P = 0 (padding is not mentioned in the paper).

O = ((224 - 11 + 0)/4) + 1 = 54.25 -> this is not even an integer, so there must be something wrong!

“The other authors were Ilya Sutskever and Geoffrey Hinton. So, AlexNet input starts with 227 by 227 by 3 images. And if you read the paper, the paper refers to 224 by 224 by 3 images. But if you look at the numbers, I think that the numbers make sense only of actually 227 by 227.” - Andrew Ng

O = ((227 - 11 + 0)/4) + 1 = (216/4) + 1 = 55
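The two calculations above can be checked with a few lines of Python. This is a minimal sketch: `conv_output_size` is a hypothetical helper name introduced for illustration, not a Keras function, and it uses exact fractions so that a non-integer result is easy to spot.

```python
from fractions import Fraction


def conv_output_size(w, f, s, p=0):
    """Return O = (W - F + 2P) / S + 1 as an exact fraction."""
    return Fraction(w - f + 2 * p, s) + 1


for width in (224, 227):
    out = conv_output_size(width, f=11, s=4, p=0)
    status = "integer" if out.denominator == 1 else "not an integer"
    print(f"W={width}: O = {float(out)} ({status})")
```

With W = 224 the result is 54.25, and with W = 227 it is exactly 55, matching the hand calculation.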

In [156]:
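# input_shape=(3,227,227) is channels-first, so Keras must be configured with image_data_format='channels_first'
model.add(Conv2D(filters=96, input_shape=(3,227,227), kernel_size=(11,11), strides=(4,4), padding='valid', name='Conv1'))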

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
=================================================================
Total params: 34,944
Trainable params: 34,944
Non-trainable params: 0
_________________________________________________________________

The number of parameters here can be calculated as (filter_height * filter_width * input_image_channels + 1) * number_of_filters

Therefore, params = (11x11x3 + 1) x 96 = 34944

The "+ 1" is for the biases.
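As a quick check, the same formula can be written as a small function. This is only a sketch; `conv_params` is a hypothetical name chosen for illustration.

```python
def conv_params(filter_height, filter_width, in_channels, n_filters):
    """Weights per filter plus one bias, times the number of filters."""
    return (filter_height * filter_width * in_channels + 1) * n_filters


# Conv1: 96 filters of size 11x11 over a 3-channel input
print(conv_params(11, 11, 3, 96))  # 34944
```

The result agrees with the Param # column reported by model.summary() for Conv1.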

Post convolution operations

"The ReLU non-linearity is applied to the output of every convolutional and fully-connected layer." - Sec 3.5 Para 2

"Response-normalization layers follow the first and second convolutional layers." - Sec 3.5 Para 2

"Max-pooling layers, of the kind described in Section 3.4, follow both response-normalization layers as well as the fifth convolutional layer" - Sec 3.5 Para 2

More on this in the slides.

O = ((W - F + 2P)/ S) + 1

For pooling, W = 55, F = 3, P = 0, S = 2

Therefore O = ((55 - 3 + 0)/2) + 1 = 27

In [157]:
model.add(Activation('relu'))
model.add(local_response_normalization(name="LRN1"))
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), padding='valid', name="MaxPool"))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
_________________________________________________________________
activation_77 (Activation)   (None, 96, 55, 55)        0         
_________________________________________________________________
LRN1 (Lambda)                (None, 96, 55, 55)        0         
_________________________________________________________________
MaxPool (MaxPooling2D)       (None, 96, 27, 27)        0         
=================================================================
Total params: 34,944
Trainable params: 34,944
Non-trainable params: 0
_________________________________________________________________

Convolutional Layer 2

"The second convolutional layer takes as input the (response-normalized and pooled) output of the first convolutional layer and filters it with 256 kernels of size 5×5×48." - Sec. 3.5 para 3

O = (W−F+2P)/S+1

We want the output to be of size 13x13, but only after pooling, so we don't know what the intermediate output size is. We do know that it needs to be an integer. We know F = 5 and W = 27, while S and P are unknown, so let's investigate.

O = (27 - 5 + 2P)/S + 1 needs to be an integer.

Let's assume the stride for the conv to be 2 and the padding to be 1: O = ((27 - 5 + 2)/2) + 1 = 13. If we now apply pooling, we get (13 - 3 + 0)/2 + 1 = 6, not 13. So this is wrong as well.

So let's assume the stride to be 1 and the padding to be 0.

O = (27 - 5 + 0)/1 + 1 = 23

Now if we apply pooling,

O = (23 - 3 + 0)/2 + 1 = 11. This is not correct!

But if we zero-pad the pooled output by 1 pixel on each side, we get 11 + 2 = 13.
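The trial-and-error above can also be automated. The sketch below is an illustration only: the helper names are hypothetical and the search range (stride 1 or 2, padding 0 to 3) is an assumption. It looks for conv settings whose output pools down to 13 directly, without the extra zero padding used in this notebook.

```python
def output_size(w, f, s, p):
    """Return (W - F + 2P) / S + 1, or None when it is not an integer."""
    numerator = w - f + 2 * p
    if numerator < 0 or numerator % s != 0:
        return None
    return numerator // s + 1


def conv2_candidates(target=13):
    """List (stride, padding, conv_out) combos that pool down to `target`."""
    found = []
    for stride in (1, 2):
        for padding in range(0, 4):
            conv_out = output_size(27, 5, stride, padding)  # 5x5 conv on 27x27
            if conv_out is None:
                continue
            pooled = output_size(conv_out, 3, 2, 0)  # 3x3 max-pool, stride 2
            if pooled == target:
                found.append((stride, padding, conv_out))
    return found


print(conv2_candidates())  # [(1, 2, 27)]
```

Within that range, only stride 1 with padding 2 works: the conv output stays at 27 and pooling gives 13. The model in this notebook takes the other route described above (no conv padding, then zero padding after pooling), which reaches the same 13x13 size.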

In [158]:
model.add(Conv2D(filters=256, kernel_size=(5,5), padding='valid', name="Conv2"))
model.add(Activation('relu'))
model.add(local_response_normalization(name="LRN2"))
# model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), name="MaxPool2"))
model.add(ZeroPadding2D(padding=(1, 1)))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
_________________________________________________________________
activation_77 (Activation)   (None, 96, 55, 55)        0         
_________________________________________________________________
LRN1 (Lambda)                (None, 96, 55, 55)        0         
_________________________________________________________________
MaxPool (MaxPooling2D)       (None, 96, 27, 27)        0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 256, 23, 23)       614656    
_________________________________________________________________
activation_78 (Activation)   (None, 256, 23, 23)       0         
_________________________________________________________________
LRN2 (Lambda)                (None, 256, 23, 23)       0         
_________________________________________________________________
MaxPool2 (MaxPooling2D)      (None, 256, 11, 11)       0         
_________________________________________________________________
zero_padding2d_25 (ZeroPaddi (None, 256, 13, 13)       0         
=================================================================
Total params: 649,600
Trainable params: 649,600
Non-trainable params: 0
_________________________________________________________________

Convolutional Layer 3

"The third convolutional layer has 384 kernels of size 3×3×256 connected to the (normalized, pooled) outputs of the second convolutional layer." - Sec 3.5 Para 3

"The third, fourth, and fifth convolutional layers are connected to one another without any intervening pooling or normalization layers." - Sec 3.5 Para 3

In [159]:
model.add(Conv2D(filters=384, kernel_size=(3,3), padding='valid', name="Conv3"))
model.add(Activation('relu'))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
_________________________________________________________________
activation_77 (Activation)   (None, 96, 55, 55)        0         
_________________________________________________________________
LRN1 (Lambda)                (None, 96, 55, 55)        0         
_________________________________________________________________
MaxPool (MaxPooling2D)       (None, 96, 27, 27)        0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 256, 23, 23)       614656    
_________________________________________________________________
activation_78 (Activation)   (None, 256, 23, 23)       0         
_________________________________________________________________
LRN2 (Lambda)                (None, 256, 23, 23)       0         
_________________________________________________________________
MaxPool2 (MaxPooling2D)      (None, 256, 11, 11)       0         
_________________________________________________________________
zero_padding2d_25 (ZeroPaddi (None, 256, 13, 13)       0         
_________________________________________________________________
Conv3 (Conv2D)               (None, 384, 11, 11)       885120    
_________________________________________________________________
activation_79 (Activation)   (None, 384, 11, 11)       0         
=================================================================
Total params: 1,534,720
Trainable params: 1,534,720
Non-trainable params: 0
_________________________________________________________________
In [160]:
model.add(ZeroPadding2D(padding=(1, 1)))

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
_________________________________________________________________
activation_77 (Activation)   (None, 96, 55, 55)        0         
_________________________________________________________________
LRN1 (Lambda)                (None, 96, 55, 55)        0         
_________________________________________________________________
MaxPool (MaxPooling2D)       (None, 96, 27, 27)        0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 256, 23, 23)       614656    
_________________________________________________________________
activation_78 (Activation)   (None, 256, 23, 23)       0         
_________________________________________________________________
LRN2 (Lambda)                (None, 256, 23, 23)       0         
_________________________________________________________________
MaxPool2 (MaxPooling2D)      (None, 256, 11, 11)       0         
_________________________________________________________________
zero_padding2d_25 (ZeroPaddi (None, 256, 13, 13)       0         
_________________________________________________________________
Conv3 (Conv2D)               (None, 384, 11, 11)       885120    
_________________________________________________________________
activation_79 (Activation)   (None, 384, 11, 11)       0         
_________________________________________________________________
zero_padding2d_26 (ZeroPaddi (None, 384, 13, 13)       0         
=================================================================
Total params: 1,534,720
Trainable params: 1,534,720
Non-trainable params: 0
_________________________________________________________________

Convolutional Layer 4 and 5

"The fourth convolutional layer has 384 kernels of size 3×3×192, and the fifth convolutional layer has 256 kernels of size 3×3×192." - Sec 3.5 Para 3
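The kernel depths quoted from the paper (48 and 192) describe the two-GPU split, where each kernel sees only half of the previous layer's channels. In the sequential model built here, each kernel spans all input channels, which is what the parameter counts in the summaries reflect. A minimal sketch to verify this, with `conv_params` as a hypothetical helper name:

```python
def conv_params(kernel, in_channels, n_filters):
    """(kernel * kernel * in_channels + 1 bias) per filter."""
    return (kernel * kernel * in_channels + 1) * n_filters


# Input depths are the full channel counts of the preceding layer
layer_params = {
    "Conv2": conv_params(5, 96, 256),
    "Conv3": conv_params(3, 256, 384),
    "Conv4": conv_params(3, 384, 384),
    "Conv5": conv_params(3, 384, 256),
}

for name, count in layer_params.items():
    print(name, count)
```

These values should agree with the Param # column in the model.summary() output for the corresponding layers.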

In [161]:
model.add(Conv2D(filters=384, kernel_size=(3,3), padding='valid', name="Conv4"))
model.add(Activation('relu'))
model.add(ZeroPadding2D(padding=(1, 1)))

model.add(Conv2D(filters=256, kernel_size=(3,3), padding='valid', name="Conv5"))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(3,3), strides=(2,2), name="MaxPool3"))


model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
_________________________________________________________________
activation_77 (Activation)   (None, 96, 55, 55)        0         
_________________________________________________________________
LRN1 (Lambda)                (None, 96, 55, 55)        0         
_________________________________________________________________
MaxPool (MaxPooling2D)       (None, 96, 27, 27)        0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 256, 23, 23)       614656    
_________________________________________________________________
activation_78 (Activation)   (None, 256, 23, 23)       0         
_________________________________________________________________
LRN2 (Lambda)                (None, 256, 23, 23)       0         
_________________________________________________________________
MaxPool2 (MaxPooling2D)      (None, 256, 11, 11)       0         
_________________________________________________________________
zero_padding2d_25 (ZeroPaddi (None, 256, 13, 13)       0         
_________________________________________________________________
Conv3 (Conv2D)               (None, 384, 11, 11)       885120    
_________________________________________________________________
activation_79 (Activation)   (None, 384, 11, 11)       0         
_________________________________________________________________
zero_padding2d_26 (ZeroPaddi (None, 384, 13, 13)       0         
_________________________________________________________________
Conv4 (Conv2D)               (None, 384, 11, 11)       1327488   
_________________________________________________________________
activation_80 (Activation)   (None, 384, 11, 11)       0         
_________________________________________________________________
zero_padding2d_27 (ZeroPaddi (None, 384, 13, 13)       0         
_________________________________________________________________
Conv5 (Conv2D)               (None, 256, 11, 11)       884992    
_________________________________________________________________
activation_81 (Activation)   (None, 256, 11, 11)       0         
_________________________________________________________________
MaxPool3 (MaxPooling2D)      (None, 256, 5, 5)         0         
=================================================================
Total params: 3,747,200
Trainable params: 3,747,200
Non-trainable params: 0
_________________________________________________________________

Fully connected layer 1 (FC6)

"The fully-connected layers have 4096 neurons each." - Sec 3.5 Para 3
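The fully-connected layers hold most of the model's parameters. The sketch below is an illustration with a hypothetical `dense_params` helper. It counts them from the flattened MaxPool3 output of shape 256 x 5 x 5 produced in this notebook.

```python
def dense_params(n_inputs, n_units):
    """One weight per input plus one bias, for every unit."""
    return (n_inputs + 1) * n_units


flattened = 256 * 5 * 5               # size after Flatten()
fc6 = dense_params(flattened, 4096)
fc7 = dense_params(4096, 4096)
output = dense_params(4096, 1000)     # 1000-way output layer

print(flattened, fc6, fc7, output)
```

The three dense layers together account for about 47 million of the model's roughly 50.8 million parameters.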

In [162]:
model.add(Flatten())

model.add(Dense(4096, name="FC6"))
model.add(Activation('relu'))
model.add(Dropout(0.5))

Fully connected layer 2 (FC7)

In [163]:
model.add(Dense(4096, name="FC7"))
model.add(Activation('relu'))
# The paper uses a dropout rate of 0.5 for both fully-connected layers
model.add(Dropout(0.5))

Softmax layer

In [164]:
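n_classes = 1000
model.add(Dense(1000))
# Note: the ReLU and dropout below sit between the 1000-way layer and the softmax.
# The paper feeds the last fully-connected layer straight into the softmax,
# so remove these two lines to match it when n_classes is 1000.
model.add(Activation('relu'))
model.add(Dropout(0.5))
# For a different number of classes, add a new output layer on top
if n_classes != 1000:
    model.add(Dense(n_classes))
model.add(Activation('softmax'))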
In [165]:
if K.backend() == 'tensorflow':
    convert_all_kernels_in_model(model)
In [166]:
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Conv1 (Conv2D)               (None, 96, 55, 55)        34944     
_________________________________________________________________
activation_77 (Activation)   (None, 96, 55, 55)        0         
_________________________________________________________________
LRN1 (Lambda)                (None, 96, 55, 55)        0         
_________________________________________________________________
MaxPool (MaxPooling2D)       (None, 96, 27, 27)        0         
_________________________________________________________________
Conv2 (Conv2D)               (None, 256, 23, 23)       614656    
_________________________________________________________________
activation_78 (Activation)   (None, 256, 23, 23)       0         
_________________________________________________________________
LRN2 (Lambda)                (None, 256, 23, 23)       0         
_________________________________________________________________
MaxPool2 (MaxPooling2D)      (None, 256, 11, 11)       0         
_________________________________________________________________
zero_padding2d_25 (ZeroPaddi (None, 256, 13, 13)       0         
_________________________________________________________________
Conv3 (Conv2D)               (None, 384, 11, 11)       885120    
_________________________________________________________________
activation_79 (Activation)   (None, 384, 11, 11)       0         
_________________________________________________________________
zero_padding2d_26 (ZeroPaddi (None, 384, 13, 13)       0         
_________________________________________________________________
Conv4 (Conv2D)               (None, 384, 11, 11)       1327488   
_________________________________________________________________
activation_80 (Activation)   (None, 384, 11, 11)       0         
_________________________________________________________________
zero_padding2d_27 (ZeroPaddi (None, 384, 13, 13)       0         
_________________________________________________________________
Conv5 (Conv2D)               (None, 256, 11, 11)       884992    
_________________________________________________________________
activation_81 (Activation)   (None, 256, 11, 11)       0         
_________________________________________________________________
MaxPool3 (MaxPooling2D)      (None, 256, 5, 5)         0         
_________________________________________________________________
flatten_7 (Flatten)          (None, 6400)              0         
_________________________________________________________________
FC6 (Dense)                  (None, 4096)              26218496  
_________________________________________________________________
activation_82 (Activation)   (None, 4096)              0         
_________________________________________________________________
dropout_17 (Dropout)         (None, 4096)              0         
_________________________________________________________________
FC7 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
activation_83 (Activation)   (None, 4096)              0         
_________________________________________________________________
dropout_18 (Dropout)         (None, 4096)              0         
_________________________________________________________________
dense_17 (Dense)             (None, 1000)              4097000   
_________________________________________________________________
activation_84 (Activation)   (None, 1000)              0         
_________________________________________________________________
dropout_19 (Dropout)         (None, 1000)              0         
_________________________________________________________________
activation_85 (Activation)   (None, 1000)              0         
=================================================================
Total params: 50,844,008
Trainable params: 50,844,008
Non-trainable params: 0
_________________________________________________________________
In [167]:
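# Note: in Keras, `decay` is a per-update learning-rate decay, not the weight decay of 0.0005 used in the paper
sgd = optimizers.SGD(lr=0.01, decay=0.0005, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])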

Load whatever data you want to train this on, one-hot encode the labels, put them in a NumPy array, and run the following command:
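One way to one-hot encode integer class labels with NumPy is sketched below. The `one_hot` helper and the toy label values are assumptions made for illustration; Keras also provides `keras.utils.to_categorical` for the same purpose.

```python
import numpy as np


def one_hot(class_ids, n_classes):
    """Turn integer class ids into rows of an identity matrix."""
    class_ids = np.asarray(class_ids, dtype=np.int64)
    return np.eye(n_classes, dtype=np.float32)[class_ids]


# Toy example: three samples, three classes
labels = one_hot([0, 2, 1], n_classes=3)
print(labels.shape)  # (3, 3)
print(labels)
```

For the model above, `n_classes` would be 1000, and `data` would need the shape (number_of_samples, 3, 227, 227) to match the input shape of Conv1.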

In [168]:
# model.fit(data, labels, epochs=90, batch_size=128)

Now you have successfully recreated AlexNet! Pat yourself on the back!