
Fitting the model
The fit method initiates the training session, and hence can be thought of as synonymous with training our model. It takes your training features, their corresponding training labels, the number of times the model iterates over your data (epochs), and the number of learning examples your model sees per weight update (batch_size):
model.fit(x_train, y_train, epochs=5, batch_size=2)
# Other optional arguments can also be passed, for example:
# model.fit(x_train, y_train, validation_split=0.33, batch_size=10)
You can also pass additional arguments to shuffle your data, create validation splits, or give custom weights to output classes. Shuffling training data before each epoch can be useful to ensure that your model does not learn random, non-predictive sequences in your data and thereby overfit the training set. To shuffle your data, set the Boolean shuffle argument to True. Finally, custom weights can be particularly useful if you have underrepresented classes in your dataset. Setting a higher weight is equivalent to telling your model, Hey, you, pay more attention to these examples here. To set custom weights, provide the class_weight argument with a dictionary that maps each class index to the custom weight for that output class.
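As a sketch of how such a dictionary might be built, class weights are often set inversely proportional to class frequency, so rarer classes contribute more to the loss. The class names and counts below are purely illustrative, not taken from our dataset:

```python
# Hypothetical label counts for a three-class problem
label_counts = {0: 900, 1: 80, 2: 20}  # classes 1 and 2 are underrepresented

total = sum(label_counts.values())
n_classes = len(label_counts)

# Inverse-frequency weighting: the rarer a class, the larger its weight
class_weight = {c: total / (n_classes * count) for c, count in label_counts.items()}

print(class_weight)
# The dictionary would then be passed to training, along with shuffling:
# model.fit(x_train, y_train, epochs=5, shuffle=True, class_weight=class_weight)
```

With these counts, class 2 receives roughly 45 times the weight of class 0, which is one common way of compensating for the imbalance.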
The following is an overview of the key decisions you will face when fitting a model. These decisions relate to the training process you instruct your model to undergo:
- epochs: This argument must be defined as an integer value, corresponding to the number of times your model will iterate through the entire dataset. Technically, the model is not trained for a number of iterations given by epochs, but merely until the epoch of index epochs is reached. You want to set this parameter just right, depending on the complexity you want your model to capture. Setting it too low will leave you with overly simplistic representations for inference, whereas setting it too high will make your model overfit your training data.
- batch_size: The batch_size defines the number of samples that will be propagated through the network per training iteration. Intuitively, this can be thought of as the number of examples the network sees at a time while learning. Mathematically, it is simply the number of training instances the network sees before updating the model weights. So far, we have been updating our model weights at each training example (with a batch_size of 1), but this can quickly become a computational and memory management burden. It becomes especially cumbersome when your dataset is too big to even load into memory; setting a batch_size helps prevent this. Neural networks also train faster in mini-batches. In fact, batch size even has an impact on the accuracy of the gradient estimate during the backpropagation process, as shown in the following diagram. The same network is trained using three different batch sizes. Stochastic denotes random, or a batch size of one. As you can see, the direction of the stochastic and mini-batch gradients (green) fluctuates much more in comparison to the steady direction of the larger full-batch gradient (blue):
- The number of iterations (which doesn't need to be explicitly defined) simply denotes the number of passes, where each pass processes the number of training examples given by batch_size. To be clear, by one pass, we mean a forward propagation of data through our layers, as well as the backpropagation of errors. Suppose that we set our batch size to 32. In one iteration, our model views 32 training examples and then updates its weights accordingly. In a dataset of 64 examples with a batch size of 32, it will take only two iterations for your model to cycle through it.
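The relationship between dataset size, batch size, and iterations per epoch can be sketched in a few lines. The sample counts here are illustrative:

```python
import math

n_samples = 64   # size of the training set (illustrative)
batch_size = 32  # examples processed per weight update

# Number of weight updates (iterations) needed to complete one epoch;
# a partial final batch still counts as one iteration, hence the ceiling
iterations_per_epoch = math.ceil(n_samples / batch_size)
print(iterations_per_epoch)  # 2 for these values
```

For a dataset whose size is not a multiple of the batch size, the final, smaller batch still triggers a weight update, which is why the ceiling is taken.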
Now that we have called the fit method on our training samples to initiate the learning process, we will observe the output, which simply displays the estimated training time, the loss, and the accuracy per epoch on our training data:
Epoch 1/5
60000/60000 [==========] - 12s 192us/step - loss: 0.3596 - acc: 0.9177
Epoch 2/5
60000/60000 [==========] - 10s 172us/step - loss: 0.1822 - acc: 0.9664
Epoch 3/5
60000/60000 [==========] - 10s 173us/step - loss: 0.1505 - acc: 0.9759
Epoch 4/5
60000/60000 [==========] - 11s 177us/step - loss: 0.1369 - acc: 0.9784
Epoch 5/5
60000/60000 [==========] - 11s 175us/step - loss: 0.1245 - acc: 0.9822
In only five full runs through our data, we achieve an accuracy of 0.98 (98.22%) during training. Now, we must verify whether our model is truly learning what we want it to learn by testing it on our held-out test set, which our model hasn't seen so far:
model.evaluate(x_test, y_test)
10000/10000 [==============================] - 1s 98us/step
[0.1425468367099762, 0.9759]
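The two numbers returned are the test loss followed by the test metrics, in the order they were specified at compile time (here, just accuracy), so the result can be unpacked directly. A minimal sketch, using the values printed above:

```python
# evaluate returns [loss, accuracy] when accuracy is the only compiled metric
results = [0.1425468367099762, 0.9759]  # values from the run above
test_loss, test_acc = results

print(f"Test loss: {test_loss:.4f}, test accuracy: {test_acc:.2%}")
```

In other words, our model correctly classifies about 97.6% of the test examples it has never seen before, only slightly below its training accuracy.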