
Introducing Keras layers
The core building block of a neural network model in Keras is the layer. Layers are essentially data-processing filters that transform the data they are fed into more useful representations. As we will see, prominent neural network architectures mostly vary in how their layers are designed and in how the neurons within them are interconnected. The inventor of Keras, Francois Chollet, describes this architecture as performing a progressive distillation on our data. Let's see how this works:

We define our model by initializing an instance of a blank model with no layers. Then, we add our first layer, which always expects an input dimension corresponding to the size of the data we want it to ingest. In our case, we want the model to ingest sets of 28 x 28 pixels, as we defined previously. The extra comma we added refers to how many examples the network will see at a time, as we will soon see. We also apply the Flatten() layer to our input matrix. All this does is convert each 28 x 28 image matrix into a single vector of 784 pixel values, each corresponding to its own input neuron.
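Concretely, this step might look like the following minimal sketch, assuming the tf.keras Sequential API and the 28 x 28 MNIST images loaded earlier:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Initialize a blank Sequential model with no layers yet
model = keras.Sequential()

# The input layer is the only one given an explicit input shape;
# Flatten() converts each 28 x 28 image matrix into a vector of 784 values
model.add(layers.Flatten(input_shape=(28, 28)))
```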
We continue adding layers until we reach our output layer, which has a number of output neurons corresponding to the number of output classes: in this case, the 10 digits between 0 and 9. Do note that only the input layer needs to specify the dimensions of the data entering it, because the following layers can perform automatic shape inference.
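Continuing the sketch, we might add a hidden layer and then the output layer; the 128-unit size and relu activation of the hidden layer here are illustrative assumptions, not values fixed by the text:

```python
# Hidden layer: no input shape needed, as its shape is inferred automatically
model.add(layers.Dense(128, activation='relu'))

# Output layer: one neuron per class, the 10 digits between 0 and 9
model.add(layers.Dense(10, activation='softmax'))

# Inspect the inferred shapes and parameter counts of each layer
model.summary()
```

Calling model.summary() is a quick way to confirm that shape inference worked: each layer's output shape is listed even though only the first layer was given an input dimension.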