
Summation
So, now we have our input features flowing into our perceptron, with each input feature paired up with a randomly initialized weight. The next step is fairly easy. First, we represent our three features and their weights as two separate 3 x 1 matrices. We want to use these two matrices to capture the combined effect of our input features and their weights. As you will recall from high school mathematics, you cannot actually multiply two 3 x 1 matrices together, since the inner dimensions do not match. So, we perform a small mathematical trick to reduce our two matrices to a single value. We simply transpose our feature matrix, as follows:

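The transpose step can be sketched with NumPy; the feature values below are illustrative assumptions, not taken from the text:

```python
import numpy as np

# Hypothetical 3 x 1 feature column vector; values are illustrative.
x = np.array([[0.5], [1.2], [-0.3]])

# Transposing turns the 3 x 1 column into a 1 x 3 row vector.
x_t = x.T

print(x.shape)    # (3, 1)
print(x_t.shape)  # (1, 3)
```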
We can now multiply this transposed feature matrix (of dimension 1 x 3) with the weight matrix (of dimension 3 x 1). When we perform this matrix multiplication, the result we obtain is referred to as the dot product of the two matrices. In our case, we compute the dot product of our transposed feature matrix and our weight matrix. Doing so, we reduce our two matrices to a single scalar value, which represents the collective influence of all of our input features and their respective weights. Next, we will see how we can gauge this collective representation against a certain threshold to assess its quality. In other words, we will use a function to assess whether this scalar representation encodes a useful pattern to remember. A useful pattern will ideally be one that helps our model distinguish between the different classes in our data, and thereby output correct predictions.
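The whole summation step can be sketched in NumPy as follows; the feature values and the random seed are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible

x = np.array([[0.5], [1.2], [-0.3]])  # 3 x 1 feature column (illustrative values)
w = rng.standard_normal((3, 1))       # 3 x 1 randomly initialized weights

# (1 x 3) @ (3 x 1) -> 1 x 1 matrix; its single entry is the dot product.
z = x.T @ w
scalar = z.item()

print(z.shape)  # (1, 1)
```

In practice, libraries usually store the features and weights as flat vectors and call `np.dot(x, w)` directly, which yields the same scalar without the explicit transpose.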