
Logistic regression for multiclass classification
When more than two classes are involved, logistic regression is known as multinomial logistic regression. In multinomial logistic regression, we use the softmax function instead of the sigmoid. For K classes, the softmax function can be described mathematically as follows:

$$\phi(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, \ldots, K$$
The softmax function produces a probability for each class, so that the probability vector adds up to 1. At inference time, the class with the highest softmax value becomes the predicted class. The loss function, as we discussed earlier, is the negative log-likelihood, -l(w), which can be minimized by an optimizer such as gradient descent.
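To make this concrete, here is a minimal sketch (not the book's code; the logit values are made up for illustration) showing how softmax turns raw scores into class probabilities that sum to 1, and how the predicted class is the one with the highest probability:

```python
import tensorflow as tf

# Hypothetical logits for a batch of 2 samples and 3 classes
logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])

probs = tf.nn.softmax(logits, axis=1)        # each row sums to 1
predicted_class = tf.argmax(probs, axis=1)   # class with the highest probability

print(probs.numpy())                         # e.g. [[0.659 0.242 0.099] ...]
print(tf.reduce_sum(probs, axis=1).numpy())  # [1. 1.]
print(predicted_class.numpy())               # e.g. [0 1]
```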
The loss function for multinomial logistic regression is written formally as follows:

$$-l(w) = -\sum_{i=1}^{n} \sum_{j=1}^{K} y_j^{(i)} \log \phi\left(z_j^{(i)}\right)$$
Here, ϕ(z) is the softmax function.
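The following sketch (assuming one-hot encoded labels; the values are made up for illustration) writes this negative log-likelihood out by hand and checks it against TensorFlow's built-in softmax cross-entropy, which computes the same quantity in a numerically stabler way:

```python
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
labels = tf.constant([[1.0, 0.0, 0.0],   # one-hot true classes
                      [0.0, 1.0, 0.0]])

phi = tf.nn.softmax(logits, axis=1)      # phi(z), the softmax probabilities
manual_loss = -tf.reduce_sum(labels * tf.math.log(phi), axis=1)

# Same per-sample loss, computed directly from the logits by TensorFlow
builtin_loss = tf.nn.softmax_cross_entropy_with_logits(labels=labels,
                                                       logits=logits)

print(manual_loss.numpy())   # per-sample negative log-likelihood
print(builtin_loss.numpy())  # matches the manual computation
```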
We will implement this loss function in the next section, where we dig into an example of multiclass classification with logistic regression in TensorFlow.