
Entropy
Let's say you're a soldier stuck behind enemy lines. Your goal is to let your allies know what kind of enemies are coming their way. Sometimes, the enemy may send tanks, but more often, they send patrols of people. Now, the only way you can signal your friends is by using a radio with simple binary signals. You need to figure out the best way to communicate with your allies, so that you don't waste precious time and risk being discovered by the enemy. How do you do this? Well, first you map out a set of binary sequences, each one corresponding to a specific type of enemy (such as patrols or tanks). With a little knowledge of the environment, you already know that patrols are much more frequent than tanks. It stands to reason, then, that you will be using the binary signal for patrol much more often than the one for tank. Hence, you allocate fewer bits to the patrol signal, as you know you will be sending it more often than the others.

What you're doing is exploiting your knowledge about the distribution over types of enemies to reduce the number of bits that you need to send on average. In fact, if you have access to the true underlying distribution of incoming patrols and tanks, then you could theoretically communicate with the friendlies on the other side as efficiently as possible, by using the optimal number of bits for each transmission. The average number of bits needed per signal under such an optimal scheme is known as the entropy of the data, and can be formulated with the following equation:

$$
H(y) = \sum_i y_i \log_2 \frac{1}{y_i} = -\sum_i y_i \log_2 y_i
$$

Here, H(y) denotes the average number of bits needed per event under an optimal encoding, when events are drawn from the probability distribution y, and y_i is the probability of event i. An event i with probability y_i is optimally encoded with log(1/y_i) bits (taking logarithms in base 2), so likely events get short codes and rare events get long ones. So, supposing that seeing an enemy patrol is 256 times more likely than seeing an enemy tank, we would model the number of bits used to encode the presence of an enemy patrol as follows:
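$$
\begin{aligned}
\text{Patrol bits} &= \log_2 \frac{1}{256\, p_{\text{Tank}}} \\
&= \log_2 \frac{1}{p_{\text{Tank}}} + \log_2 \frac{1}{2^8} \\
&= \text{Tank bits} - 8
\end{aligned}
$$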
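
To check this arithmetic numerically, here is a minimal Python sketch. It assumes, purely for illustration, that patrols and tanks are the only two events, so the probabilities are 256/257 and 1/257; the helper names `optimal_bits` and `entropy` are our own and do not come from any library.

```python
import math
from typing import Sequence


def optimal_bits(p: float) -> float:
    """Ideal code length, in bits, for an event with probability p."""
    return math.log2(1.0 / p)


def entropy(dist: Sequence[float]) -> float:
    """Average bits per event: the sum of y_i * log2(1 / y_i)."""
    # Zero-probability events contribute nothing and are skipped.
    return sum(p * optimal_bits(p) for p in dist if p > 0)


# Assumption: only two kinds of event exist, and a patrol is
# 256 times as likely as a tank.
p_tank = 1 / 257
p_patrol = 256 * p_tank

tank_bits = optimal_bits(p_tank)
patrol_bits = optimal_bits(p_patrol)

print(f"tank bits:   {tank_bits:.4f}")
print(f"patrol bits: {patrol_bits:.4f}")
print(f"difference:  {tank_bits - patrol_bits:.4f}")
print(f"entropy:     {entropy([p_patrol, p_tank]):.4f} bits per signal")
```

The difference between the two code lengths comes out to 8 bits (up to floating-point rounding), matching the derivation. The entropy is far below one bit per signal, because almost every transmission is the cheap patrol code.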