# A mathematical theory of communication

Voiceover: Shannon had just finished developing his theories related to cryptography, and was therefore well aware that human communication was a mix of randomness and statistical dependencies: letters in our messages were obviously dependent on previous letters to some extent. In 1948, he published a groundbreaking paper, “A Mathematical Theory of Communication”. In it, he uses Markov models as the basis for how we can think about communication. He starts with a toy example. Imagine you encounter a bunch of text written in an alphabet of A, B, and C. Perhaps you know nothing about this language, though you notice that As seem to clump together, while Bs and Cs do not. He then shows that you could design a machine to generate similar-looking text using a Markov chain.

He starts off with a zeroth-order approximation, which means we just independently select each symbol A, B, or C at random and form a sequence. However, notice that this sequence doesn’t look like the original. He then shows that you could do a bit better with a first-order approximation, where the letters are still chosen independently, but according to the probability of each letter in the original sequence. This is slightly better, as As are now more likely, but it still doesn’t capture much structure.
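
As a minimal sketch of these first two approximations (the letter frequencies in `letter_probs` are assumed toy values; in the paper they would come from counting the original message):

```python
import random

# Assumed letter frequencies for the toy A/B/C message (illustrative only;
# in practice you would count letters in the original text).
letter_probs = {"A": 0.5, "B": 0.25, "C": 0.25}

def zeroth_order(n):
    """Zeroth order: every symbol equally likely, no statistics at all."""
    return "".join(random.choice("ABC") for _ in range(n))

def first_order(n):
    """First order: symbols still independent, but weighted by frequency."""
    return "".join(random.choices(list(letter_probs), list(letter_probs.values()), k=n))

print(zeroth_order(20))  # e.g. CBACBBACABCCABACBBCA
print(first_order(20))   # As now show up about half the time
```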

The next step is key. A second-order approximation takes into account each pair of letters which can occur. In this case, we need three states: the first state represents all pairs which begin with A, the second all pairs that begin with B, and the third all pairs that begin with C. Notice now that the A cup has many AA pairs, which makes sense, since the conditional probability of an A after an A is higher in our original message. We can generate a sequence using this second-order model easily, as follows: we start anywhere and pick a tile, write down the first letter as our output, and move to the cup named by the second letter. Then we pick a new tile and repeat this process indefinitely. Notice that this sequence is starting to look very similar to the original message, because this model is capturing the conditional dependencies between letters.
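
The cup-and-tile procedure is exactly a Markov chain, and translates directly into code. A sketch, assuming hypothetical tile proportions in `cups` (the real proportions would come from counting pairs in the original message):

```python
import random

# Each "cup" holds the tiles for pairs beginning with that letter, i.e. the
# conditional probabilities P(next letter | current letter). These values
# are assumed for illustration.
cups = {
    "A": {"A": 0.7, "B": 0.15, "C": 0.15},
    "B": {"A": 0.3, "B": 0.2, "C": 0.5},
    "C": {"A": 0.3, "B": 0.5, "C": 0.2},
}

def second_order(n, state="A"):
    """Draw a tile from the current cup, output its first letter, then
    move to the cup named by its second letter; repeat."""
    out = []
    for _ in range(n):
        out.append(state)
        state = random.choices(list(cups[state]), list(cups[state].values()))[0]
    return "".join(out)

print(second_order(30))  # As clump into runs, as in the original message
```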

If we wanted to do even better, we could move to a third-order approximation, which takes into account groups of three letters, or “trigrams”. In this case, we would need nine states, one for each possible pair of preceding letters.

Next, Shannon applies this exact same logic to actual English text, using statistics that were known for letters, pairs, trigrams, and so on. He shows the same progression from zeroth-order random letters to first-order, second-order, and third-order sequences. He then goes on and tries the same thing using words instead of letters, and he writes, “The resemblance to ordinary English text increases quite noticeably at each of the above steps.” Indeed, these machines were producing meaningless text, yet text containing approximately the same statistical structure you’d see in actual English.

Shannon then proceeds to define a quantitative measure of information, realizing that the amount of information in some message must be tied up in the design of the machine which could be used to generate similar-looking sequences. This brings us to his concept of entropy.
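
The same machinery scales to real text and any order: count how often each symbol follows each context, then sample from those counts. A rough sketch (the corpus here is a stand-in; any long English text would do, and the generator assumes every context it reaches was seen during training):

```python
import random
from collections import Counter, defaultdict

def train(text, order):
    """Count how often each symbol follows each (order-1)-symbol context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order + 1):
        model[text[i : i + order - 1]][text[i + order - 1]] += 1
    return model

def generate(model, order, n):
    """Sample one symbol at a time from the learned conditional counts."""
    context = random.choice(list(model))
    out = list(context)
    for _ in range(n):
        counts = model[context]
        out.append(random.choices(list(counts), list(counts.values()))[0])
        context = "".join(out[len(out) - order + 1 :]) if order > 1 else ""
    return "".join(out)

text = "the quick brown fox jumps over the lazy dog " * 40  # stand-in corpus
for order in (1, 2, 3):
    print(order, generate(train(text, order), order, 60))
```

Each step up in order produces output that looks a little more like the training text, which is exactly the progression Shannon demonstrates.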
