Entropy and Information
Information is a reduction in uncertainty and has nothing to do with knowledge. Imagine you are about to observe the outcome of a coin flip and the outcome of a die roll. Which event contains more information? Following Abramson, the information contained in the outcome of an event E with probability P(E) is log 1/P(E). To measure information in bits, we take the logarithm to base 2. The outcome of a fair coin flip then carries log 2 = 1 bit, and the outcome of a fair die roll carries log 6 ≈ 2.585 bits.

Entropy
Now imagine a zero-memory information source X. The source emits symbols from an alphabet {x_1, x_2, . . . , x_k} with probabilities {p_1, p_2, . . . , p_k}. The symbols emitted are statistically independent. What is the average amount of information in observing the output of the source X?
Shannon formulated the most fundamental notion in information theory, the entropy, for a discrete random variable X taking values from $\mathcal{X}$. The entropy of X is
$H[X] = -\sum_{x \in \mathcal{X}} P(X = x) \log_2 P(X = x)$

Proposition
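The coin and die figures, and the average information of a zero-memory source, can be checked numerically. The following is a minimal sketch, not part of the original notes; the function names `self_information` and `entropy` are chosen here for illustration, and the source is assumed to be given as a plain list of probabilities.

```python
import math


def self_information(p):
    """Information, in bits, of an outcome with probability p: log2(1/p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return math.log2(1.0 / p)


def entropy(probs):
    """Average information, in bits, of a source with the given symbol probabilities."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Symbols with probability zero never occur and contribute nothing.
    return sum(p * self_information(p) for p in probs if p > 0)


coin = [0.5, 0.5]      # fair coin
die = [1 / 6] * 6      # fair six-sided die

print(self_information(0.5))    # information in one coin-flip outcome
print(self_information(1 / 6))  # information in one die-roll outcome
print(entropy(coin), entropy(die))
```

For a source whose symbols are all equally likely, the average equals the information of any single outcome, which is why the fair coin and fair die give the same numbers from both functions.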
Interpretation
H[X] measures:
“paleface” problem

Description Length
H[X] = how concisely can we describe X? Imagine X as a text message:
The number of possible messages (#X) is known and finite.
I know what X is but won’t show it to you.
You can guess it by asking yes/no (binary) questions.
First goal: ask as few questions as possible.
New goal: minimize the mean number of questions.
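One way to explore the "mean number of questions" goal is to build a questioning strategy explicitly and compare its mean length with the entropy. The sketch below uses Huffman's algorithm, which the notes do not mention; it is swapped in here as a standard construction of an optimal yes/no question tree for a single message. The names `huffman_depths`, `mean_questions`, and `entropy_bits` are hypothetical helpers.

```python
import heapq
import math


def huffman_depths(probs):
    """Number of yes/no questions a Huffman tree asks to identify each message."""
    if len(probs) == 1:
        return [0]
    # Heap entries: (probability, unique tie-breaker, message indices in the subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    depths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, m1 = heapq.heappop(heap)
        p2, _, m2 = heapq.heappop(heap)
        # Merging two subtrees puts one more question above every message in them.
        for i in m1 + m2:
            depths[i] += 1
        heapq.heappush(heap, (p1 + p2, counter, m1 + m2))
        counter += 1
    return depths


def mean_questions(probs):
    """Mean number of questions under the Huffman strategy."""
    return sum(p * d for p, d in zip(probs, huffman_depths(probs)))


def entropy_bits(probs):
    """Entropy in bits, for comparison."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


for probs in ([0.5, 0.25, 0.125, 0.125], [0.25] * 4, [0.9, 0.1]):
    print(probs, mean_questions(probs), entropy_bits(probs))
```

When the probabilities are powers of 1/2, the mean number of questions matches the entropy. For a skewed case such as [0.9, 0.1], one question per message is still needed, so the mean exceeds the entropy while staying within one question of it.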
Theorem: H[X] is the minimum mean number of binary distinctions needed to describe X. (The units of H[X] are bits.)

Multiple Variables
Joint entropy of two variables X and Y:
$H[X, Y] = -\sum_{x, y} P(X = x, Y = y) \log_2 P(X = x, Y = y)$
This is the entropy of the joint distribution of X and Y, that is, the minimum mean length needed to describe both X and Y.
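The joint entropy can be computed directly from a table of joint probabilities. This is a small illustrative sketch; the function name `joint_entropy` and the dictionary representation of the joint distribution are assumptions made here, not something the notes specify.

```python
import math


def joint_entropy(joint):
    """H[X, Y] in bits, from a dict mapping (x, y) to P(X = x, Y = y)."""
    if not math.isclose(sum(joint.values()), 1.0):
        raise ValueError("joint probabilities must sum to 1")
    # Pairs with probability zero contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in joint.values() if p > 0)


# Two independent fair coins: every pair of outcomes is equally likely.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Two perfectly correlated fair coins: Y always equals X.
correlated = {(0, 0): 0.5, (1, 1): 0.5}

print(joint_entropy(independent))
print(joint_entropy(correlated))
```

The independent pair needs two bits to describe, one per coin, while the perfectly correlated pair needs only one, since describing X already describes Y.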
Entropy and Ergodicity
(Dynamical systems as information sources, long-run randomness)

Relative Entropy and Statistics
(The connection to statistics)

References
Cover and Thomas (1991) is the best single book on information theory.

Tutorials
Conferences
Research Groups