Talk:Mutual information: Difference between revisions

Revision as of 12:24, 19 February 2016

Statistics C‑class Mid‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C	This article has been rated as C-class on Wikipedia's content assessment scale.
Mid	This article has been rated as Mid-importance on the importance scale.

Mathematics C‑class Mid‑priority

	This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
C	This article has been rated as C-class on Wikipedia's content assessment scale.
Mid	This article has been rated as Mid-priority on the project's priority scale.

Unit of information?

Instead of: "It should be noted that these definitions are ambiguous because the base of the log function is not specified. To disambiguate, the function I could be parameterized as I(X,Y,b) where b is the base. Alternatively, since the most common unit of measurement of mutual information is the bit, a base of 2 could be specified."

how about: "The unit of information depends on the base of the log function. The most common bases are 2, e, and 10, resulting in units of bits, nats, and digits, respectively."

Internetexploder 08:15, 29 April 2007 (UTC)[reply]
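
As a short worked note on the proposal above: changing the base of the logarithm only rescales the mutual information by a constant, so the choice of unit does not affect any comparison between values. The subscript notation $I_{b}$ for "mutual information computed with base-$b$ logarithms" is introduced here for illustration only and is not used in the article.

```latex
I_{b}(X;Y) = \frac{I_{e}(X;Y)}{\ln b},
\qquad
1\ \text{nat} = \frac{1}{\ln 2}\ \text{bits} \approx 1.4427\ \text{bits},
\qquad
1\ \text{decimal digit} = \log_{2} 10\ \text{bits} \approx 3.3219\ \text{bits}.
```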

I didn't initiate the notice, but the guidelines state that this notice is internal to Wikipedia and are not really for the casual reader's consumption. Any attention that a qualified contributor can give is welcome. Ancheta Wis 23:55, 23 Oct 2004 (UTC)

Noting Category:Pages needing attention, I would say that, while someone may have thought that a good guideline, it is de facto incorrect (and not policy). I, for one, do not agree with that guideline, because it hides the fact that the article needs attention from all those who can edit it and it disclaims to newbies that we know the article isn't as good as it could be. — 131.230.133.185 5 July 2005 19:23 (UTC)

[This article is] poorly explained. --Eequor 03:39, 22 Aug 2004 (UTC)

Simplify eq?

why not just say:

I(X,Y)=\sum _{x,y}p(x,y)\times \log _{2}{\frac {p(x,y)}{p(x)\,p(y)}}.\!

instead of all the confusing talk about what f and g are? Please elaborate if there is a specific reason why it is done this way. -- BAxelrod 02:08, 19 October 2005 (UTC)[reply]

The definitions given in the article are correct. They just happen to be highly formal. Less formal definitions are given in the article on information theory (recently added by me, but I called it transinformation). Whether this level of formality is appropriate for this article is a matter for debate. I tend to think not, because in general, someone who is working at that level of formality is not going to be looking in Wikipedia for a definition, but on the other hand, it "simplifies" matters because then one definition suffices for both the discrete and continuous cases. (i.e. integration over the counting measure is simply ordinary discrete summation.) -- 130.94.162.64 22:53, 2 December 2005 (UTC)[reply]

O.K. Simplified the formula. -- 130.94.162.64 05:24, 3 December 2005 (UTC)[reply]

Another note:

I(X,Y)\,

is incorrect.

I(X;Y)\,

is the accepted usage. Use a semicolon. -- 130.94.162.64 11:35, 4 December 2005 (UTC)[reply]
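
For readers following this thread, here is a minimal Python sketch of the simplified sum discussed above, evaluated on a small joint probability table. The function name `mutual_information`, the `base` parameter, and the example table are assumptions made for illustration and do not come from the article.

```python
import math


def mutual_information(joint, base=2.0):
    """I(X;Y) for a joint pmf given as rows, with joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0.0:                     # convention: 0 * log 0 = 0
                total += pxy * math.log(pxy / (px[i] * py[j]), base)
    return total


# Hypothetical example: X is a fair bit, Y is X flipped with probability 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]
mi_bits = mutual_information(joint, base=2)
mi_nats = mutual_information(joint, base=math.e)

# Independent fair bits: every term has log(1) = 0.
mi_indep = mutual_information([[0.25, 0.25], [0.25, 0.25]])

print(mi_bits, mi_nats, mi_indep)
```

The two results for the noisy-copy table differ only by the factor ln 2, and the independent table gives zero, as expected for any base.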

Mutual information between $m$ random variables

How about adding the mutual information among multiple scalar random variables:

$I(y_{1};\ldots ;y_{m})=\sum _{i=1}^{m}H(y_{i})-H(\mathbf {y} )$

(In reply to unsigned comment above:) Apparently there isn't a single well-defined mutual information for three or more random variables. It is sometimes defined recursively:

I(Y_{1};Y_{2})=H(Y_{1})-H(Y_{1}|Y_{2}),\,

I(Y_{1};\ldots ;Y_{m})=I(Y_{1};\ldots ;Y_{m-1})-I(Y_{1};\ldots ;Y_{m-1}|Y_{m}),\,m\geq 3,

where

I(Y_{1};\ldots ;Y_{m-1}|Y_{m})=\mathbb {E} _{Y_{m}}\{I((Y_{1}|y_{m});\ldots ;(Y_{m-1}|y_{m}))\}.

This definition fits more along the lines of the interpretation of the mutual information as the measure of an intersection of sets, but it can become negative as well as positive for three or more random variables (in contrast to the definition in the comment above, which is always non-negative).

--130.94.162.64 23:15, 19 May 2006 (UTC)[reply]
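
The difference between the two definitions in this thread can be checked numerically. The sketch below uses a standard illustrative case that is not taken from the thread: X and Y are independent fair bits and Z = X xor Y. The helper names (`entropy`, `marginal`, `H`) are assumptions made for the example.

```python
import math
from itertools import product


def entropy(pmf):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, keep):
    """Marginalize a joint pmf over tuples onto the coordinate indices in keep."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out


def H(joint, keep):
    """Joint entropy in bits of the coordinates listed in keep."""
    return entropy(marginal(joint, keep))


# X and Y are independent fair bits, Z = X xor Y; coordinates are (X, Y, Z).
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

# First comment's definition: sum of marginal entropies minus the joint entropy.
total_correlation = (H(joint, [0]) + H(joint, [1]) + H(joint, [2])
                     - H(joint, [0, 1, 2]))

# Recursive definition: I(X;Y;Z) = I(X;Y) - I(X;Y|Z).
i_xy = H(joint, [0]) + H(joint, [1]) - H(joint, [0, 1])
i_xy_given_z = (H(joint, [0, 2]) + H(joint, [1, 2])
                - H(joint, [0, 1, 2]) - H(joint, [2]))
interaction = i_xy - i_xy_given_z

print(total_correlation, interaction)
```

For this distribution the first quantity is 1 bit and the recursive one is -1 bit, which illustrates the sign behaviour described in the reply.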

Source

The formula is from Shannon (1948). This should be written.
Who coined the term "mutual information"? --Henri de Solages 18:41, 7 November 2005 (UTC)[reply]

Remove irrelevant reference?

The first reference, Cilibrasi and Vitanyi (2005), contains only two mentions of mutual information:

"Another recent offshoot based on our work is hierarchical clustering based on mutual information, [23]."

"[23] A. Kraskov, H. Stögbauer, R.G. Adrsejak, P. Grassberger, Hierarchical clustering based on mutual information, 2003, http://arxiv.org/abs/q-bio/0311039"

I suggest this reference be removed as it's not helpful.

--84.9.75.186 10:57, 3 September 2007 (UTC)[reply]

The Kraskov & Stögbauer paper is an interesting one. Is that the one you are referring to? —Dfass 11:25, 3 September 2007 (UTC)[reply]

In-text symbols are inconsistently formatted

Most of the in-text symbols and equations are formatted with italics, but a few (those with subscripts) are formatted in math mode. Shouldn't they all be formatted consistently? The separated equations use math mode, so my preference would be for in-text symbols and equations to be formatted in math mode as well. Jamesmelody (talk) 17:53, 22 February 2009 (UTC)[reply]

Ohhh... Apparently, the difference between the appearance of some in-text symbols and others is due to rendering differences in my browser. I was writing in-text equations elsewhere, using the math environment everywhere, and noticed the same type of differences. Perhaps I need to tweak my math rendering preferences...

Jamesmelody (talk) 19:18, 22 February 2009 (UTC)[reply]

Subtleties with Entropy and Mutual Information for Continuous Random Variables

I believe the article requires greater rigor when dealing with continuous random variables. Consider the following example:

Let $X=N(0,\alpha )$ , a normal distribution, and let $Y=X^{2}$ , and suppose I want to find the mutual information $I(X,Y)$ via $I(X,Y)=H(X)-H(X|Y)$ . I know that $H(X)=H[N(0,\alpha )]=0.5\log _{2}(2\pi e\alpha )$ bits. Additionally, I know that the random variable $X|Y$ is discrete with

\mathrm {Prob} \{X={\sqrt {y}}\,|Y=y\}=1/2\ \mathrm {and}

\mathrm {Prob} \{X=-{\sqrt {y}}\,|Y=y\}=1/2,

except when $y=0$ , in which case $\mathrm {Prob} \{X=0\,|Y=0\}=1$ . Hence, $I(X,Y)=0.5\log _{2}(2\pi e\alpha )-1$ bits.

Problem solved, except that now I can select $\alpha$ to make the first term less than one, making the mutual information negative. But the article tells me that mutual information cannot be negative. This seems inconsistent.

I believe the inconsistency arises from the fact that $X$ and $Y$ are not jointly continuous random variables, because the support of the joint probability "density" is the curve $y=x^{2}$ . More rigorously, the joint cumulative distribution function has a discontinuity on the curve $y=x^{2}$ . This is evident in the fact that while the individual random variables are continuous, the conditional random variable $X|Y$ is discrete. Hence my argument subsequently mixed (discrete) entropy with differential entropy, two definitions that are not consistent.

Perhaps my belief is wrong. There may be a better explanation for the inconsistency, one which enables a fully consistent calculation in this example. (Perhaps the differential entropy of the conditional random variable $X|Y$ is $-\infty$ ? This would make the mutual information infinite, regardless of the value of $\alpha$ , which would be consistent both with the requirement of nonnegativity and with the understanding of mutual information as an indication of degree of dependence.)

I suggest that the article be explicit in the case of continuous mutual information of the conditions on the joint distribution. In addition, I suggest that the section "Relation to Other Quantities" be explicit about when it is appropriate to use differential entropy as opposed to entropy. Finally, I suggest that the article explore the possible inconsistencies and limitations of mutual information for the continuous case. In particular, I should expect in my example that the mutual information indicate complete statistical dependence. Jamesmelody (talk) 19:44, 22 February 2009 (UTC)[reply]

The mutual information between two continuous random variables can be somewhat more rigorously explained by discretizing them by placing them in "bins" and then taking the limit as the bin size goes to zero. For example, if X and Y are continuous real-valued random variables, then their mutual information is

I(X;Y)=\lim _{\delta \rightarrow 0}I\left(\left\lfloor {\frac {X}{\delta }}\right\rfloor ;\left\lfloor {\frac {Y}{\delta }}\right\rfloor \right).

Using this definition it can be easily seen that the (bivariate) mutual information can never be negative. In your example where one variable is completely determined by the other, i.e.

Y=X^{2}

, it makes sense that the mutual information would be infinite, since we can specify X to arbitrarily many digits of accuracy, and perfectly recover all those digits from the value of Y.

The most general rigorous and widely applicable definition of the mutual information is probably in terms of the Kullback–Leibler divergence of the joint distribution with respect to the product of the marginal distributions, which is defined (and remains finite) if and only if the joint distribution of the two random variables is absolutely continuous (in the sense of measures) with respect to the product of the marginals. Anyways, feel free to be bold and improve the article as you see fit. I'll try to help. Deepmath (talk) 22:48, 22 February 2009 (UTC)[reply]
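
The binning argument in this reply can be illustrated with a rough numerical sketch for the example $Y=X^{2}$ with standard normal $X$. This is only a plug-in estimate from samples, so it is capped at $\log _{2}$ of the sample size and cannot show the divergence itself, only the growth as the bin width shrinks. The sample size, seed, bin widths, and the function name `plugin_mi_bits` are all assumptions chosen for illustration.

```python
import math
import random
from collections import Counter


def plugin_mi_bits(pairs):
    """Plug-in (empirical) mutual information in bits of a list of (a, b) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    # p(a,b) / (p(a) p(b)) equals c * n / (left[a] * right[b]) for counts c.
    return sum(
        (c / n) * math.log2(c * n / (left[a] * right[b]))
        for (a, b), c in joint.items()
    )


random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]

estimates = []
for delta in (1.0, 0.5, 0.25, 0.125):
    # Bin X and Y = X^2 with the same width delta, as in the limit definition.
    binned = [(math.floor(x / delta), math.floor(x * x / delta)) for x in xs]
    estimates.append(plugin_mi_bits(binned))
    print(delta, estimates[-1])
```

Because each bin width halves the previous one, the coarser bins are functions of the finer ones, so the empirical estimates cannot decrease from one step to the next.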

Scholarpedia

Possible source: [[1]]. Note that some of Scholarpedia is under copyright, so you can't just copy the content. —Preceding unsigned comment added by Njerseyguy (talk • contribs) 04:45, 7 July 2009 (UTC)[reply]

p(x,y) = 0 ?

It seems to me that the joint probability function p(x,y) can be 0, for example when the values x and y never occur together. Can anyone explain how mutual information is calculated when p(x,y) = 0 for some x and y?

90.190.231.235 (talk) 12:15, 30 December 2009 (UTC)Siim[reply]

Priors or pseudo-counts are often used to iron out wrinkles like this and account for unobserved data. In this case you can define the corresponding term of the sum to be 0 whenever p(x,y)=0, following the convention 0 log 0 = 0. This is justified by the limit t log t → 0 as t → 0+, which is relatively trivial to prove using L'Hôpital's rule. --Paul (talk) 21:49, 30 December 2009 (UTC)[reply]
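
For completeness, the L'Hôpital argument mentioned above can be written out as follows. The variable $t$ stands for the value of p(x,y) and is introduced here only for the derivation. Since the marginals p(x) and p(y) are positive whenever x and y each occur, the term $t\log {\frac {t}{p(x)\,p(y)}}=t\log t-t\log(p(x)\,p(y))$ also tends to 0.

```latex
\lim_{t\to 0^{+}} t\log t
 = \lim_{t\to 0^{+}} \frac{\log t}{1/t}
 = \lim_{t\to 0^{+}} \frac{1/t}{-1/t^{2}}
 = \lim_{t\to 0^{+}} (-t)
 = 0.
```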

Bogus Expectations

The conditional mutual information is defined using the expression

\mathbb {E} _{Z}{\big (}I(X;Y)|Z{\big )}

There are two problems with this expression. First, $I(X;Y)$ is not a random variable. So taking its expectation is a no-op. Second, the expectation is conditioned on the random variable $Z$ , which is not defined outside the scope of the expectation. This is how conditional expectation is defined (see e.g. [2])

\mathbb {E} {\big (}X|Y{\big )}=\sum _{x}p(x|Y)x

Indeed, this expectation is itself a random variable.

The same problem occurs in the definition of the multivariate mutual entropy. —Preceding unsigned comment added by 128.114.60.41 (talk) 19:50, 9 March 2010 (UTC)[reply]
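
One way to write the conditional mutual information without the notational problem raised above is as an explicit average over the values of $Z$ (shown for the discrete case). This is offered as a sketch of the standard definition, not as the wording the article must use.

```latex
I(X;Y|Z)
 = \sum_{z} p(z)\, I(X;Y|Z=z)
 = \sum_{z} p(z) \sum_{x,y} p(x,y|z)\,
   \log \frac{p(x,y|z)}{p(x|z)\,p(y|z)}.
```

Here $I(X;Y|Z=z)$ is an ordinary number for each fixed $z$, so the outer sum is a well-defined expectation over $Z$.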

Stochastic processes

Would some knowledgeable person please add some material on the mutual information of stochastic processes? Thanx! Rinconsoleao (talk) 14:12, 22 December 2010 (UTC)[reply]

Multivariate mutual information equations

Can anyone verify the equations in the multivariate mutual information section?

This equation

I(X_{1};\,...\,;X_{n})=I(X_{1};\,...\,;X_{n-1})-I(X_{1};\,...\,;X_{n-1}|X_{n}),

does not reduce to any of the equations above for the basic 2-variable mutual information, if, for example, $X_{1}=X$ and $X_{n}=X_{2}=Y$ .

Should the equation instead be this?

I(X_{1};\,...\,;X_{n})=H(X_{1};\,...\,;X_{n-1})-H(X_{1};\,...\,;X_{n-1}|X_{n}),

Willkeim (talk) 13:43, 8 April 2012 (UTC) Willkeim[reply]
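
A possible reading that makes the recursion consistent for $n=2$, assuming the convention that the one-variable "mutual information" is the entropy, i.e. $I(X_{1})=H(X_{1})$ and $I(X_{1}|X_{2})=H(X_{1}|X_{2})$ (this convention is an assumption here, though it matches the base case $I(Y_{1};Y_{2})=H(Y_{1})-H(Y_{1}|Y_{2})$ given in the earlier thread on $m$ random variables):

```latex
I(X_{1};X_{2}) = I(X_{1}) - I(X_{1}|X_{2}) = H(X_{1}) - H(X_{1}|X_{2}).
```

By contrast, if $H(X_{1};\,...\,;X_{n-1})$ in the proposed alternative is read as a joint entropy, then for $n=3$ it gives $H(X_{1},X_{2})-H(X_{1},X_{2}|X_{3})$, which is the mutual information between the pair $(X_{1},X_{2})$ and $X_{3}$, a different quantity from the recursively defined one.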

Distance is "universal"

The claim that the distance D(X,Y) is "universal" is pretty flimsy. Here is the text from the article "An information-based sequence distance and its application to whole mitochondrial genome phylogeny", Ming Li et al., referred to in the Wikipedia article:

Now, consider any computable distance D. In order to exclude degenerate distances such as D(x, y) = 1/2 for all sequences x and y, we limit the number of sequences in a neighborhood of size d. Let us require for each x,

|{y : |y| = n and D(x, y) ≤ d}| ≤ 2^{dn}. (2)

Assuming equation (2), we prove the following theorem.

THEOREM 2. For any computable distance D, there is a constant c < 2 such that, with probability 1, for all sequences x and y, d(x, y) ≤ cD(x, y).

In other words, the distance must be "computable" (which I expect is a very specific, application-dependent notion) and it must satisfy another very technical condition (2). So it is not very universal -- at least not without considerable qualification. "Qualified universal" is fairly like "adulterated pregnant" in my view. See also remarks at Talk:Variation_of_information. 129.132.211.9 (talk) 19:57, 22 November 2014 (UTC)[reply]

Figure

The figure giving the mutual information of various scatterings of data looks like nonsense. It gives positive values for distributions that are separable.
