Information Gain In ML

Gain(S, A) = expected reduction in entropy due to sorting on A.
Where S is a set of training examples, and A is an attribute.

It takes the entropy of the whole set and subtracts the entropy of each subset produced by splitting on the attribute you're testing, weighted by subset size, so you can see how much entropy you lose overall.
(losing uncertainty = gaining certainty = gaining information)

We define it as:

(1)
\begin{align} Gain(S,A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v) \end{align}
  • Where |S| is the total number of examples in the set S, i.e. all the examples at that level of the tree
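
As a concrete sketch of equation (1) in code: a minimal Python version, assuming each set is summarized as (positive, negative) counts. The names entropy and gain are illustrative, not from any particular library.

import math

def entropy(pos, neg):
    """Entropy of a set with pos positive and neg negative examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:  # 0 * log2(0) is taken to be 0
            h -= (count / total) * math.log2(count / total)
    return h

def gain(pos, neg, splits):
    """Gain(S, A): entropy of S minus the weighted entropy of each value of A.
    splits holds one (pos, neg) pair per value v of the attribute."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in splits)
    return entropy(pos, neg) - remainder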

Example

Let the set S contain 6 positive (+) and 6 negative (-) examples before splitting on any attribute.

(2)
\begin{equation} |S| = 6 + 6 = 12 \end{equation}

|S_v| is the number of examples in S that have the specific value v of that attribute

  • e.g. let Attribute A have two values, v = 1 or v = 2
  • v = 1 has 2+ and 5- cases; v = 2 has 4+ and 1- cases
(3)
\begin{align} |S_{v=1}| = 2 + 5 = 7 \\ |S_{v=2}| = 4 + 1 = 5 \end{align}
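
To finish the example, plug these counts into equation (1). Using the standard binary entropy, Entropy(S) = 1 for the 6+/6- set, Entropy(S_{v=1}) ≈ 0.863 for 2+/5-, and Entropy(S_{v=2}) ≈ 0.722 for 4+/1-, so:

(4)
\begin{align} Gain(S,A) &= Entropy(S) - \frac{7}{12} Entropy(S_{v=1}) - \frac{5}{12} Entropy(S_{v=2}) \\ &= 1.000 - \frac{7}{12}(0.863) - \frac{5}{12}(0.722) \\ &\approx 0.196 \end{align}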

Information Value Function (aka Entropy)

Gain can also be written as:

Gain(Attribute) = Entropy([totalYes's, totalNo's]) - Entropy([v1Yes's, v1No's], [v2Yes's, v2No's], [v3Yes's, v3No's])

where the multi-argument Entropy(...) stands for the size-weighted average of the entropies of the individual splits, i.e. the same weighted sum as in equation (1).

Entropy here is also referred to as the info function.
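
This notation maps directly onto code. A small sketch, reusing the entropy helper from the sketch above (the name info follows the notation here, not any library):

def info(*splits):
    """info((y1, n1), (y2, n2), ...): with one pair, plain entropy;
    with several, the size-weighted average entropy of the splits."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * entropy(p, n) for p, n in splits)

# Gain(A) = Entropy([6, 6]) - Entropy([2, 5], [4, 1])
gain_A = info((6, 6)) - info((2, 5), (4, 1))
print(round(gain_A, 3))  # 0.196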