Information Gain In ML

Gain(S, A) = expected reduction in entropy due to sorting (partitioning) the examples on A,
where S is a set of training examples and A is an attribute.

It takes the entropy of the whole set and subtracts the weighted average entropy of the subsets produced by each value of the attribute you're testing, so that you figure out how much entropy you lose overall.
(losing uncertainty = gaining certainty = gaining information)

We define it as:

(1)
\begin{align} Gain(S,A) \equiv Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v) \end{align}
• Where |S| is the total number of examples in S, i.e. all examples at the level (node) being split
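
As a rough illustration of equation (1), here is a minimal Python sketch. The function names `entropy` and `gain`, and the layout of each example as a `(features, label)` pair, are assumptions made for this sketch and do not come from any particular library. Entropy is measured in bits (log base 2).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def gain(examples, attribute):
    """Gain(S, A): entropy of S minus the weighted entropy of each S_v.

    `examples` is assumed to be a list of (features_dict, label) pairs.
    """
    labels = [label for _, label in examples]
    # Group the labels by the value v that each example has for the attribute (S_v).
    subsets = {}
    for features, label in examples:
        subsets.setdefault(features[attribute], []).append(label)
    # Weight each subset's entropy by |S_v| / |S|.
    remainder = sum(len(sv) / len(examples) * entropy(sv) for sv in subsets.values())
    return entropy(labels) - remainder

# Toy data (made up): A separates the classes perfectly, B not at all.
examples = [
    ({"A": 1, "B": "x"}, "+"),
    ({"A": 1, "B": "y"}, "+"),
    ({"A": 2, "B": "x"}, "-"),
    ({"A": 2, "B": "y"}, "-"),
]
print(gain(examples, "A"))  # 1.0
print(gain(examples, "B"))  # 0.0
```

An attribute that splits the set into pure subsets removes all the entropy, while one whose subsets have the same class mix as the original set removes none.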

# Example

Let S contain 6 positive (6+) and 6 negative (6-) examples, whichever attribute is being tested.

(2)
\begin{equation} |S| = 6 + 6 = 12 \end{equation}

|S_v| is the number of examples in S for which the attribute has the specific value v

• e.g. let attribute A have two values, v = 1 or v = 2
• v = 1 has 2+ and 5- cases, v = 2 has 4+ and 1- cases
(3)
\begin{align} |S_{v=1}| &= 2 + 5 = 7 \\ |S_{v=2}| &= 4 + 1 = 5 \end{align}
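
Plugging these counts into equation (1) gives the gain for this example. The working below assumes the usual two-class entropy, $-p_+ \log_2 p_+ - p_- \log_2 p_-$, measured in bits; the values are rounded to three decimal places.

(4)
\begin{align} Entropy(S) &= -\tfrac{6}{12}\log_2\tfrac{6}{12} - \tfrac{6}{12}\log_2\tfrac{6}{12} = 1 \\ Entropy(S_{v=1}) &= -\tfrac{2}{7}\log_2\tfrac{2}{7} - \tfrac{5}{7}\log_2\tfrac{5}{7} \approx 0.863 \\ Entropy(S_{v=2}) &= -\tfrac{4}{5}\log_2\tfrac{4}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5} \approx 0.722 \\ Gain(S,A) &\approx 1 - \tfrac{7}{12}(0.863) - \tfrac{5}{12}(0.722) \approx 0.196 \end{align}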

# Information Value Function (aka Entropy)

Gain can also be written as:

Gain(Attribute) = Entropy([totalYes's, totalNo's]) - Entropy([v1Yes's,v1No's], [v2Yes's, v2No's], [v3Yes's, v3No's]).

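Entropy here is also referred to as the info function. When it is given several lists, one per value of the attribute, it stands for the average of their entropies, weighted by the number of examples in each list.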
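
The count-list notation can be sketched in Python as below. The helper name `info` and its variadic signature are assumptions made for illustration; treating several lists as a size-weighted average follows equation (1). The counts are the ones from the example above (6+/6- overall; v = 1 has 2+/5-, v = 2 has 4+/1-).

```python
from math import log2

def info(*groups):
    """Weighted average entropy, in bits, over one or more [yes, no] count lists."""
    total = sum(sum(g) for g in groups)
    result = 0.0
    for g in groups:
        size = sum(g)
        if size == 0:
            continue  # an empty group contributes nothing
        # Entropy of this group; zero counts are skipped since 0 * log(0) is taken as 0.
        h = -sum((c / size) * log2(c / size) for c in g if c > 0)
        result += (size / total) * h
    return result

# Gain(A) = info([totalYes, totalNo]) - info([v1Yes, v1No], [v2Yes, v2No])
gain_a = info([6, 6]) - info([2, 5], [4, 1])
print(round(gain_a, 3))  # 0.196
```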

page revision: 5, last edited: 01 Apr 2012 11:42