Continuous-Valued Attributes in ML

Decision trees in their basic form only work with discrete attributes (e.g. weather is sunny/rainy/cloudy, not 78 or 79 or 80… degrees). Continuous (or 'numeric') attributes therefore need some extra handling.

We create a discrete attribute by testing the continuous value against a threshold (is the value below the threshold, or above it?).
I.e. we split the range of values at some point to artificially create a two-valued discrete attribute.

E.g. Temp = 70, 71, 72…80, 81, 82
Value 1 = Temperature < 75.5
Value 2 = Temperature > 75.5
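
The thresholding above can be sketched in a few lines of Python. This is only an illustration: the function name `binarize`, the labels "low"/"high" and the sample temperatures are made up for the example, and values equal to the threshold are (arbitrarily) sent to the upper side.

```python
def binarize(temperature, threshold=75.5):
    """Turn a numeric temperature into one of two discrete values."""
    # Assumption: a value exactly on the threshold goes to the upper side.
    return "low" if temperature < threshold else "high"

# Hypothetical sample readings, not the full training set.
temps = [70, 72, 75, 76, 80, 82]
labels = [binarize(t) for t in temps]
print(labels)
```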

Evaluating Split Points

There are n-1 possible split points for n distinct values of an attribute in the training set: one between each pair of adjacent values once they are sorted.
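
As a rough sketch, the candidate split points can be enumerated by sorting the distinct values and taking one threshold between each adjacent pair. Using the midpoint of each pair is an assumption made here for illustration (any value strictly between the two neighbours separates the data the same way), and `candidate_splits` is a hypothetical helper name.

```python
def candidate_splits(values):
    """Return one candidate threshold between each pair of adjacent distinct values."""
    distinct = sorted(set(values))
    # Assumption: use the midpoint of each adjacent pair as the threshold.
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

# Hypothetical sample of six distinct temperatures -> five candidate thresholds.
splits = candidate_splits([70, 71, 72, 80, 81, 82])
print(splits)
```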

One simple approach is to split at the halfway mark between the smallest and the largest value (Dyadic Decision Trees do this), but there are more sophisticated methods, for instance choosing the split point that gives the best information gain.

Information Gain

  • Let Split1 have 4 yes's and 2 no's.
  • Let Split2 have 5 yes's and 3 no's.

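We then compute the weighted average entropy of the two subsets, written Entropy([4,2],[5,3]):

\begin{align} \text{Entropy}([4,2],[5,3]) &= \frac{4+2}{4+2+5+3} \cdot \text{Entropy}([4,2]) + \frac{5+3}{4+2+5+3} \cdot \text{Entropy}([5,3]) \\ &= \frac{6}{14} \cdot 0.918 + \frac{8}{14} \cdot 0.954 = 0.939 \text{ bits} \end{align}

The information gain of the split is the entropy before splitting minus this value. Before the split there are 9 yes's and 5 no's, so Gain = Entropy([9,5]) - 0.939 ≈ 0.940 - 0.939 = 0.001 bits.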
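
The arithmetic can be checked with a short Python sketch. The helper names `entropy` and `weighted_entropy` are chosen here for illustration; each takes class counts such as [4, 2] (4 yes's, 2 no's).

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def weighted_entropy(partitions):
    """Average entropy of the subsets, weighted by subset size."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)

after = weighted_entropy([[4, 2], [5, 3]])  # entropy after the split
before = entropy([9, 5])                    # entropy of all 14 instances
gain = before - after

print(round(after, 3))  # 0.939
print(round(gain, 3))   # 0.001
```

The gain is tiny here because both subsets have almost the same yes/no mix as the full set, so this particular split tells us very little.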

The problem with information gain is that it favours attributes with many values: the more values an attribute has, the more likely it is to split the instances into 'pure' subsets.
It can therefore select quite irrelevant attributes (like an ID code, unique to each instance). Such an attribute gets a high gain on the training set (every subset is pure, so its gain equals the entropy of the root), but it is useless for predicting unseen instances as it tells us nothing about the structure of the decision.
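
A small sketch makes the ID-code problem concrete. The data set is hypothetical (9 yes's and 5 no's, each instance given a unique ID), and `info_gain` is an illustrative helper rather than any particular library's function.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def info_gain(attribute_values, labels):
    """Information gain from splitting the labels by a discrete attribute."""
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Hypothetical training set: 9 yes's, 5 no's, one unique ID per instance.
labels = ["yes"] * 9 + ["no"] * 5
ids = list(range(len(labels)))

root_entropy = entropy(labels)
id_gain = info_gain(ids, labels)
print(round(root_entropy, 3), round(id_gain, 3))
```

Every ID value isolates a single instance, so each subset is pure and the gain comes out equal to the root entropy, the largest gain any attribute could achieve on this data.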

Gain Ratio

We can instead use the gain ratio, which takes into account the number and size of the child nodes that the attribute splits the data into.

\begin{align} \text{GainRatio}(S,A) = \frac{\text{Gain}(S,A)}{\text{SplitInformation}(S,A)} \end{align}

SplitInformation(S,A) can also be thought of as the entropy of the distribution of instances over the attribute's values (ignoring the class labels), i.e.
SplitInformation(S,A) = Entropy([numInstances of V1, numInstances of V2, … , numInstances of Vc])
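
The following sketch puts the two formulas together, under the assumption that a split is described by the class counts in each child node. The function name `gain_ratio` and the two example splits (a two-way split into [4,2] and [5,3], and an ID-like split into 14 single-instance children) are illustrative choices.

```python
from math import log2

def entropy(counts):
    """Entropy in bits of a distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain_ratio(parent_counts, child_counts):
    """Gain divided by split information, from class counts per child node."""
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in child_counts)
    gain = entropy(parent_counts) - remainder
    # Split information: entropy of the child sizes, ignoring class labels.
    split_info = entropy([sum(c) for c in child_counts])
    # Assumption: a split with a single non-empty child gets ratio 0.
    return gain / split_info if split_info > 0 else 0.0

ratio_binary = gain_ratio([9, 5], [[4, 2], [5, 3]])

# ID-like attribute: 14 children with one instance each (9 yes, 5 no).
id_children = [[1, 0]] * 9 + [[0, 1]] * 5
ratio_id = gain_ratio([9, 5], id_children)

print(ratio_binary, ratio_id)
```

For the ID-like attribute the split information is log2(14) ≈ 3.807 bits, so its gain of about 0.940 bits is scaled down to a ratio of roughly 0.247: the many small children are penalised.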