Decision Trees In ML

# Intro

Decision Trees are the most popular Data Mining tool. They are classifiers. They predict categorical outputs from categorical/real inputs.

## Benefits

• Easy to understand
• Easy to implement
• Easy to use
• Computationally Cheap

## Drawbacks

Overfitting is one drawback of decision trees.

## Decision Tree Representation

• Each internal node tests an attribute
• Each branch corresponds to attribute value
• Each leaf node assigns a classification

## Tree Notation

There are many formats we can use to depict trees. The most common is the basic visual tree diagram. Aside from that we can use other more graphical forms, or the text-based C4.5.

Each level of the tree is a different attribute, and each node at that level depicts a different value of the attribute (known as its class).

A node shows the probability of its value (of the current attribute level) leading to a positive or negative outcome. (e.g. Will having defaulted on a loan lead to a default on a credit card).

## C4.5

• We read this top-down and left-to-right
• The same level indentation indicates that the nodes are children of the same degree.
• Indentation directly underneath a node means that the next line is its child.

# When to Use

• Instances describable by attributeâ€“value pairs
• Target function is discrete valued
• Disjunctive hypothesis may be required
• Possibly noisy training data

## Examples

• X OR Y
• Calendar Scheduling

# Top-Down Induction of Decision Trees (TDIDT)

This is an algorithm for generating these trees. It's very similar to the ID3 algorithm, and often referred to as that.

## Algorithm

1. Let A be the "best" decision attribute for the next node (as in, which attribute should I test next)
2. Assign A as the decision attribute for the next node
3. For each value of A, create a new descendent of the node
4. Sort training examples to leaf nodes (e.g. if training example has weather as sunny then that example is sorted into the 'sunny' node, rainy ones will be put into the 'rainy' node).
5. Recurse/Iterate over new children until all are perfectly sorted.

## Refinement

TDIDT is the way of searching a complete hypothesis space. It uses a preference bias, and has inevitable overfitting (hence requires pruning).

There's been decades of research into refining it (e.g. dealing with continuous values and attributes with costs).

Performance of TDIDT can be improved with ensemble methods.

### Windowing

When the training set is too large for memory we can use ID3's windowing.

1. Let the Window be a subset of the instances
2. Construct the decision tree from all instances in the window
3. Use this tree to classify the rest of the instances
4. If all the instances are all correctly classified then stop
5. Else add selected misclassified instances to the window and repeat the construction.

This is related to ensemble learning.

page revision: 18, last edited: 16 Apr 2012 01:17