Hypothesis Space Search (ML)

Hypothesis Space Search (ID3)

(As in, an algorithm that searches the space of possible decision trees, doing hill climbing on information gain).

ID3 = Iterative Dichotomiser 3.

Algorithm

  • Complete hypothesis space (it contains all finite discrete-valued functions)
    • The target function must be in there somewhere (every such function is represented by at least one tree)
  • Output a single hypothesis
  • No back-tracking
    • Hence it can get stuck in local optima (the greedy search commits to each split and never revisits it)
  • Statistically-based search choices
    • Hence robust to noisy data
  • Inductive Bias
    • Prefers short trees
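The search described above can be sketched in a few lines. This is a minimal, illustrative implementation (function and attribute names are my own, not from the notes): at each node it greedily picks the attribute with the highest information gain, recurses, and never backtracks.

```python
# Minimal ID3 sketch: greedy hill climbing on information gain, no backtracking.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Expected reduction in entropy from splitting on `attr`.
    gain, n = entropy(labels), len(rows)
    for value in set(r[attr] for r in rows):
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # out of attributes -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # hill-climb step
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        srows, slabels = zip(*sub)
        tree[best][value] = id3(list(srows), list(slabels), attrs - {best})
    return tree

def classify(tree, row):
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

# Tiny made-up dataset, just to exercise the sketch.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain",  "windy": "no"},
    {"outlook": "rain",  "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]
tree = id3(rows, labels, {"outlook", "windy"})
```

Note how the `max(...)` line is the entire "search": once an attribute is chosen for a node there is no mechanism to undo the choice, which is exactly why the algorithm can settle on a locally rather than globally optimal tree.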

Inductive Bias

ID3's search is biased: it prefers short trees, and trees that place attributes with high information gain near the root. Since the hypothesis space itself is complete, this is a preference bias of the search, not a restriction of the space.

This fits with Occam's razor: the shortest hypothesis that fits the data is likely to be the most general (hence the most informative, hence the most preferable).

Occam's Razor

Benefits to picking shortest valid hypothesis:

  • There are fewer short hypotheses than long ones
  • A short hypothesis that fits the data is unlikely to do so by coincidence, whereas a long one may fit by chance
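The counting point above can be made concrete with a small (hypothetical, made-up) enumeration: over n binary attributes there are 2^(2^n) distinct boolean functions, but only a handful are expressible as depth-1 trees ("stumps"). A stump that fits the data is therefore unlikely to fit by chance.

```python
# Count distinct functions expressible as depth-1 trees (stumps) over
# n binary attributes, versus all boolean functions on those attributes.
from itertools import product

n = 3
inputs = list(product([0, 1], repeat=n))   # all 2^n attribute vectors

# Truth table of every stump: split on attribute i, label the two branches.
stumps = set()
for i in range(n):
    for l0, l1 in product([0, 1], repeat=2):
        stumps.add(tuple(l1 if x[i] else l0 for x in inputs))

total_functions = 2 ** (2 ** n)
print(len(stumps), "stump hypotheses vs", total_functions, "functions")
# → 8 stump hypotheses vs 256 functions
```

Only 8 of the 256 functions on 3 binary attributes are stumps, so the set of short hypotheses is genuinely sparse in the space of all hypotheses.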

Disadvantages to picking shortest valid hypothesis:

  • There are many ways to define small sets of hypotheses (e.g. the set of all trees with exactly 74902384092384 nodes is also a small set). So what is special about the set of short hypotheses in particular?