Search Techniques (COMP3411)

Search strategies are defined by which order the nodes are picked for expansion.

Evaluation

They're evaluated in terms of:

  • Completeness (if it exists, does it always find the solution)
  • Time Complexity (Number of nodes generated/expanded)
  • Space Complexity (Maximum number of nodes in memory at any one time)
  • Optimality (Does it find the best (least-cost) solution).

Time and Space Complexity

Time and Space Complexity are measured in terms of:

  • b = Maximum Branching Factor (of the tree) - i.e. if each node can have 10 children, then each level is 100, 101, 102
  • m = Maximum Depth of State Space (can be infinite, there may always be possible children states)
  • d = Depth of the least-cost (best) solution

Algorithm Comparison

We compare different search algorithms in two ways:

  • Benchmarking: Run both algorithms (same language, same hardware, same test cases, etc) and compare the time differences
  • Mathematical Analysis: Essentially complexity analysis.

Uninformed Search Strategies

Uninformed search strategies are also known as "blind" because they only have the information given in the problem statement, meaning they can only differentiate goal vs non-goal, not degrees of closeness.

Informed Search Strategies

Informed search strategies (also known as "heuristic") use information about the task to determine the value of a state. They're more efficient than uninformed strategies because they can work out a system of preferable nodes, rather than just sequential nodes.

We use a BestFirst() function (like the path length one in dijkstra) to figure out which order to visit nodes.

Informed Search Strategies include:

We also need to look at:

Heuristic Functions

A heuristic function is a problem-specific function that determines an estimate of the cheapest path cost from a current state to the goal state.

Straight Line Distance as Heuristic

If $h_{SLD}(n)$ = the straight-line distance between n and the goal location, and we assume that paths typically tend to approximate the direct connection between two states, then we need to know the physical co-ordinates that each state represents, to figure out the heuristic.

Hence it becomes:

(1)
\begin{align} h_{SLD}(n) = \sqrt{(n_x - g_x)^2 + (n_y - g_y)^2} \end{align}

Where n = current node and g= goal node.

Straight line distance is admissible because the straight line is the shortest distance between two points.

Dominance

The most dominant heuristic is the admissible heuristic closest to the actual value it's estimating.

Hence if $h_1() >= h_2(n) \forall n$, and both are admissible, then $h_1$ is the dominant heuristic, and better for searching.ß

Composite Heuristics

If $h_1, h_2, ... h_n$ are all admissible heuristics then the max of them will be the composite heuristic - it'll be admissible and dominant.