Minimax

Minimax is an algorithm that maximises gain and minimises loss in the worst-case scenario of game play. It yields perfect play for deterministic, fully observable games.

The idea is to choose the next move with the highest minimax value (the best achievable payoff against an opponent playing their best possible move).

Algorithm

If the node is a leaf node, or depth = 0
   Return the heuristic of the node 
Else If we're playing at the node
   Let α = -∞
   For each child of node
      Let α = max(α, minimax(child, depth-1))
   Return α 
Else (if the opponent is playing at the node)
   Let β = +∞
   For each child of node
      Let β = min(β, minimax(child, depth-1))
   Return β

Negamax

Minimax evaluates every node with respect to a fixed player. If instead we evaluate each node with respect to whoever's turn it is, we get negamax.

I.e. it's always the current player's turn to move, so we cut out the "if we're playing at the node" line and the whole "else" section, and instead take the maximum of the negated child values: Let α = max(α, -negamax(child, depth-1)).
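A minimal sketch, using the same made-up nested-list tree assumption (leaves are heuristic values; the leaves here sit at even depth, so scoring them from our perspective coincides with scoring them for the side to move):

```python
def negamax(node):
    """Score a nested-list game tree for the player to move."""
    if isinstance(node, (int, float)):
        return node
    # The opponent's best score is the negation of ours, so a single
    # max over negated child values replaces the max/min branches.
    return max(-negamax(child) for child in node)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(negamax(tree))  # → 3
```

This gives the same answer as minimax on the same tree, with half the code.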

Properties of Minimax

  • Completeness: Yes
  • Optimality: Yes
  • Time Complexity: $O(b^m)$ - the sum of all the levels ($b^0 + b^1 + b^2 + \dots + b^m$)
  • Space Complexity: $O(m)$ - assuming you explore depth-first, one path at a time

Take chess, with b ≈ 35 and m ≈ 100 for reasonable games: an exact search is completely infeasible. So we use heuristic evaluation and resource limits, as well as alpha-beta pruning, to make it feasible.

Resource limits and heuristic evaluation

Resource Limits

We can limit the depth to which we search ahead (a resource limit) in order to make the search more feasible: e.g. search 5 plies ahead to check whether anything terrible happens, rather than searching to the game's end.

Heuristic Evaluation

We can use a heuristic to evaluate a state, and then prune off bad states, or only target better ones.

E.g. for chess we can allocate points based on which pieces each side has and where they're positioned, and whether they're attacking other pieces. The weight given to each feature can be determined by reinforcement learning.
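As an illustrative sketch (not from the text), here is the simplest such heuristic for chess, a material count; the piece weights are the conventional textbook values rather than learned ones:

```python
# Conventional material values (pawn = 1, knight/bishop = 3, ...).
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def material_eval(our_pieces, their_pieces):
    """Heuristic value of a position: our material minus theirs."""
    ours = sum(PIECE_VALUES[p] for p in our_pieces)
    theirs = sum(PIECE_VALUES[p] for p in their_pieces)
    return ours - theirs

# We have a rook and a pawn; they have two pawns: 6 - 2 = +4 for us.
print(material_eval(['R', 'P'], ['P', 'P']))  # → 4
```

A real evaluation function would add positional features (piece placement, attacks, king safety) with learned weights, but each feature follows this same points-per-feature pattern.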

Alpha-Beta (α-β) pruning

Alpha-Beta Pruning is simply minimax that cuts off branches of the tree which have been shown unable to affect the final decision (e.g. once a line where moving the Queen gets her taken is refuted, we needn't search its remaining replies).

Algorithm

alphabeta(node, depth, α, β)
   If the node is a leaf node, or depth = 0
      Return the heuristic of the node
   Else if we're playing at the node
      For each child of node
         Let α = max(α, alphabeta(child, depth-1, α, β)) 
         If α >= β
            exit the loop (break) 
      Return α
   Else (if it's our opponent's turn)
      For each child of node
         Let β = min(β, alphabeta(child, depth-1, α, β)) 
         If β <= α
            exit the loop (break) 
      Return β
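A runnable sketch of the pseudocode, again assuming a made-up nested-list tree (leaves are heuristic values) and searching to the leaves:

```python
import math

def alphabeta(node, alpha, beta, maximising):
    """Minimax value of a nested-list tree, with alpha-beta cut-offs."""
    if isinstance(node, (int, float)):
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cut-off: the opponent won't allow this line
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if beta <= alpha:
            break  # alpha cut-off: we already have a better option
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, -math.inf, math.inf, True))  # → 3
```

On this tree the second branch is cut off after its first leaf (2 ≤ α = 3), so the result matches minimax while evaluating fewer leaves.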

Negamax Alpha-Beta

Negamax alpha-beta is, of course, the same principle as with minimax: every node is evaluated from the perspective of the player to move, with the (α, β) window negated and swapped in each recursive call: Let α = max(α, -alphabeta(child, depth-1, -β, -α)).
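A sketch under the same nested-list tree assumption as the other examples (leaves at even depth, scored from our perspective):

```python
import math

def negamax_ab(node, alpha, beta):
    """Alpha-beta in negamax form: score the node for the player to move."""
    if isinstance(node, (int, float)):
        return node
    value = -math.inf
    for child in node:
        # Negate the child's score and swap/negate the window for it.
        value = max(value, -negamax_ab(child, -beta, -alpha))
        alpha = max(alpha, value)
        if alpha >= beta:
            break  # cut-off: the opponent won't allow this line
    return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(negamax_ab(tree, -math.inf, math.inf))  # → 3
```

The single loop replaces both the max and min branches of the two-sided version, and the same cut-offs occur.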

Meanings of Alpha and Beta

Alpha
The best value found so far for us (the maximiser) along the current path - a lower bound on what we can achieve.
Beta
The best value found so far for the opponent (the minimiser) along the current path - an upper bound on what we can achieve.

If we find a node with a better value than alpha, we pass this value up the tree.
If the current node's value meets or exceeds beta, the opponent (who minimises) would never let play reach this node, so we prune off that section.

Properties

It gives the same result as minimax, with much better computational complexity.

  • Complete: Yes
  • Optimal: Yes
  • Time Complexity:
    • With 'perfect ordering' the time complexity is $O(b^{\frac{m}{2}})$

To prove a "bad" move is bad, we need only consider one good reply against it; to prove a "good" move is good, we must consider all replies. Hence with optimal ordering (always finding the one good reply first), each "bad" move costs roughly one node, so its computation is essentially discounted, and we can search twice as deep for the same computational cost as minimax.
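This effect can be seen by counting leaf evaluations. A sketch, again assuming a made-up nested-list tree; the two trees below contain the same moves, with the strongest move searched first versus last:

```python
import math

def alphabeta(node, alpha, beta, maximising, count):
    """Alpha-beta that tallies leaf evaluations in count[0]."""
    if isinstance(node, (int, float)):
        count[0] += 1  # one leaf evaluated
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, count))
            alpha = max(alpha, value)
            if alpha >= beta:
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, count))
        beta = min(beta, value)
        if beta <= alpha:
            break
    return value

def leaves_searched(tree):
    count = [0]
    alphabeta(tree, -math.inf, math.inf, True, count)
    return count[0]

good = [[8, 7], [3, 9], [2, 1]]  # strongest move (value 7) searched first
bad = [[2, 1], [3, 9], [8, 7]]   # same moves, strongest searched last
print(leaves_searched(good), leaves_searched(bad))  # → 4 6
```

With good ordering the two weaker moves are each refuted after a single leaf; with bad ordering every leaf but one is evaluated, though both orderings return the same value (7).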

In Stochastic games

Stochastic games have an element of luck, which means traditional minimax isn't quite as useful. Hence we have Expectimax.

Essentially, in between each pair of moves (max or min) we add a 'chance node', which returns the probability-weighted average of the values of its successor nodes.
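A minimal sketch, assuming a made-up tree encoding (not from the text): lists are choice nodes, `('chance', [(prob, child), ...])` tuples are chance nodes, and leaves are payoff values:

```python
def expectimax(node, maximising):
    """Expectimax value of a tree with choice and chance nodes."""
    if isinstance(node, (int, float)):
        return node
    if isinstance(node, tuple) and node[0] == 'chance':
        # Chance node: probability-weighted average of successor values.
        return sum(p * expectimax(child, maximising)
                   for p, child in node[1])
    if maximising:
        return max(expectimax(child, False) for child in node)
    return min(expectimax(child, True) for child in node)

# Our choice: a sure payoff of 3, or a fair coin flip between 0 and 10.
tree = [3, ('chance', [(0.5, 0), (0.5, 10)])]
print(expectimax(tree, True))  # → 5.0
```

The chance node averages to 5, so the gamble beats the sure 3; a min layer would slot in below the chance nodes in the same way for an adversarial stochastic game.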

Exact Values

Deterministic Games

Payoff in deterministic games acts as an ordinal utility function: as long as the order of the values (which is less than or greater than which) is preserved, we can apply any monotonic (order-preserving) transformation to them.

This means that in minimax exact values don't matter (it makes no difference whether a node is 2 or 20 or 200) - multiplying every node's value by 10 changes nothing.
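An illustrative check of this (using the made-up nested-list tree representation): cubing every leaf is a monotone but non-linear transform, and the chosen move at the root is unchanged:

```python
def minimax(node, maximising):
    """Minimax value of a nested-list game tree."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)

def best_move(tree):
    """Index of the root child with the highest minimax value."""
    values = [minimax(child, False) for child in tree]
    return values.index(max(values))

def transform(node, f):
    """Apply f to every leaf of the tree."""
    if isinstance(node, (int, float)):
        return f(node)
    return [transform(child, f) for child in node]

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
cubed = transform(tree, lambda x: x ** 3)  # monotone for positive values
print(best_move(tree), best_move(cubed))  # → 0 0
```

The leaf values change wildly (3 becomes 27, 14 becomes 2744), but the ordering, and hence the decision, is preserved.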

Stochastic Games

In stochastic games, move choice is only preserved by positive linear transformations (i.e. $x \mapsto ax + b$ with $a > 0$), because chance nodes compare expected values, not just orderings.
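An illustrative check (with a made-up encoding: a move is either a sure payoff or a list of (probability, payoff) pairs): a positive linear transform preserves the preferred move, but a merely monotone one (here the square root) can flip it:

```python
import math

def value(move):
    """Expected value: a number, or a list of (prob, payoff) pairs."""
    if isinstance(move, (int, float)):
        return move
    return sum(p * x for p, x in move)

def best(moves):
    """Index of the move with the highest expected value."""
    values = [value(m) for m in moves]
    return values.index(max(values))

def apply_to_payoffs(moves, f):
    """Transform every payoff while keeping the probabilities."""
    return [f(m) if isinstance(m, (int, float))
            else [(p, f(x)) for p, x in m] for m in moves]

moves = [3, [(0.5, 0), (0.5, 10)]]  # sure 3 vs a fair flip for 0 or 10
print(best(moves))                                    # → 1 (gamble averages 5)
print(best(apply_to_payoffs(moves, lambda x: 2 * x + 1)))  # → 1 (unchanged)
print(best(apply_to_payoffs(moves, math.sqrt)))            # → 0 (choice flips)
```

The square root compresses the gamble's high payoff more than the sure one, so its expected value drops below 3 even though every individual payoff keeps its rank: ordinal invariance is lost once expectations are involved.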