Minimax is an algorithm designed to maximise gain and minimise loss in the worst-case scenario of game play. It yields perfect play for deterministic, fully observable games.
The idea is to choose the next move with the highest minimax value (the best achievable payoff against an opponent playing their best possible move).
Algorithm
```
function minimax(node, depth)
    if node is a leaf node or depth = 0
        return the heuristic value of node
    else if we are to play at node
        let α = -∞
        for each child of node
            let α = max(α, minimax(child, depth - 1))
        return α
    else  (the opponent plays at node)
        let β = +∞
        for each child of node
            let β = min(β, minimax(child, depth - 1))
        return β
```
Negamax
Minimax evaluates every node with respect to one fixed player. If we instead evaluate each node with respect to whichever player is to move, we get negamax.
I.e. it'll always be "our" turn to play, so the two branches collapse into one: we negate each child's value before taking the maximum, which removes the "if we are to play at node" test and the whole "else" section.
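A minimal negamax sketch in Python; the helpers and toy tree are illustrative assumptions. Note the leaf heuristic must be scored from the perspective of the player to move:

```python
import math

def negamax(node, depth):
    """Value of `node` from the viewpoint of the player to move."""
    if depth == 0 or not children(node):
        return heuristic(node)
    value = -math.inf
    for child in children(node):
        # Negate: the child's value is from the opponent's viewpoint.
        value = max(value, -negamax(child, depth - 1))
    return value

def children(node):
    return node if isinstance(node, list) else []

def heuristic(node):
    # In this toy tree the leaves sit at even depth (the maximiser's
    # perspective again), so no extra sign flip is needed here.
    return node

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(negamax(tree, 2))  # -> 3, matching plain minimax
```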
Properties of Minimax
- Completeness: Yes
- Optimality: Yes
- Time Complexity: $O(b^m)$ - the sum over all levels ($b^0 + b^1 + b^2 + \dots + b^m$)
- Space Complexity: $O(m)$ - assuming you explore in a DFS way, one path at a time
Take chess, with $b \approx 35$ and $m \approx 100$ for reasonable games: the full search is completely infeasible. So we use heuristic evaluation and resource limits, as well as alpha-beta pruning, to make it feasible.
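A quick back-of-the-envelope check on those chess numbers (nothing chess-specific here, just the arithmetic):

```python
import math

# With branching factor b ~= 35 and game length m ~= 100 plies,
# the game tree has about 35**100 nodes - roughly 10**154.
print(math.log10(35 ** 100))  # ~ 154.4
```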
Resource limits and heuristic evaluation
Resource Limits
We can limit the depth to which we search ahead (resource limit) in order to make it more feasible. Hence search 5 layers ahead to see if anything terrible happens, rather than to the game's end.
Heuristic Evaluation
We can use a heuristic to evaluate a state, and then prune off bad states, or only target better ones.
E.g. for chess we can allocate points based on which pieces each side has, where each piece stands, and whether it's attacking another piece. The weight of each feature we allocate points for can be determined by reinforcement learning.
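A toy material-count evaluation along those lines, as a sketch: the piece values and the board encoding (a string of piece letters, upper case for our pieces, lower case for the opponent's) are illustrative assumptions:

```python
# Classic material values; kings are omitted since both sides always have one.
PIECE_VALUES = {'p': 1, 'n': 3, 'b': 3, 'r': 5, 'q': 9}

def material_score(pieces):
    """Sum our piece values minus the opponent's."""
    score = 0
    for piece in pieces:
        if piece.lower() in PIECE_VALUES:
            value = PIECE_VALUES[piece.lower()]
            score += value if piece.isupper() else -value
    return score

print(material_score("QRRp"))  # 9 + 5 + 5 - 1 -> 18
```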
Alpha-Beta (α-β) pruning
Alpha-Beta Pruning is simply minimax plus cutting off branches of the tree that provably cannot affect the final decision (e.g. a line where moving the Queen just results in her being taken).
Algorithm
```
function alphabeta(node, depth, α, β)
    if node is a leaf node or depth = 0
        return the heuristic value of node
    else if we are to play at node
        for each child of node
            let α = max(α, alphabeta(child, depth - 1, α, β))
            if α ≥ β
                exit the loop (break)
        return α
    else  (the opponent plays at node)
        for each child of node
            let β = min(β, alphabeta(child, depth - 1, α, β))
            if β ≤ α
                exit the loop (break)
        return β
```
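The same algorithm in runnable Python; the helpers and toy tree are illustrative assumptions:

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing):
    """Minimax value of `node`, pruning branches that cannot change the result."""
    if depth == 0 or not children(node):
        return heuristic(node)
    if maximizing:
        value = -math.inf
        for child in children(node):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:
                break          # beta cut-off: the opponent will avoid this line
        return value
    else:
        value = math.inf
        for child in children(node):
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break          # alpha cut-off: we will avoid this line
        return value

def children(node):
    return node if isinstance(node, list) else []

def heuristic(node):
    return node

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, 2, -math.inf, math.inf, True))  # -> 3, same as minimax
```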
Negamax Alpha-Beta
Negamax alpha-beta follows, of course, the same principle as with plain minimax: evaluate every node from the perspective of the player to move (negating child values), so the opponent's branch disappears.
Meanings of Alpha and Beta
- Alpha
- Best value for us found so far in the current path
- Beta
- Best value for opponent found so far in the current path
If we find a node with a better value than alpha, we pass this up the tree.
If the current node's value exceeds beta (beta being the opponent's best guaranteed value so far, i.e. a minimum), an optimally playing opponent will never let the game reach this node, and hence we prune off that section.
Properties
It gives the same result as minimax, with a much better computational complexity in practice.
- Complete: Yes
- Optimal: Yes
- Time Complexity:
- With 'perfect ordering' the time complexity is $O(b^{\frac{m}{2}})$
To prove a "bad" move is bad we need only consider one good reply against it, but to prove a "good" move is good we need to consider all replies. Hence with optimal ordering (always finding that one good reply first) the "bad" moves cost essentially one node each, so we can basically discount their computation and search twice as deep for the same cost as minimax.
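A small counting experiment on a toy tree illustrates this: with the strongest move searched first, alpha-beta evaluates fewer leaves. The trees and helpers here are assumptions for illustration:

```python
import math

def alphabeta(node, alpha, beta, maximizing, counter):
    if not isinstance(node, list):      # leaf: count the evaluation
        counter[0] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, counter))
        beta = min(beta, value)
        if beta <= alpha:
            break
    return value

def count_leaves(tree):
    counter = [0]
    alphabeta(tree, -math.inf, math.inf, True, counter)
    return counter[0]

good_first = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # best move searched first
bad_first  = [[2, 4, 6], [14, 5, 2], [3, 12, 8]]   # best move searched last
print(count_leaves(good_first), count_leaves(bad_first))  # -> 7 9
```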
In Stochastic games
Stochastic games have an element of luck, which means traditional minimax isn't quite as useful. Hence we have expectimax.
Essentially, in between each move (max or min) we add a 'chance node', which returns the probability-weighted average of the values of its successor nodes.
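A minimal expectimax sketch; the tagged-tuple node encoding and the toy gambles are assumptions for illustration:

```python
def expectimax(node):
    """Value of a tree with 'max', 'chance', and 'leaf' nodes."""
    kind = node[0]
    if kind == 'leaf':
        return node[1]
    if kind == 'max':
        return max(expectimax(child) for child in node[1:])
    if kind == 'chance':
        # node[1:] holds (probability, child) pairs: weighted average.
        return sum(p * expectimax(child) for p, child in node[1:])

# Max chooses between two dice-like chance nodes.
left  = ('chance', (0.5, ('leaf', 2)), (0.5, ('leaf', 8)))    # EV 5.0
right = ('chance', (0.75, ('leaf', 8)), (0.25, ('leaf', 0)))  # EV 6.0
print(expectimax(('max', left, right)))  # -> 6.0
```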
Exact Values
Deterministic Games
Payoff in deterministic games acts as an ordinal utility function: as long as the ordering of values (which is less than which) is preserved, we can apply any monotonic (as in, order-preserving) transformation to them.
This means that in minimax exact values don't matter (whether a node is 2 or 20 or 200) - if all nodes are multiplied by 10, that's okay.
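A quick check of that claim: applying a monotone but non-linear transform (cubing) to the leaves leaves the minimax move choice unchanged. The toy tree is an assumption:

```python
def minimax_choice(subtrees):
    """Index of the best move: maximise the minimum leaf value."""
    return max(range(len(subtrees)), key=lambda i: min(subtrees[i]))

leaves = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
cubed = [[v ** 3 for v in row] for row in leaves]   # monotone, non-linear
print(minimax_choice(leaves), minimax_choice(cubed))  # -> 0 0: same move
```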
Stochastic Games
In stochastic games move choice is only preserved by positive linear transformations (i.e. $x \mapsto ax + b$ with $a > 0$; values must stay proportionally the same relative to each other before and after the transformation).
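A small sketch of why: squaring (monotone but non-linear) can flip an expected-value comparison, while a positive linear map cannot. The gamble values here are illustrative assumptions:

```python
def expected(values, prob=0.5):
    """Expected value of a two-outcome gamble with P(first outcome) = prob."""
    return prob * values[0] + (1 - prob) * values[1]

sq = lambda vals: [v * v for v in vals]        # monotone, non-linear
lin = lambda vals: [2 * v + 1 for v in vals]   # positive linear

safe, risky = [3, 3], [0, 5]
print(expected(safe), expected(risky))            # 3.0 2.5: prefer safe
print(expected(sq(safe)), expected(sq(risky)))    # 9.0 12.5: now prefer risky!
print(expected(lin(safe)), expected(lin(risky)))  # 7.0 6.0: still prefer safe
```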