- Learning
- Improving over a task with respect to a performance measure based on experience.

Hence for any learning to be done we have three factors:

- A Task, T
- A Performance Measure, P
- Experience, E

We then must ask ourselves the following questions:

- How do we gain Experience?
- What should be learnt?
- How should the Task be represented?
- What algorithm should we use to learn it?
- What type of learning is it?
- Is the training experience representative of the performance goal?

## Example (Checkers)

T: Play Checkers

P: Percentage of Games Won

E: Opportunity to play against self

## Defining the Target Function

The target function is what we're trying to figure out. It's the function that determines what the target attribute will be (as in, based on the input, what will the outcome be?).

We can write this as the following function, designed to look at the value of the board state and choose the most valuable next move.

`ChooseMove: Board -> Move`

and then we can have a Valuation function:

`V: Board->Real`

Where we attribute points to different board states to give the Agent a way of learning which is the best move. For instance:

- V(b=final won state): 100
- V(b=final lost state): -100
- V(b=final drawn state): 0
- V(b=not final):V(b'=best final state possible playing optimally from b)

This gives *correct* values, but is not *operational* (in that over any large gamestate this takes too long and eventually the generated game tree (expanding every state) becomes too massive to be functional).

## Representing the Target Function

We need to choose how to represent our function. Will it be a Neural Network? A collection of rules? A polynomial function?

In this case we could choose the following polynomial:

(1)- bp(b): Number of black pieces on the board
- rp(b): Number of red pieces on the board
- bk(b): Number of black kings on the board
- rk(b): Number of red kings on the board
- bt(b): Number of red pieces takeable on black's next turn
- rt(b): Number of black pieces takeable on red's next turn

The learning problem would then become figuring out the optimal values for the w0->w6 coefficients.

The problem with this polynomial is that we don't just need to know these facts, we want to know *where* on the board the pieces are.

# Obtaining Training Values

- V(b): The true target function (as in, what we're aiming for)
- V^(b): The learned function (as in, the hypothesis the Agent's currently figured out)
- $V_{train}(b)$: The training value

One possible rule for estimating these training values:

- $V_{train}(b) \leftarrow V^{\wedge}(Successor(b))$

As in, assign the current board value to be the learned value of the next board. This makes sense for functions which are more accurate towards the end of a game, but does seem counter-intuitive to be using the current function to refine training data used to refine the function.

# Weight Adjustment Rule

We need to modify the above weights to come up with the best function. Hence we do the following.

Repeat:

- Select training example b

- Compute Error(b):

- For each board feature $f_i$, update weight:

Where *c* is a small constant (e.g. 0.1) to moderate learning