Game Tree Search
The minimax procedure
Assume we have a measure of the degree of advantage to player A (which can also be viewed as the degree of disadvantage to player B).
At A's turn, A will move to the position most beneficial to A, i.e. the value of a node at A's turn = max. value of any of its successors.
At B's turn, B will move to the position least beneficial to A, i.e. the value of a node at B's turn = min. value of any of its successors.
If the entire game tree is available, then we can determine which move will be most advantageous. (Fig C5-1)
For most games, the game tree is far too large to be generated fully. Instead, a reasonable portion of the tree is generated, starting from the current position.
This requires a means of estimating the value of the nodes.
Example (tic-tac-toe):
the heuristic function measures how advantageous a position is for player A.
It is understood that the lower down the game tree, the more accurate the evaluation will be.
Hence if it is A's turn, A will try to move to the position with the highest score (maximizing level), and at B's turn, B will try to move to the position with the lowest score (minimizing level): the minimax procedure.
For the entire game tree, the evaluation function can be exact, e.g. +1 for a position won by A, 0 for a draw, and -1 for a position won by B, which measures how advantageous it is to player A.
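As a minimal sketch (not from the original notes), the minimax procedure can be written over an explicit game tree; the nested-list representation and the leaf values are hypothetical:

```python
# Minimax on a small explicit game tree. A node is either a number
# (a leaf holding its static value to player A) or a list of child nodes.
def minimax(node, maximizing):
    if not isinstance(node, list):  # leaf: return its static value
        return node
    values = [minimax(child, not maximizing) for child in node]
    # a maximizing level takes the max of its successors, a minimizing the min
    return max(values) if maximizing else min(values)

# Hypothetical tree with A to move at the root (maximizing level):
tree = [[3, 12], [2, 8], [14, 5]]
print(minimax(tree, True))  # -> 5 (A picks the branch whose minimum is largest)
```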
Alpha-Beta Pruning
A huge number of nodes needs to be investigated for a reasonably sized game (e.g. chess), so the search has to be reduced.
The following figure shows a situation where some nodes need not be evaluated (no heuristics needed).
The technique uses 2 parameters, alpha and beta.
Alpha carries the lower bound of a minimizing level as shown in figure C5-4, i.e. F(2)=15. Hence, in order to have a significant effect, node 3 would need F(3) greater than 15. If node 4 has a value of 10, node 3's value is at most 10, so we can stop exploring node 3.
This elimination is called an alpha cutoff.
In fig C5-5, beta represents an upper bound on the value to MAX of node 2, i.e. F(4)=20. Since node 2 is a minimizing layer, we can ignore node 5 once node 5 has a value of at least 25 (F(6)).
The corresponding elimination is called a beta cutoff.
The Alpha-Beta Procedure
To perform minimax search with the Alpha-Beta procedure,
if the level is the top level, let alpha be -∞ and let beta be +∞.
if the limit of search has been reached, compute the static value of the current position relative to the appropriate player. Report the result.
If the level is a minimizing level,
Until all children are examined with Alpha-Beta procedure, or until alpha is equal to or greater than beta,
Use the Alpha-Beta procedure, with the current alpha and beta values, on a child; note the value reported.
Compare the value reported with the beta value; if the reported value is smaller, reset beta to the new value.
Report beta.
Otherwise, the level is a maximizing level:
until all children are examined with Alpha-Beta or alpha is equal to or greater than beta,
Use the Alpha-Beta procedure, with the current alpha and beta values, on a child; note the value reported.
Compare the value reported with the alpha value; if the reported value is larger, reset alpha to the new value.
Report alpha.
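The steps above can be sketched as follows (same hypothetical nested-list tree representation as before; alpha and beta start at -∞ and +∞ at the top level):

```python
import math

# The Alpha-Beta procedure on an explicit game tree. A node is either a
# number (static value at the search limit) or a list of child nodes.
def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):
        return node  # limit of search reached: report the static value
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, False, alpha, beta))
            if alpha >= beta:
                break  # beta cutoff
        return alpha  # report alpha
    for child in node:
        beta = min(beta, alphabeta(child, True, alpha, beta))
        if alpha >= beta:
            break  # alpha cutoff
    return beta  # report beta
```

On the tree [[3, 12], [2, 8], [14, 5]] this reports the same value as plain minimax (5 for a maximizing root) while never examining the second child of the middle branch.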
The alpha-beta procedure can reduce the number of nodes explored.
In the worst case, a-b pruning prunes nothing.
It is desirable that the best successor of each node be the first one evaluated.
For example, we can first evaluate a node before expanding it and use this value as an approximation for ordering.
For a random uniform game tree, it can be shown that the a-b technique is optimal in the sense that no algorithm can evaluate any game tree by examining fewer nodes than a-b does with an appropriate ordering of successors. (Knuth & Moore).
Heuristics in Game Tree Search
Besides those questions in state-space search, such as
which node to expand first?
which operator to apply next?
which node to prune?
we have other questions:
when should the search be terminated?
how to determine which move to take?
(The answer is simple if we can expand the whole game tree.)
how to play a good game even though the program has not found a solution to the problem of winning?
Alternatives to search:
use of "book moves" in the opening of chess
recognizing patterns on the board and associating appropriate playing methods with each pattern.
Search cannot be eliminated completely (even humans use some degree of searching);
there is a tradeoff between specialized game knowledge and the amount of search required.
Static Evaluation
a function that estimates the value of a board position without looking at any of its successors.
A perfect evaluation function (i.e. one that reports win/loss/draw) is usually unavailable except for simple games.
The function usually combines terms representing various features of the position, with high values for favorable ones:
for example, in chess, assign a numerical value to each kind of piece and find the difference between the total values of the 2 players' pieces.
also use king safety, mobility, center control and pawn structure.
Note that since the static evaluation function is only an estimate (i.e. it may be wrong), the minimax procedure no longer gives a provably correct move, but rather provides a heuristic for the choice of move.
We can modify this procedure:
take into account how many "good" successors there are in the sub-tree.
set a threshold; the first move that meets the threshold is chosen.
use a simple evaluation function, and consider more features when there is a tie.
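A minimal sketch of such a material-count function, using the conventional piece values (pawn 1, knight/bishop 3, rook 5, queen 9); the list-of-piece-letters board encoding is an assumption made for illustration:

```python
# Material-balance static evaluation sketch for chess. The board is assumed
# to be a list of piece letters: uppercase = player A, lowercase = player B.
PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

def evaluate(pieces):
    score = 0
    for p in pieces:
        value = PIECE_VALUE[p.upper()]
        score += value if p.isupper() else -value  # A's pieces count positive
    return score

# A has king, rook and pawn against king, knight and pawn:
print(evaluate(['K', 'R', 'P', 'k', 'n', 'p']))  # -> 2 (A is up the exchange)
```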
Depth of Search
The reason for looking farther ahead is to compensate for errors in the static evaluation.
There will be less room for mistaken prediction if a deep tree is generated before the evaluation function is applied.
problem: combinatorial explosion
e.g. in chess, the estimated no. of nodes for 6 moves is ~ 10^9.
Humans occasionally look ahead, say, 14 or 15 steps, but only on limited branches.
Strategy:
fixed-depth search: set a fixed depth to which the game tree is to be generated and to apply the static evaluation function only to nodes at this depth.
e.g. the evaluation cannot return a realistic value if, at the depth limit, the players happen to be halfway through an exchange of pieces.
quiescent or dead position: one where the tip nodes are relatively stable.
search would be extended beyond the normal limit for non-quiescent positions, e.g.
"any piece is attacked by a piece of lower value or by more pieces than defenses, or if any check exists on a square controlled by opponents"
or extend the search for checks and immediate capture only.
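A sketch of this quiescence extension on a toy tree; the (static_value, quiet, children) representation and all values are hypothetical:

```python
# Quiescence sketch: stop at the depth limit only if the position is quiet;
# otherwise extend the search until a stable (dead) position is reached.
# A position is (static_value, quiet, children) - a toy representation.
def search(pos, depth, maximizing):
    value, quiet, children = pos
    if (depth <= 0 and quiet) or not children:
        return value  # quiescent tip node: the static value is trustworthy
    results = [search(c, depth - 1, not maximizing) for c in children]
    return max(results) if maximizing else min(results)

leaf = lambda v: (v, True, [])
# the depth-1 tip looks great (30) but is mid-exchange (not quiet), so the
# search extends one more ply and discovers the real outcome
noisy = (30, False, [leaf(-5), leaf(-8)])
print(search((0, True, [noisy, leaf(4)]), 1, True))  # -> 4, not 30
```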
Horizon effect: reality exists only in terms of the output of the evaluation function, and anything that is not detectable at evaluation time does not exist as far as the program is concerned.
e.g. the program moves to force certain positions to appear at the search horizon, and concludes that it has avoided some undesirable effect when in fact the effect has only been delayed to a point beyond the horizon.
usually due to an oversimplified definition of quiescent position.
iterative deepening: a complete search, investigating all legal moves (subject to a-b pruning) is done to depth 2, returning a move, then depth 3, depth 4, ... until a preset time limit is exceeded.
for efficiency, information from earlier iterations is saved for use in later ones.
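A sketch of iterative deepening with a time budget; the (static_value, children) tree, the values, and the time limit are all hypothetical:

```python
import time

# Iterative deepening sketch: run a complete fixed-depth search at depth 1,
# then 2, 3, ... until the preset time limit is exceeded, reporting the
# value from the deepest completed iteration.
def depth_limited(pos, maximizing, depth):
    value, children = pos  # a position is (static_value, children)
    if depth == 0 or not children:
        return value  # static evaluation at the depth limit
    results = [depth_limited(c, not maximizing, depth - 1) for c in children]
    return max(results) if maximizing else min(results)

def iterative_deepening(pos, time_limit=0.05, max_depth=10):
    deadline = time.monotonic() + time_limit
    best = depth_limited(pos, True, 1)  # always complete at least one pass
    for depth in range(2, max_depth + 1):
        if time.monotonic() >= deadline:
            break  # time is up: keep the last completed iteration
        best = depth_limited(pos, True, depth)
    return best
```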
Ordering of Search
with a-b pruning, the order of considering moves is very important.
e.g. correct ordering of the moves for a-b cutoffs would allow the search depth to be roughly doubled.
fixed ordering: apply evaluation function to the successors and order the nodes accordingly
refutation move: for each move that is not a best move, it should be shown to be inferior as quickly as possible; e.g. strong replies should be considered first, which may refute the proposed move.
killer heuristic: consider all captures first, and then consider killer moves: if a move has served as a refutation in some previously examined position that is similar to the current one, it is likely to be a refutation in this position too.
dynamic ordering: when a move (considered best) is later found to be bad (after expanding more depths), backtrack, reorder the moves, and start again with a different estimated best move (because good moves should be searched first).
Width of Search
full-width search: consider all possible moves
(great difficulty in determining, without search, which moves can be ignored safely)
not applicable for some games, e.g. Go (~200 branches)
(a) Forward Pruning
generate all legal moves at a position, use a fixed ordering scheme to sort them according to "plausibility" and then discard all but the best few moves.
Tapered forward pruning: the no. of moves retained is a function of the depth at which they are generated.
we can also prune a node if its estimated value lies outside [alpha, beta]. [Note that the actual value can be within the limits.]
remember moves that have been found to be absurd (by some definition) and reject them in other positions too, unless there has been an appropriate change of circumstances.
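A sketch of tapered forward pruning: moves are ranked by a plausibility estimate (here, simply the successor's static value), all but the best few are discarded, and one move fewer is retained at each deeper level. The tree representation and the taper rule are assumptions made for illustration:

```python
# Tapered forward pruning sketch. A position is (static_value, children).
# Children are sorted by "plausibility" (their static estimate), all but the
# best `width` are discarded, and the width shrinks by one per level.
def pruned_search(node, maximizing, depth, width=3):
    value, children = node
    if depth == 0 or not children:
        return value
    ranked = sorted(children, key=lambda c: c[0], reverse=maximizing)
    kept = ranked[:max(1, width)]  # discard all but the best few moves
    results = [pruned_search(c, not maximizing, depth - 1, width - 1)
               for c in kept]
    return max(results) if maximizing else min(results)
```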
(b) Goal-directed move generation
does not generate moves unless they seem relevant to some goal.
the program makes a preliminary analysis of which goals are relevant to the situation,
then returns a number of plausible moves according to those goals.
goals may change during the game (e.g. the opening and the endgame will have different goals)