Two Player Games



In the previous lesson we only considered one-player games. Now we will kick things up a notch and make AI agents for turn-based, two-player, competitive games. So how will this be any different? Before, we could easily traverse the tree any way we wanted, because we were the only ones making moves. In a multi-agent competitive environment, we have to take into account that there is at least one other agent trying to work against us. So if we take a step that brings the game closer to a Win state, the other agent will take a step that pushes the game toward our Loss state.

What this means is that when we traverse the game tree, trying to find a path to the winning state, after each step we need to consider what step the opponent will take. Let's illustrate this with an example from tic-tac-toe.

If you make a tree without considering the opponent, it would go something like this:

[Figure: a naive game tree that ignores the opponent's moves]

Obviously, this would never work, assuming the opponent makes rational decisions. Since an entire game tree of tic-tac-toe is too big to draw, consider this scenario from X's perspective:

[Figure: the mid-game tic-tac-toe position for this scenario]

Now assuming it is X's turn, let's map out all possible moves for X:

[Figure: X's three possible moves]

Then it's O's turn. So we map out O's possible moves in all three cases:

[Figure: O's possible replies in all three cases]

Lastly we map out X's moves in all cases:

[Figure: X's final moves in all cases]

Now this would be a game tree that takes into account the moves of the opponent. So how would our AI agent at the initial state know which path to take? Well, first let's represent the desirability of each finishing state by assigning it a score. For example:

  • 1 = Win
  • 0 = Draw
  • -1 = Loss

[Figure: each finishing state scored 1, 0 or -1]

Note that this is from X's point of view, so X's aim is to maximize this score, and O's aim is to minimize it. Generally speaking, these two opposing agents are called **Max** and **Min** respectively.
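As a minimal sketch of this scoring, assuming a hypothetical board representation (a list of 9 cells, each holding 'X', 'O' or None), a finished position could be scored from X's point of view like this:

```python
# Hypothetical representation: a board is a list of 9 cells,
# each holding 'X', 'O' or None (an empty cell).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def score(board):
    """Score a finished position from X's (Max's) point of view."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return 1 if board[a] == 'X' else -1   # 1 = Win, -1 = Loss
    return 0                                      # 0 = Draw (or game not over yet)
```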

Now let's consider the last turn, which was X's. Since X only has one possible move in each case, we can say that each state has the same desirability (or score) as the finishing state it leads to. So we can assign the score of each of these states as:

[Figure: X's one-move-ahead states inheriting their leaf scores]

Now let's consider the preceding move, which was O's. Since O is Min, its aim is to minimize the score. So in each case, O will consider the option that leads to the minimum score. Hence we can assign scores to these states based on what O will pick:

[Figure: O's states scored by the minimum of their children]

Now we consider the first turn, which is X's. Here X has 3 possible moves, and we know the final score each of those moves will result in. Since X's aim is to maximize the score, our agent for X will simply pick the move that leads to the maximum final score.
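To make the propagation concrete, here is the same min/max reasoning as plain nested calls (the leaf scores here are made up for illustration, not taken from the diagrams):

```python
# X compares its three moves; under each, O (Min) picks the worst leaf for X.
root_value = max(min(1, 0),    # move 1: O will steer toward the 0
                 min(-1, 1),   # move 2: O will steer toward the -1
                 min(1, 1))    # move 3: every reply still leaves a 1
print(root_value)              # prints 1, so X should pick move 3
```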

So to summarize, for two-player competitive games we:

  1. Map out the game tree, alternately considering each player's possible moves.
  2. Assign scores to all final states (the leaves of the tree).
  3. Propagate the scores up the tree, alternately taking the max or the min of the possible moves at each node (based on whose turn it is).
  4. At the root, pick the move that leads to the highest-scored state.
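These four steps translate almost directly into a recursive function. Here is a minimal sketch for tic-tac-toe, reusing the `score` helper from the earlier snippet (the `moves` generator below is likewise an illustrative assumption, not the only way to represent moves):

```python
def moves(board, player):
    """All states reachable by `player` marking one empty cell."""
    for i, cell in enumerate(board):
        if cell is None:
            yield board[:i] + [player] + board[i + 1:]

def minimax(board, is_x_turn):
    """Minimax value of `board` from X's (Max's) point of view."""
    if score(board) != 0 or None not in board:    # leaf: someone won or board full
        return score(board)                       # step 2: score the final state
    values = [minimax(child, not is_x_turn)
              for child in moves(board, 'X' if is_x_turn else 'O')]
    return max(values) if is_x_turn else min(values)   # step 3: propagate up

def best_move(board):
    """Step 4: at the root (X to move), pick the highest-scored child state."""
    return max(moves(board, 'X'), key=lambda child: minimax(child, False))
```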

Challenge: Make an AI agent for tic-tac-toe based on Minimax trees

Tic-tac-toe is a relatively simple game, so it's trivial to map out the entire game tree from any given state. Since we can map the tree out to the end of the game, assigning scores to states is simply a matter of assigning scores to win, loss, and draw. For more complex games like checkers or chess, mapping out the entire game tree is practically impossible from almost all states. So we can only map out the tree to a certain depth, and thus what we need is a heuristic for assigning scores to states. A heuristic is simply a way to estimate the desirability of a state: a sort of educated guess about which states have a relatively better chance of leading to a win than others. This can be based on anything at all. For example, in checkers we could score a state based on measures like these (a sketch of such a function follows the list):

  • How many total pieces you have in play.
  • How many total pieces your opponent has in play.
  • The ratio of your pieces to your opponent's pieces.
  • All of the above applied to king pieces.
  • The distance of your pieces from the king-me positions.
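Such a heuristic is often just a weighted sum of measures like these. This is a sketch only: the counting helpers and the weights below are hypothetical assumptions, and tuning them is part of the design work.

```python
def evaluate(state):
    """Hypothetical checkers heuristic, scored from Max's point of view
    (positive = good for Max). The count_* and advancement helpers are
    assumed to exist for whatever board representation you choose."""
    men = count_pieces(state, mine=True) - count_pieces(state, mine=False)
    kings = count_kings(state, mine=True) - count_kings(state, mine=False)
    progress = advancement(state)    # progress of your men toward the king-me row
    # Illustrative weights: kings matter more than men, progress is a tiebreaker.
    return 1.0 * men + 2.5 * kings + 0.1 * progress
```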

When designing a heuristic evaluation function, keep in mind that the time taken to evaluate a state significantly impacts the speed of the AI agent. So choose only the measures you feel are crucial in judging the desirability of a state. Sometimes you may want to consider dropping a certain measure in favour of exploring an extra depth or two.

Now that we understand the underlying logic of Minimax trees, let's explore how such an AI agent would work in practice. Consider an arbitrary two-player game for which we want to make an AI player for Max. Our player will traverse the tree depth-first, down to a depth of 3. At depth 3, it will evaluate the game states using some evaluation function. So let's simulate this.

The AI player picks a move and follows it to depth 1, where it is Min's turn. It then assumes that Min will select a move that leads down to depth 2. Now it's Max's turn again. Here Max selects a move that takes it to depth 3. Since we decided that our agent will only explore 3 depths at a time, it stops here and evaluates the game state. Let's assume this state is valued at 5. By now, our game tree looks like:

[Figure: tree explored depth-first to depth 3, first leaf valued at 5]

So as it stands, Max at depth 2 will pick the 5. That is, unless there are moves that lead to a higher-scored state. So in a way, this Max now dictates a lower limit to the nodes below it. Right now this lower limit is 5, as we'll be ignoring all moves that yield a score of less than 5. Let's assume there is one more possible move at this state, and that it yields a score of 6. So we know that Max will pick the 6.

[Figure: Max at depth 2 picks the 6]

Now let's backtrack to the Min at depth 1. As it stands, it will be picking the 6, unless there's a move that can return something lower. So now Min is dictating an upper limit to its child nodes. This limit is 6, as we are now only looking for moves that score below 6.

Since we're at depth 1, we can explore other moves from here, down two more depths. Let's assume we do that and find a 7.

[Figure: a new branch from depth 1 explored, finding a 7 at depth 3]

So Max at depth 2 appears to be picking the 7, unless there is a child with a higher value. Hence this Max will return a 7 or higher. The problem is that the parent Min decreed that it only wants something below 6. Thus this Max is of no consequence to the Min, and there is no need to explore any more moves at this node, because it would be pointless. So all remaining child nodes of this Max node are pruned.

[Figure: remaining child nodes of the Max node pruned]

Note that we pruned the remaining child nodes of a Max node when the lower limit it imposed was higher than the upper limit imposed by its parent node. Similarly, we would prune the child nodes of a Min node if its upper limit fell below a parent's lower limit. This process is called alpha-beta pruning. The lower limit is called alpha and the upper limit is called beta. Max nodes determine alpha while Min nodes determine beta. We prune at a node when the condition alpha < beta fails.
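Putting this together, here is a minimal sketch of depth-limited minimax with alpha-beta pruning for an arbitrary game. The `moves`, `is_finished`, `score` and `evaluate` helpers are assumed to exist for the game at hand:

```python
def alphabeta(state, depth, alpha, beta, is_max_turn):
    """Depth-limited minimax with alpha-beta pruning, from Max's point of view."""
    if is_finished(state):
        return score(state)                  # exact score at a finished game
    if depth == 0:
        return evaluate(state)               # heuristic estimate at the cutoff
    if is_max_turn:
        value = float('-inf')
        for child in moves(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)        # Max raises the lower limit
            if not alpha < beta:             # the condition "alpha < beta" failed
                break                        # prune the remaining children
        return value
    else:
        value = float('inf')
        for child in moves(state):
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)          # Min lowers the upper limit
            if not alpha < beta:
                break
        return value
```

For the 3-depth walkthrough above, the search would start as `alphabeta(state, 3, float('-inf'), float('inf'), True)`.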

If you look back at our traversal of the tree, you may observe that a node may pass on alpha or beta to its child nodes, but not to its parent. The parent would determine its respective limit based on the available options.

So let’s recap using this new terminology:

  1. Max at depth 2 observes 5 and sets alpha = 5
  2. Max observes 6 and updates alpha = 6
  3. Max selects 6.
  4. Min observes available 6 and sets beta = 6
  5. Min passes beta to second Max.
  6. Max observes 7 and sets alpha = 7
  7. Now alpha > beta, so the condition alpha < beta fails, and thus we prune.

Now, continuing this on our game tree:

  • Max at Root observes available 6 and sets alpha = 6
  • Max passes alpha to Min at depth 1
  • Min passes alpha to Max at depth 2

Let's assume all child nodes of this Max node score below 6. They all fall outside of our range, so picking any of them would not affect the value at the root node. We might as well just assume this Max node returns 6 (alpha), because that's the best possible value this node could return, given that no nodes higher than 6 were found.

[Figure: the Max node at depth 2 assumed to return 6 (alpha)]

Min at depth 1 observes this available 6 and sets beta = 6. This means this Min will now return a value of 6 or lower. See how the result of this node becomes irrelevant to the Max node at the root: now alpha = beta, that is, our condition "alpha < beta" fails, and thus we prune all other child nodes of this Min node and assume it returns 6.

Thus the value at the root node is 6, and we arrived at this result without having to check all the nodes. One thing to keep in mind, though, is that in certain places we assumed the alpha or the beta was the best a node could return. So we should remember to choose the path that actually leads to a 6, rather than a path that merely might lead to a 6. A simple way is to choose the move that led us to update our alpha or beta.
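In code, a sketch of that root-level bookkeeping could look like this, again using the hypothetical `alphabeta` and `moves` from above: the chosen move is simply the last one that raised alpha.

```python
def best_move(state, depth):
    """Max's move at the root: remember which child actually raised alpha."""
    alpha, beta = float('-inf'), float('inf')
    chosen = None
    for child in moves(state):
        value = alphabeta(child, depth - 1, alpha, beta, False)
        if value > alpha:                # this move improved our guaranteed score
            alpha, chosen = value, child
    return chosen
```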

Challenge

For a given tic-tac-toe final position, you have to determine which player won. Write a program for this task.
