Minimax

In the last two weeks, I have been working on implementing Minimax to program an unbeatable player for my TicTacToe game. Minimax is the algorithm that enables the computer to look ahead and find the optimal move. From the computer’s perspective, the computer is the maximizing player. Its objective is to maximize its gain (represented by a score). Because TicTacToe is a two player, zero-sum game, the opponent is the minimizing player. The objective of the opponent is to minimize the computer’s gain.

The implementation of Minimax heavily involves recursion. When it’s the computer’s turn, given a board:

  1. The algorithm check if there is a win, lose or tie.
    • If there is a win for the computer, return +10.
    • If there is a win for the opponent, return -10.
    • If there is a tie, return 0.
  2. If the game is still in progress, simulate a move at each of the empty cells.
  3. Score each cell, comepare them and return the best score.

Step (2) and (3) are where recursion occurs. The algorithm goes back and forth between simulating the computer’s move and its opponent’s move to play out all possible game scenarios. In doing so, it also switches between picking the max score or the min score depending on whose turn it is. The result, therefore, can only be either a win or a tie for the computer.

In implementing Minimax, where recursion can get very overwhelming very fast, it helps to separate the decision making of an unbeatable computer player into two parts. At the top most level, when it’s the computer’s turn, it has to make a move by calling a function that returns a cell number. In doing so, it utilizes minimax. Behind the scene, Minimax evaluates different scenarios by returning a score for each of them. It makes the recursive calls, flipping between players, between min and max, and pass the best score up to the top level. I decided to make Minimax a class because it needs information about the board, the rules, and the symbols of both players. To me, that’s complicated enough to make it into something that stands alone.

A. Return best move function

  • The computer loops through every empty cell on the board, get the score for each of them by calling Minimax.
  • The computer saves each score and its corresponding cell as a key value pair in a map.
  • After simulating all empty cells, it searches the map for the cell with the maximum score.
public class UnbeatableComputerPlayer{
    public int obtainValidCellSelection() {
        int CELL_OFFSET = 1;
        HashMap<Integer, Integer> scoreAndCell = new HashMap<>();
        for (int index = 0; index < board.getBoardSize(); index++) {
            if (isEmptyCell(board, index)) {
                board.insertSymbol(selfSymbol, index);
                Minimax algorithm = new Minimax(this.rule, this.board, this.selfSymbol, this.opponentSymbol);
                int score = algorithm.scoreACell(board, false, 0);
                scoreAndCell.put(score, index);
                board.resetCell(index);
            }
        }
        Set allKeys = scoreAndCell.keySet();
        ArrayList<Integer> listOfKeys = new ArrayList<>(allKeys);
        int maxScore = Collections.max(listOfKeys);
        int cell = scoreAndCell.get(maxScore);
        return cell + CELL_OFFSET;
    }
}

B. Minimax

In order to score each of the game end situations, Minimax has to have a few information before it can calculate anything. It needs to know what a win looks like, what a draw looks like, what an empty cells looks like.

public class Minimax {
    private boolean thereIsAWinner() {
        String winner = this.rules.checkForWinner(this.board);
        return !winner.equals("");
    }

    private boolean isADraw() {
        return this.rules.endsInADraw(this.board);
    }

    private boolean isEmptyCell(Board board, int index) {
        return board.getSymbol(index).equals(" ");
    }
}

Now, the scoring. There are 3 base cases that can break the loop: a win, lose or a tie. For the recursive case, at each player’s turn, the computer has to calculate a score for each of the empty cell at that state of the board. When it’s the human’s turn, the optimal cell is one with the lowest score. When its the computer’s turn, the optimal cell is one with the highest score. A boolean can be used to tell whose turn it is. The boolean maximizingPlayersTurn is passed into the minimax function and the next time minimax calls itself, it switches the boolean to !maximizingPlayersTurn.

    private int getWinOrLoseScore() {
        if (rules.checkForWinner(board).equals(selfSymbol)) {
            return 10;
        }
        return -10;
    }


    public int scoreACell(Board board, boolean maximizingPlayersTurn) {
        //base cases
        if (thereIsAWinner()) {
            return getWinOrLoseScore();
        } else if (isADraw()) {
            return 0;
        } else {
            //recursive case
            ArrayList<Integer> scores = new ArrayList<>();
            String selfOrOpponent = maximizingPlayersTurn ? this.selfSymbol : this.opponentSymbol;

            for (int index = 0; index < board.getBoardSize(); index++) {
                if (isEmptyCell(board, index)) {
                    board.insertSymbol(selfOrOpponent, index);
                    int score = this.scoreACell(board, !maximizingPlayersTurn);
                    scores.add(score);
                    board.resetCell(index);
                }
            }
            int minOrMaxScore = maximizingPlayersTurn ? Collections.max(scores) : Collections.min(scores);
            return minOrMaxScore; 
        }
    }     

Recursion packs so much action in such short amount of code. When things work it’s great but when something goes wrong, debugging can be very hard. At first, my minimax algorithm somehow instructs the player to always take the last cell on the board, which was ironically funny because the easy version of my computer player always takes the first available cell on the board. I tried to figure out my problem using the debugger to step through the code line by line, put breakpoints and step into a function call to find out the values of variables as the code executes.

Earlier on I’ve passed Rules into my Minimax constructor because the algorithm needs to know the rules of the game. I wrote the Rules class over a month ago and back then I had kept various states in this class.

    public class RulesFor3x3 implements Rules {
        private boolean gameInProgress = true;
        private boolean hasWinner = false;
        private boolean isADraw = false;
        private String winner;
    }

Why? Because I didn’t know any better. Up until this point, this Rules class has not given me any trouble. Now I realize that it’s not a wise thing to do, Rules simply should not store any of these game states. In the case of my Minimax implementation, after the algorithm simulates one end game scenario, all these states were set to different values. Therefore, the next couple rounds of simulation received the wrong information from Rules. Everything collapses from there.

That was one bug. After fixing this, my player gets a little smarter but still not unbeatable. It can only block a win and make a win, but doesn’t plan ahead. At this point, the debugger didn’t help me much. Looking back, I now know that unlike the first bug, this second bug occurred much later in the code execution. I lost track of things after going back and forth through the recursive loop so many times. Matt, a resident apprentice, advised me to include depth in my minimax function and log out everything (whose turn it is, the depth,the board, the scores, etc…) each time the function is called. Although depth is not a must at this stage, this way I can see how deep the recursion goes, and whether it indeed goes deep enough. By doing this, I found out the problem was in one of my base case. The function to determine when there is a draw didn’t work properly. Another bug originated from the Rules. I might very well have created this bug while fixing the first one.

After the debugging adventure, my unbeatable player seems a lot smarter now. Is it truly unbeatable? I have to write a few more tests before I can answer that question.