Solving Blackjack in Polynomial Time
I finished watching all the video lectures for MIT 6006. It's a very good course for anyone looking for a refresher on algorithms. In honor of the final unit (Dynamic Programming) I implemented a program to determine the optimal plays for a deck of Blackjack. First feel free to play with it a bit, and then I'll go over some details about how it's implemented.
The optimal plays for the deck are calculated on the fly and it takes just milliseconds. If the memoization (dynamic programming) lines are commented out, the program runs for several minutes.
(Code is on Github.)
Step 1: Class taxonomy
I settled on all these objects:
- BlackJack: this class holds the game logic - tests for busts, legal moves, handles the dealer's deterministic logic, deals cards, and asks the player for their move.
- Player: this class holds the player's winnings (or losses), the current cards dealt to the player, and has a GetStrategy(...) method to decide the next move.
- Deck: a collection of individual cards that can be sorted or dealt.
- Solver: a subclass of Blackjack that tries all possible games to determine the best solution.
This taxonomy worked well for the first step of this project, which was to implement a BlackJack "player" to eat through a deck using "by the book" strategy, taking into account the visible dealer card and the player's hand.
Unfortunately, this taxonomy became a liability when it came to implementing the second step, which was the solver. In an ideal world, you could implement a recursive solver taking a single argument which is the state of the game. However, objects are always passed by reference in (as far as I know) all object oriented languages so this destroys state in the lower levels of recursion as the higher levels do their destructive calculations. Properties of the objects that mattered to the solver - for instance the current state of the deck and the hands of the player and dealer - could not be passed as objects and needed to be independent parameters of the solver function in order to retain the information at the lower recursion levels. This experience solidified in my mind the "clash in parity" of object oriented vs functional paradigms. The functional / immutable data paradigm worked great for the solver and the object-oriented / mutable data paradigm worked great for running through a single game, but doing both in a single paradigm was awkward and the number of times I called .slice() is a testament to that.
Step 2: Shuffling the deck
Shuffling a deck of cards is not a difficult problem but there are a lot of ways to do it wrong. The (Knuth) Fisher–Yates shuffle is an easy algorithm that does it correctly. It's basically a special case of reservoir sampling.
Step 3: Exponential solution
For any game state provided the current Dealer hand, the current Player hand, and the current deck, calculate recursively the result of hitting vs staying, and return the max.
Step 4: Memoize it
As all possible states are traversed, identical substates will be arrived at so all optimal substates should be memoized for later use. This means the cost of recursion (amoritized) is O(1). This is what allows us to solve an inherently exponential problem in polynomial time.
I made a mistake here in my original implementation because the suffix of the deck is only interesting if not only the position in the deck is identical but also the dealer and player hands are also identical. To put it another way, you can only memoize the result at the beginning of the hand, and also you can only pull out the memoized result at the beginning of the hand. If the deck matches a memoized result but you're in the middle of a hand the value is not useful, where in a lot of other similar dynamic programming problems you don't need to think about that.
Step 5: Check your work
Finally I created a "subclass" of player that provides results to call to GetStrategy(...) in with a pre-computed playbook. This is what is displayed in the browser when you click "play optimal". Composing that player into the BlackJack object with the parent pointers from the dynamic programming solver results in the output you can see.
You can check out the full code on Github.