Then, yesterday, Matthew Wegner debuted live on twitch a real, working robot that plays Threes on a real iPad. In no way has the actual game of Threes on the iPad been hacked, and the A.I. has no information that a human doesn't have or couldn't keep track of. Already, this robot, dubbed Threesus by the Twitch chat room, has achieved a whopping high score of 248,403 -- and I'm confident that it can score much, much higher than that!
The underlying A.I. logic code of Matthew's robot is based on my original A.I. code. This morning, in an effort to encourage collaboration and to create the best Threes-playing robot possible, I released the source code for the A.I. logic on Github. If you're interested in improving Threesus, then please check it out!
The rest of this post will talk about how the A.I. actually "thinks" and chooses what move to make next. I'll try to keep it high-level so that non-coders can understand, but if you're a programmer then I encourage you to take a look at the actual source code.
First of all, you should understand how Threes works under the hood -- specifically, how the game chooses what cards to add to the board and where. The best description I know of (and the one I used while writing Threesus) is this writeup on the Touch Arcade forum. Knowing how the game works under the hood isn't just useful for understanding the A.I. -- it will make you a better player too!
Fundamentally, Threesus works by looking 6 moves ahead into the future and then evaluating the potential future game board to determine its "quality". (I'll get to what "quality" means in a bit.) It does this look-ahead for every possible move in addition to many of the possible locations and new values of cards that might be added to the game board. Multiply all of these possibilities together, and Threesus will often examine millions of possible game states to decide just a single swipe. Once Threesus has calculated the quality of all of those game states, it simply swipes in the direction that is most likely to produce the highest-quality game state. (For you game-theory types, this is basically the expectimax algorithm.)
In addition to examining every possible move, Threesus also "counts cards" to determine the probabilities of what value a new card might be, just like how a blackjack player can count cards to beat a casino. Threesus knows that the game starts with a deck of 12 cards (4 ones, 4 twos, and 4 threes) and draws cards from the deck until the deck is empty, at which point the deck is re-created and re-shuffled with another 12 cards. So as the number of cards in the deck decreases, Threesus can better predict, often with very high accuracy, what new cards will be well into the future. This is why sometimes Threesus seems to know the future!
So how is the "quality" of a future game state calculated? Well, the naive way to evaluate the quality of the board is to simply count up the number of points on the board and try to maximize that. This works okay (in my tests it can score a little over 30,000 points on average), but it's actually far from the best way to evaluate a potential board. The problem with evaluating a board using only its score is that then Threesus will always be optimizing the game assuming that it will end in 6 turns, which is usually not the case!
A much better way to evaluate the quality of a future game board is to calculate a quality number that represents how much longevity a board has -- that is, for how much longer Threesus is likely to be able to play the game (after the six-move look-ahead) before finally losing.
Currently, Threesus attempts to calculate the longevity of a board using a fairly simple scoring calculation. The scoring calculation used in yesterday's live streams worked like this:
- Every empty space is worth 2 points.
- Every matching pair of adjacent cards is worth 2 points.
- A card next to another card twice its value is worth 1 point.
- A card trapped between two other cards of higher value, or between a wall and a card of higher value, is penalized 1 point.
That's it! This simple calculation was able to achieve a high score of 248,403, without ever actually trying to directly increase its score! And so if there's a single lesson to be learned from Threesus that you can apply yourself as you play Threes, it's this: Be patient. Don't play to increase your score; play to keep playing.
Running Threesus for 100 simulated games, here are the results it achieved:
100 games completed!
Total time: 02:13:08.2477053
Low Score: 10041
Median Score: 88653
High Score: 774996
% of games with at least a 384: 100%
% of games with at least a 768: 98%
% of games with at least a 1536: 92%
% of games with at least a 3072: 27%
% of games with at least a 6144: 3%
Since yesterday, Matthew has been working on tweaking the above scoring calculation, and already he's made significant improvements. Besides tweaking some of the point scores (including increasing the trapped penalty to 5), he also added bonus points if the highest-numbered card is on an edge or in a corner. With these improvements, Threesus is even better:
100 games completed!
Total time: 03:03:22.0262799
Low Score: 30126
Median Score: 89436
High Score: 717960
% of games with at least a 384: 100%
% of games with at least a 768: 100%
% of games with at least a 1536: 94%
% of games with at least a 3072: 41%
% of games with at least a 6144: 1%
(Don't read too much into the decrease in % of 6144. The difference between 1 game and 3 games is statistically insignificant.)
So in a nutshell, that's how Threesus works! But there's still a lot of room for improvement. Some ideas that come to mind are or have been suggested to me are:
- More refinements and testing of the board evaluator.
- Better handling of the end-game -- right now Threesus doesn't realize when he's almost certainly going to lose, and so he doesn't do anything to maximize score.
- Rewrite code in C or C++. The C# code that exists right now is very well optimized for C# code, but I think C or C++ could potentially be an order of magnitude faster, which should allow Threesus to look at least one more move ahead into the future in the same amount of time.
- Using a Monte Carlo algorithm to look ahead further without exploding computation times.
- Using a traditional reinforcement-learning method such as Q-Learning.
So please, if you think you can make Threesus even better, download the source code now!
Oh man I am so intrigued by the Threes robot! I'm super bummed I missed its debut on Twitch, but I hope you guys will keep streaming it? And I'm super excited about checking out the source code! I am just a noob programmer, I only started teaching myself Python a few months ago, BUT, I really enjoy it and this seems like the perfect thing to look into! Because it's interesting! And Threes is so much fun :)
ReplyDeleteInteresting
ReplyDelete"just like how a blackjack player can count cards to beat a casino" it's totally true. I read an interesting article about it https://casinority.com/card-counting-blackjack-strategy/ It is really interesting to try
ReplyDelete