I've been working on the main singleplayer (& co-op) "explorable universe" mode for StarWright. This mode is still a ways off, but in the meantime I'd like to share something with you that I think is pretty neat: the "random starship generator" that the singleplayer mode will use to generate an unlimited variety of enemy starships for you to fight. (You can play with these procedurally-generated starships in the "sandbox" mode of the latest version of StarWright.)
Here's a screenshot of just one of the practically-unlimited number of starships that the game can generate:
This is a fully-functional and heavily-armed starship, complete with thrusters, weapons, a cockpit, a reactor, ammo storage, power storage, and crew's quarters.
On the left you can see (click on the screenshot to get a bigger view) a user interface that gives a great amount of power and control over the underlying algorithm that generates the starships. I've included it in these screenshots to help you understand the underlying algorithm.
At a high level, the algorithm works by creating a starship in "stages", which you can see listed on the left with names like "Shape", "Fill Gaps", and "Cockpit". The algorithm starts at the top of the list and processes each stage in sequence. Each stage adds to what was generated by the previous stages, except for of course the first stage which starts from scratch.
Let's walk through each stage in sequence so we can see how the ship gets created stage-by-stage:
Stage 1: "Shape"
In the above screenshot, I've turned off all of the stages except for the very first: the "Shape" stage. On the right you see what the ship looks like (in this case a rough outline of a ship composed entirely of empty "corridor" tiles), and on the left you can see that I've expanded the user interface for this "Shape" stage to give a better view of the inner-workings of this stage.
The sole purpose of this stage is to give a rough outline to the general shape of the ship. Most subsequent stages will obey the shape given by this stage and only place additional parts ("part" being my technical term for discrete rooms, systems, and corridor/armor tiles) within this outline.
The "Shape" stage is actually unlike all the later stages in that it itself is comprised of some number of "shape modules". You can almost think of a shape module as a "brush" from a paint program such as Photoshop in that it defines a "shape" of 1x1 corridor tiles to add to the ship. The two shape modules you can see here are "CircleModule" (which places a blocky circle-like shape of tiles) and "RectangleModule" (which places a rectangular area of tiles).
The "Shape" stage adds somewhere between "MinModules" and "MaxModules" number of random modules (in this case it's always 8 modules), where the chance of any given module being added is determined by its "RandomWeight". Once the modules are chosen, the stage then joins them together in a random configuration and "paints" the resulting shape as corridor tiles.
Stage 2: "Fill Gaps"
Notice that the result of the first stage left some small gaps and rough edges in the ship. The purpose of this second stage is to fill in those gaps and "even out" the rough edges to give the ship a more solid appearance. While the desire to have ships appear more solid is largely an aesthetic preference (it's perfectly legal to have a ship with holes in it), it does make it easier for the later stages to place new rooms and systems inside the ship. For other styles of ships I may choose to leave out this stage entirely, or even replace it with a stage that carves away bits of the ship in order to make it appear more "stick-like".
The actual logic behind this stage is pretty simple: It simply looks at each empty tile, checks to see if it's surrounded on 2 or more sides by tiles no more than "ScanRange" units away, and if so, fills that empty tile in with another corridor tile. Then it repeats itself (up to "Iterations" number of times) to fill in any more tiles that are newly-eligible to be filled.
Stages 3 & 4: "Cockpit" and "Reactor"
The two most important rooms/systems on any starship are the cockpit and the reactor, so we add those before we add any others. While both the cockpit and the reactor have their own stages, the actual logic of the stages are very similar, so I'll talk about them both at once.
Each of these stages works by iterating through every possible location for the cockpit or reactor and picking the single "best" location for it.
The only difference between the Cockpit and the Reactor stages are in how they define "best". You can see in the user interface that each stage defines a "PlacementStrategy" that is used to determine the best location.
The cockpit uses a "Protected" placement strategy, which simply means that the "best" location for the cockpit is the one that is farthest from the perimeter of the ship. This makes sense since the actual location of the cockpit doesn't affect how well the ship functions, and so we just want to place it in the most difficult-to-destroy location possible.
The reactor uses a "Centralized" placement strategy, which simply means that the reactor will be placed as close to the ship's center of mass as possible. This makes sense since the reactor is most effective when it's in a relatively central and accessible location.
As you can see here, the "best" locations for the cockpit and reactor are often very close or even overlapping. In that case, the cockpit's location preference gets priority simply by virtue of being before the reactor in the list of stages. This explains why the reactor in the above screenshot is somewhat offset from the ship's actual center of mass -- the reactor couldn't be placed in the ideal location because the cockpit was already there, and so it picked the next-best location.
Stages 5 & 6: "Cockpit Armor" and "Reactor Armor":
Since the cockpit and reactor are so important, its usually a good idea to surround them with added protection in the form of armor tiles. These two stages do just that by each finding a part of a specific type and then surrounding that part with one or more layers of armor tiles.
But as you can see in the above screenshot, these stages will usually leave gaps and paths around the parts that they are protecting so that access by the crew to any room or section of the ship isn't cut off. The logic that handles this is crucial to how this and almost every other stage works, so let's talk about...
Path Contiguity Checking
Before any stage picks a location for a new part, it first does what I call a "path contiguity check", which essentially asks the question, "If I place a part here, will I be forcing crew to go too far out of their way to get around me?"
To perform this check, the stage picks any open tile adjacent to the proposed location, and does a quick Dijkstra search to each of the other adjacent tiles. As long as the search finds a path to each other tile no longer than, for example, 5 tiles, then the path contiguity check it considered a success and the part is allowed to be placed there. But if the path fails, then adding a new part to that location would either force the crew to go too far out of their way or would cut off a path entirely, and so the stage will exclude that possible location when deciding where a new part should go.
We can see the practical implications of this check in the above screenshot for stages 5 & 6. Look at, for example, the gap in the armor directly underneath the cockpit where the cockpit's door is located. The "Cockpit Armor" stage didn't place an armor tile there because doing so would have cut off access from the surrounding corridor tiles to the cockpit itself. And the corridor running between the cockpit and the reactor exists because adding an armor tile there would have forced crew to walk all the way around both the cockpit and the reactor to get to the other side.
Stage 7: "Thrusters"
After the Cockpit and Reactor are placed, along with their surrounding armor, the thrusters are placed. There are three sizes of thrusters (small, medium, and big), and this stage is responsible for placing all three.
If you take a look at the U.I. for this stage above, you can see that each size of thruster has a different "score" (1 for small, 2 for medium, and 4 for large), and the stage as a whole can place between 35-45 points worth of thrusters. In addition each size of thruster has a different random "weight", meaning that about 1/6 of the thrusters will be small, 2/6 will be medium, and 3/6 will be large.
The directions of the thrusters are also weighted randomly, so that 5/8 are placed pointing backwards and 1/8 are pointing in each of the other three directions.
Aside from those random weights, there is currently no additional logic (besides randomness) to determine where the thrusters are placed. Even so, randomly-generated ships usually have reasonably well-balanced thruster placements simply by virtue of random probabilities.
Stage 8: "Weapons"
The placement of weapons works almost identically to the placement of thrusters, but with different sizes, scores, and random distributions. (In this case, there are only two sizes of weapons, and their rotations are distributed more evenly around the ship while still favoring weapons that face forward.)
The one algorithmic difference between thrusters and weapons is that, when choosing a location for a weapon, the algorithm will choose the location that is farthest away from the center of the ship in the direction that the weapon is facing.
Stage 9: "Exterior Armor"
Once all the exterior systems (weapons and armor) are placed, a single layer of armor is placed around the perimeter of the ship on any tiles that are already occupied by a weapon or thruster, so long as placing an armor tile there will not block and crew paths according to the above "Path Contiguity Checking" algorithm.
Stages 10 & 11: "Ammo Supplies" and "Power Storage"
Next, Ammo Supply and Power Storage parts are placed. In this case, between 2-3 ammo supplies and 1-2 power supplies are added to the ship.
The algorithm that places these parts attempts to place them in locations that are nearby as many ammo-consuming or power-consuming parts as possible. In the case of ammo supplies, the algorithm will attempt to place them near as many weapons as possible, so long as those weapons aren't already near other ammo supplies. And in the case of power supplies, it places them near thrusters and ammo supplies (which use power to create ammo), so long as those thrusters and ammo supplies aren't near other power supplies or the reactor.
Stage 12: "Crew"
Finally, crew's quarters and bunk rooms are placed in the remaining empty areas. Their placement is completely random, except that the algorithm tries to line up the edges of the rooms if possible, which tends to cause crew's quarters and bunks to be placed adjacent to each other.
Stage 13: "Fill With Armor"
Once all of the crucial rooms and systems have been placed, the almost-final step is to fill any remaining space with armor tiles, because having armor throughout a ship makes it generally harder to destroy.
In order accomplish this, every single empty tile is scanned using the above "Path Contiguity Checking" algorithm to determine whether placing armor there will block any crew paths. If no crew paths will be blocked, then an armor tile is placed.
The neat side-effect of placing armor in this way is that corridors, like the ones you see above, are naturally "created" as a simple by-product not placing armor in locations that would block crew paths!
Stage 14: "Add Extra Doors"
The one final-final step is to place any extra doors in locations that would make crew access significantly more convenient. This is accomplished by scanning every wall location where a door could be placed (but isn't already there) and doing a path search from one side of the wall to the other. If the resulting path is over a certain distance (in this case 10 tiles), then a door is placed to make the path shorter.
In the above screenshot, I can only spot one additional door that was added -- in this case, on the right side of the medium-sized thruster on the bottom of the ship.
Friday, April 18, 2014
Saturday, April 5, 2014
Threesus!
A couple months back I wrote an "A.I" computer program that knows how to play the brilliant iOS/Android game Threes -- and play it much better than most humans can! While I was happy with the quality of the A.I., the problem was that it used a text-based interface in which a human had to type in the state of the board, wait for a couple seconds for the A.I. to think, and then swipe in the direction that the A.I. tells the human to swipe. Repeat hundreds of times, and the process of playing even a single game can take several hours. It was incredibly tedious, and I myself only had the patience to play a couple of games with the A.I., getting a high score of 178,926.
Then, yesterday, Matthew Wegner debuted live on twitch a real, working robot that plays Threes on a real iPad. In no way has the actual game of Threes on the iPad been hacked, and the A.I. has no information that a human doesn't have or couldn't keep track of. Already, this robot, dubbed Threesus by the Twitch chat room, has achieved a whopping high score of 248,403 -- and I'm confident that it can score much, much higher than that!
The underlying A.I. logic code of Matthew's robot is based on my original A.I. code. This morning, in an effort to encourage collaboration and to create the best Threes-playing robot possible, I released the source code for the A.I. logic on Github. If you're interested in improving Threesus, then please check it out!
The rest of this post will talk about how the A.I. actually "thinks" and chooses what move to make next. I'll try to keep it high-level so that non-coders can understand, but if you're a programmer then I encourage you to take a look at the actual source code.
First of all, you should understand how Threes works under the hood -- specifically, how the game chooses what cards to add to the board and where. The best description I know of (and the one I used while writing Threesus) is this writeup on the Touch Arcade forum. Knowing how the game works under the hood isn't just useful for understanding the A.I. -- it will make you a better player too!
Fundamentally, Threesus works by looking 6 moves ahead into the future and then evaluating the potential future game board to determine its "quality". (I'll get to what "quality" means in a bit.) It does this look-ahead for every possible move in addition to many of the possible locations and new values of cards that might be added to the game board. Multiply all of these possibilities together, and Threesus will often examine millions of possible game states to decide just a single swipe. Once Threesus has calculated the quality of all of those game states, it simply swipes in the direction that is most likely to produce the highest-quality game state. (For you game-theory types, this is basically the expectimax algorithm.)
In addition to examining every possible move, Threesus also "counts cards" to determine the probabilities of what value a new card might be, just like how a blackjack player can count cards to beat a casino. Threesus knows that the game starts with a deck of 12 cards (4 ones, 4 twos, and 4 threes) and draws cards from the deck until the deck is empty, at which point the deck is re-created and re-shuffled with another 12 cards. So as the number of cards in the deck decreases, Threesus can better predict, often with very high accuracy, what new cards will be well into the future. This is why sometimes Threesus seems to know the future!
So how is the "quality" of a future game state calculated? Well, the naive way to evaluate the quality of the board is to simply count up the number of points on the board and try to maximize that. This works okay (in my tests it can score a little over 30,000 points on average), but it's actually far from the best way to evaluate a potential board. The problem with evaluating a board using only its score is that then Threesus will always be optimizing the game assuming that it will end in 6 turns, which is usually not the case!
A much better way to evaluate the quality of a future game board is to calculate a quality number that represents how much longevity a board has -- that is, for how much longer Threesus is likely to be able to play the game (after the six-move look-ahead) before finally losing.
Currently, Threesus attempts to calculate the longevity of a board using a fairly simple scoring calculation. The scoring calculation used in yesterday's live streams worked like this:
Then, yesterday, Matthew Wegner debuted live on twitch a real, working robot that plays Threes on a real iPad. In no way has the actual game of Threes on the iPad been hacked, and the A.I. has no information that a human doesn't have or couldn't keep track of. Already, this robot, dubbed Threesus by the Twitch chat room, has achieved a whopping high score of 248,403 -- and I'm confident that it can score much, much higher than that!
The underlying A.I. logic code of Matthew's robot is based on my original A.I. code. This morning, in an effort to encourage collaboration and to create the best Threes-playing robot possible, I released the source code for the A.I. logic on Github. If you're interested in improving Threesus, then please check it out!
The rest of this post will talk about how the A.I. actually "thinks" and chooses what move to make next. I'll try to keep it high-level so that non-coders can understand, but if you're a programmer then I encourage you to take a look at the actual source code.
First of all, you should understand how Threes works under the hood -- specifically, how the game chooses what cards to add to the board and where. The best description I know of (and the one I used while writing Threesus) is this writeup on the Touch Arcade forum. Knowing how the game works under the hood isn't just useful for understanding the A.I. -- it will make you a better player too!
Fundamentally, Threesus works by looking 6 moves ahead into the future and then evaluating the potential future game board to determine its "quality". (I'll get to what "quality" means in a bit.) It does this look-ahead for every possible move in addition to many of the possible locations and new values of cards that might be added to the game board. Multiply all of these possibilities together, and Threesus will often examine millions of possible game states to decide just a single swipe. Once Threesus has calculated the quality of all of those game states, it simply swipes in the direction that is most likely to produce the highest-quality game state. (For you game-theory types, this is basically the expectimax algorithm.)
In addition to examining every possible move, Threesus also "counts cards" to determine the probabilities of what value a new card might be, just like how a blackjack player can count cards to beat a casino. Threesus knows that the game starts with a deck of 12 cards (4 ones, 4 twos, and 4 threes) and draws cards from the deck until the deck is empty, at which point the deck is re-created and re-shuffled with another 12 cards. So as the number of cards in the deck decreases, Threesus can better predict, often with very high accuracy, what new cards will be well into the future. This is why sometimes Threesus seems to know the future!
So how is the "quality" of a future game state calculated? Well, the naive way to evaluate the quality of the board is to simply count up the number of points on the board and try to maximize that. This works okay (in my tests it can score a little over 30,000 points on average), but it's actually far from the best way to evaluate a potential board. The problem with evaluating a board using only its score is that then Threesus will always be optimizing the game assuming that it will end in 6 turns, which is usually not the case!
A much better way to evaluate the quality of a future game board is to calculate a quality number that represents how much longevity a board has -- that is, for how much longer Threesus is likely to be able to play the game (after the six-move look-ahead) before finally losing.
Currently, Threesus attempts to calculate the longevity of a board using a fairly simple scoring calculation. The scoring calculation used in yesterday's live streams worked like this:
- Every empty space is worth 2 points.
- Every matching pair of adjacent cards is worth 2 points.
- A card next to another card twice its value is worth 1 point.
- A card trapped between two other cards of higher value, or between a wall and a card of higher value, is penalized 1 point.
That's it! This simple calculation was able to achieve a high score of 248,403, without ever actually trying to directly increase its score! And so if there's a single lesson to be learned from Threesus that you can apply yourself as you play Threes, it's this: Be patient. Don't play to increase your score; play to keep playing.
Running Threesus for 100 simulated games, here are the results it achieved:
100 games completed!
Total time: 02:13:08.2477053
Low Score: 10041
Median Score: 88653
High Score: 774996
% of games with at least a 384: 100%
% of games with at least a 768: 98%
% of games with at least a 1536: 92%
% of games with at least a 3072: 27%
% of games with at least a 6144: 3%
Since yesterday, Matthew has been working on tweaking the above scoring calculation, and already he's made significant improvements. Besides tweaking some of the point scores (including increasing the trapped penalty to 5), he also added bonus points if the highest-numbered card is on an edge or in a corner. With these improvements, Threesus is even better:
100 games completed!
Total time: 03:03:22.0262799
Low Score: 30126
Median Score: 89436
High Score: 717960
% of games with at least a 384: 100%
% of games with at least a 768: 100%
% of games with at least a 1536: 94%
% of games with at least a 3072: 41%
% of games with at least a 6144: 1%
(Don't read too much into the decrease in % of 6144. The difference between 1 game and 3 games is statistically insignificant.)
So in a nutshell, that's how Threesus works! But there's still a lot of room for improvement. Some ideas that come to mind are or have been suggested to me are:
- More refinements and testing of the board evaluator.
- Better handling of the end-game -- right now Threesus doesn't realize when he's almost certainly going to lose, and so he doesn't do anything to maximize score.
- Rewrite code in C or C++. The C# code that exists right now is very well optimized for C# code, but I think C or C++ could potentially be an order of magnitude faster, which should allow Threesus to look at least one more move ahead into the future in the same amount of time.
- Using a Monte Carlo algorithm to look ahead further without exploding computation times.
- Using a traditional reinforcement-learning method such as Q-Learning.
So please, if you think you can make Threesus even better, download the source code now!
Subscribe to:
Posts (Atom)