How Supercomputers Will Transform The Way We Learn Poker
Will Artificial Intelligence Research Uncover “Perfect” Play?
by Erik Fast | Published: Apr 01, 2015
Serious poker study is no longer taking place in just casinos and online poker rooms.
Over the past year, a number of interesting and exciting advancements in poker playing computer programs have grabbed mainstream headlines.
Top computer science programs are running supercomputers that play billions of poker hands a second. Their work, which focuses on studying imperfect information games, such as poker, could have much larger research implications related to the application of artificial intelligence (AI) techniques for solving larger, more complex problems.
Ever since the chess computer Deep Blue beat human Grandmaster Garry Kasparov in 1997, many have wondered how long it would be before bots were developed that could beat humans at poker.
In 2015, a monumental step toward that goal was announced when the University of Alberta Computer Poker Research Group (UACPRG) published an article in Science titled “Heads-up Limit Hold’em Poker is Solved.” What does this advancement mean for the average poker player? How is it going to change the game and the process by which players learn it? And could other poker games and formats eventually be solved as well?
In this article, Card Player will answer these questions and investigate other key issues that arise as a result of this news.
Heads-up Limit Hold’em Poker is Solved
“Here, we announce that heads-up limit Texas hold’em is now essentially weakly solved,” says the UACPRG’s abstract. What does that mean?
Neil Burch, a PhD student at the University of Alberta, member of the UACPRG, and co-author of the Science article, told Card Player that, “Solving the game means that we have found a Nash equilibrium: a pair of strategies that maximize their winnings against each other. It’s a fixed strategy, so it doesn’t adapt to the opponent, but it’s a very special fixed strategy.”
This effectively means that they have found a game-theoretically optimal solution to heads-up limit hold’em, in which the computer program will at worst break even and will most likely win over a sufficient sample size.
For all intents and purposes, they have created a world-beating program called “Cepheus” that works by searching a self-generated database of game situations to find the optimal move for any given spot.
“Heads-up limit hold’em is essentially solved,” continues Burch, “because Cepheus is a sufficiently good approximation of a Nash equilibrium. We can measure how good the approximation is: if you knew Cepheus’ strategy and were a perfect player, you could make a little bit less than 0.001 big blinds per hand on average playing against Cepheus. Why do we say this is sufficiently small? Because a single hand of poker is noisy—some hands should be big wins and some hands should be small losses—a human lifetime of play is not enough hands that you’re likely to be able to tell the difference between that 0.001 big blinds per hand hole and an exact solution. Previous strong bots don’t have this guarantee on the maximum win rate against a perfect opponent. They might have been good enough to beat humans, or break even against other strong bots, but there was always the possibility that if you played just right, you would win at a noticeable rate.”
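Burch’s point about noise can be checked with a back-of-the-envelope calculation. Assuming (our assumption, not the article’s) a per-hand standard deviation of roughly 5 big blinds for heads-up limit hold’em, and requiring the edge to exceed about two standard errors before it is distinguishable from zero, the number of hands needed is enormous:

```python
# Rough sample-size sketch: how many hands to detect a 0.001 bb/hand edge?
# The 5 bb/hand standard deviation is an illustrative assumption.
edge = 0.001    # big blinds per hand (Cepheus' exploitability bound)
stdev = 5.0     # assumed per-hand standard deviation, in big blinds
z = 2.0         # detection threshold of ~2 standard errors

# The standard error of the mean is stdev / sqrt(n); we need it to fall
# below edge / z, so n > (z * stdev / edge) ** 2.
hands_needed = (z * stdev / edge) ** 2
print(f"{hands_needed:.0f} hands needed")  # prints 100000000 hands needed

# Even at a brisk 1,000 hands per day, every day, that is centuries of play.
years = hands_needed / (1000 * 365)
print(f"about {years:.0f} years of play")
```

Under these assumptions the answer is on the order of a hundred million hands, which is why a lifetime of human play cannot separate Cepheus from an exact solution.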
To create Cepheus, the UACPRG team used 200 computers running for 70 days straight to generate the database, which is more than 11 terabytes in size. The program, when faced with a decision point, rapidly assesses each possible response it could take and what its database says about its efficacy. It does not adjust on the fly to individual opponents, but instead has a static approach and plays in a manner that is unexploitable no matter what strategy its opponent adopts.
Why has a top computer science program dedicated so much time, more than a decade in fact, to working to solve heads-up limit hold’em in particular?
Computer scientists have been testing their game-playing creations against human players for decades now, but poker differs from games like chess in many ways. One of the most pronounced differences is that it is an imperfect information game: one in which each player holds exclusive information (hole cards) in addition to the shared information of the board cards. This fact has made poker a particular target for computer scientists looking to test computers’ abilities to solve problems involving just these kinds of uncertainties.
When bending a computer’s processing power to solving a game, one of the largest considerations is the number of total positions possible within it, or the ‘size’ of the game. Tic-tac-toe is 10^3 in size (ten to the power of three, or 1,000), and is solved. Limit hold’em is 10^14 (100 trillion), surprisingly smaller in size than even checkers (10^20), but not necessarily easier for it.
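These sizes are orders of magnitude, and the smallest is easy to verify directly. A short enumeration (a sketch written for this article, not the researchers’ code) counts every board position reachable in legal tic-tac-toe play:

```python
# Count every board position reachable in legal tic-tac-toe play, as a
# sanity check on the "size of the game" figures quoted above.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    for i, j, k in LINES:
        if board[i] != '.' and board[i] == board[j] == board[k]:
            return board[i]
    return None

def count_positions():
    """Enumerate all positions reachable from the empty board."""
    seen = {'.' * 9}
    stack = ['.' * 9]
    while stack:
        board = stack.pop()
        if winner(board) or '.' not in board:
            continue  # terminal position: game over, no successors
        mover = 'X' if board.count('X') == board.count('O') else 'O'
        for i, cell in enumerate(board):
            if cell == '.':
                nxt = board[:i] + mover + board[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return len(seen)

print(count_positions())  # 5478 legal positions: on the order of 10^3
```

The count lands in the thousands, matching the quoted 10^3 scale; limit hold’em’s 10^14 states rule out this kind of brute-force enumeration, which is why specialized algorithms were needed.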
“Although smaller than checkers, the imperfect-information nature of HULHE [heads-up limit hold’em] makes it a far more challenging game for computers to play or solve,” says the UACPRG in Science.
So now that the UACPRG has solved this game, what does that mean for your average poker player?
Should I Worry About Running Into These Computer Programs Online?
Right now many readers might be saying to themselves, “Does this mean that sometime soon I could be playing poker against an unbeatable bot online without realizing it?”
Probably not. Even in the unlikely event that a bot did make it onto an online site (and sites use extensive tools to detect bots and remove them, often before they even leave the development stage), any bots you come up against are going to be far less sophisticated than the likes of Cepheus.
“Even if there was a bot playing, it wouldn’t have a solution to the game. It would probably just be some programmer that tried to write out a set of rules to follow. It won’t be anywhere near as competent,” said Tim Reiff, a former poker pro and bot hobbyist who created a program called “Prelude” that placed in two no-limit hold’em categories at the 2014 AAAI (Association for the Advancement of Artificial Intelligence) Annual Computer Poker Competition. “Personally, I have almost no worry about that and I don’t think other people should be worried either. The bots won’t be that good.”
Also, while limit hold’em with two players is now essentially solved, computer scientists are still a long way off from completely solving the more commonly spread game of no-limit hold’em, even when it is limited to heads-up play.
Tuomas Sandholm, a professor in Carnegie Mellon University’s Computer Science Department, created a computer program called “Tartanian7,” which won both heads-up no-limit hold’em categories at the AAAI competition, beating each opponent along the way with statistical significance. He also seems to think that players shouldn’t worry too much about facing off against bots on the virtual felt.
“The current ones that are out there now, some of them have participated in the AAAI competition and turned out to be not very good,” said Sandholm.
One of the reasons the bots online aren’t very good is that they haven’t been made by top computer scientists and game theorists like Sandholm and the UACPRG, who have decades of combined experience and access to incredible amounts of computing power.
Sandholm’s bot was created in a similar manner to “Cepheus,” with the ultimate goal of creating an unbeatable heads-up no-limit hold’em program. No-limit hold’em is of course a dramatically more complex game due to the possible variation in bet sizing that doesn’t exist in limit hold’em.
“It’s currently unknown whether the best program, which is ours, is better than the top human professionals,” said Sandholm. “I’d conjecture that it is, but there has not been a controlled man vs. machine match yet.”
A comparable match, albeit at limit hold’em, took place in 2007, when poker pros Phil Laak and Ali Eslami took on one of the University of Alberta’s earlier heads-up limit hold’em bots, “Polaris.” The humans eked out a win then, but an updated version of Polaris won a similar contest a year later.
Nearly eight years later, the UACPRG, with Professor Michael Bowling leading a team of other professors and PhD students, has finally reached the point of being comfortable saying that its current limit hold’em computer program’s play is indistinguishable from perfect. The same level of mastery might still be some ways off for heads-up no-limit hold’em, though, seeing as that game’s “size” of 10^140 (ten to the power of 140) dwarfs that of limit hold’em.
A solution to no-limit hold’em, even with just two players, is still far off in the future. In addition, even these incredible bots are essentially useless as soon as there are three or more players at a table. The day when it’s possible to encounter a competent bot that can play no-limit hold’em in a game that is not heads-up is far, far away.
Will I Be Able To Use Bots To Learn How To Play More Optimally?
In the last decade, the approach to learning poker has changed in many ways and may change again with these advanced computer programs.
While most people used to learn simply by playing the game, or perhaps by reading a book or two on poker strategy, modern players have access to far more sophisticated tools, including video instruction from top pros (where you can view hole cards and follow along with hand analysis) and advanced statistical tools that analyze online poker play and spit out hard data such as PFR% (preflop raise percentage) and VPIP% (the percentage of the time you voluntarily put money in the pot).
While being able to see the best human poker professionals at work is important when learning poker, Sandholm thinks that, in the future, all of the best players will learn the game by playing against computer programs like Tartanian7 and Cepheus (which you can do for free right now on the Cepheus Poker Project’s website, http://poker-play.srv.ualberta.ca/).
“This bot has so much to tell people about how to play poker that it’s ridiculous,” Sandholm told Card Player in regards to Tartanian7. “It plays poker very differently from how humans play poker. Humans learn from each other how humans play the game, not how it is optimally played. Tartanian7, in contrast, has never seen a human play poker. Instead, it has reasoned from first principles how poker should be played and the conclusions are different from what humans have reached. So there is a lot to learn from this bot.”
These computer programs have determined strategy purely from the rules of the game. They don’t base their play on any historical experience against humans or other bots; they extrapolate how best to play from the rules alone. With this distinctive approach, bots have found a different way to play that could be very instructive for humans to learn from.
“People could play against the bot and learn from observation and practice,” suggests Sandholm, “or even play against it and be able to ask it for advice as far as what it would do if it were in their shoes, so we will be able to make a very cool training tool out of this bot.”
In their Science article, the UACPRG details some strategic concepts they feel they have proven with Cepheus, concepts most poker players already assumed as a given but didn’t necessarily have conclusive data to back up. For example, they can now state as a fact that their “…computation formally proves the common wisdom that the dealer in the game holds a substantial advantage.”
The UACPRG’s team leader, Bowling, told tech website The Verge that the player on the button has an advantage of 88 milli-big-blinds (0.088 big blinds) per hand.
When asked whether he thought that, in the future, learning from what his program and others like it have discovered would be mandatory for all those wanting to compete at the top of the game, Sandholm didn’t mince words.
“I think so. It’s a bit of a nuclear weapon for poker. You don’t want to be bringing a knife to a gun fight.”
While Sandholm is very bullish on the possibility of learning from computer programs, Reiff seems to be a little bit unsure of how the likes of Cepheus can be used to teach human players.
“It’s tricky to learn from it. The comparison I would make is to a chess engine. You can play a game of chess and have a computer tell you what it thinks the best move is at any given point, but it’s still hard to learn from that because you don’t know the reasoning it used, so it’s unclear why you should do that,” says Reiff.
“You can go to the website and look up what Cepheus will do in any given spot. It will give you information like ‘If you have ace-jack you should bet the flop 6.4 percent of the time’ or something. It’s really difficult to take that information and utilize it. I think that the dedicated poker player should try to get some firsthand experience with that technology…but it’s impossible for a human to memorize all of these percentages. It’s not currently a set of rules you should follow. It’s not heuristic.”
Reiff also expressed misgivings about these top-level bots and their non-adaptive strategies being the best approach for human players interested in maximizing their winning potential.
“The Cepheus bot is playing an equilibrium and is not trying to exploit its opponent, but instead is essentially playing out a mathematical equation, basically, that is unbeatable,” said Reiff. “In my opinion, trying to play that way as a human being might not even work. A human should probably focus on trying to exploit their opponent. Poker isn’t just about defeating your opponent but also maximizing your utility against any opponent.”
This seems to support the sentiment that David Sklansky expressed in an interview with Bloomberg, in which he said that the computer might beat a bad player but a poker pro like him would “destroy that beginner to a greater degree.”
Only time will tell just how useful these AI-crafted poker bots will end up being as training tools for human players.
What Are The Real World Implications Of These Advances In Artificial Intelligence?
While interesting in their own right, attempts by computer scientists to solve games like chess and poker are early benchmarks in a larger pursuit of more real-world applications of artificial intelligence algorithms. The complicated but controlled system of a poker game is a great framework in which to test the limits of algorithms, using poker as a stand-in for game-like problems in everyday human life.
“[Heads-up limit hold’em] is an example of a large class of imperfect information decision making problems,” says Burch. “Games make great example problems, because the rules are well described and scoring makes evaluation easier. While some of the research in the last decade plus was specific to poker, the large majority of it was on general techniques, applied to poker. The techniques and ideas that came out of the poker research can be useful elsewhere: Kit Chen and Mike Bowling looked at diabetes and managing blood glucose level, using one of the algorithms that came out of poker research to find robust medical testing and treatment plans that balance the most likely case (the “average person”) against outliers who may react poorly.”
Solving imperfect information games, like poker, has presented computer scientists with new problems when compared with perfect information games. Burch elaborates, “In chess, I don’t need to know how I got to a position to figure out the best move from there: the pieces tell me all I need to know. In poker, to find the best action I need to know for each hand how likely it was that my opponent would have played to the current position. So to find the correct early play, I need the correct later play, but I can’t find the correct later play without knowing what the early play was.”
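The algorithms behind these bots (counterfactual regret minimization and relatives like the Pure CFR mentioned later in this article) find equilibria through iterated self-play: each player accumulates “regret” for actions it wishes it had taken, and the time-averaged strategy converges toward a Nash equilibrium. Here is a minimal sketch of the underlying idea, plain regret matching applied to rock-paper-scissors rather than poker. It is our illustration, not the researchers’ code:

```python
# Regret matching in self-play on rock-paper-scissors: a toy version of the
# self-play equilibrium-finding idea behind CFR-family poker algorithms.
N = 3  # actions: 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """+1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / N] * N

def train(iterations=20000):
    # Perturb player 0 off uniform so the self-play dynamics are non-trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    sums = [[0.0] * N, [0.0] * N]
    for _ in range(iterations):
        strats = [strategy(regrets[0]), strategy(regrets[1])]
        for p in (0, 1):
            opp = strats[1 - p]
            # Expected utility of each pure action against the opponent's mix.
            util = [sum(opp[b] * payoff(a, b) for b in range(N)) for a in range(N)]
            ev = sum(strats[p][a] * util[a] for a in range(N))
            for a in range(N):
                regrets[p][a] += util[a] - ev   # regret for not playing a
                sums[p][a] += strats[p][a]      # accumulate average strategy
    return [[s / iterations for s in sums[p]] for p in (0, 1)]

avg = train()
print(avg[0])  # the average strategy approaches (1/3, 1/3, 1/3)
```

The current strategies cycle, but the averages settle at the game’s mixed equilibrium. CFR extends this idea to sequential games with hidden cards, which is what makes Burch’s early-play/late-play interdependence tractable.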
Even though the UACPRG has solved limit hold’em, the group and other researchers remain interested in poker as a subject.
“There are still interesting long-term problems in poker, especially for agents that learn… but poker is definitely a stepping point, not a stopping point,” says Burch. “What’s after poker? The work on robust policies for managing blood glucose levels can be expanded, both in scope and to different problems. Another problem is security games. Milind Tambe at USC has looked at many security problems, like scheduling coast guard patrols, as a game: we might be able to apply our work to these problems and bring something new. And hopefully something completely new, that we haven’t even thought of yet because we’re still tidying up loose ends in poker.”
So in the end, poker bots like Cepheus, Tartanian7, Prelude and others could not only result in a step forward for the game of poker, but indeed for society. ♠
Meet Poker’s AI Masters
There are several computer programs that crush poker. Seeing as they all have funny names, it might be hard to remember what separates a Polaris from a Prelude, so we decided to include some background info on each of the bots mentioned to help you keep them straight.
Cepheus (2015)
Plays: Heads-up limit hold’em
Created by: University of Alberta Computer Poker Research Group, led by Dr. Michael Bowling
Credentials: First computer program to have essentially “solved” an imperfect-information game played competitively by humans: heads-up limit hold’em poker.
Creation: According to the bot’s website, “It was trained against itself, playing the equivalent of more than a billion hands of poker. With each hand it improved its play, refining itself closer and closer to the perfect solution. The program was trained for two months using more than four thousand CPUs each considering over six billion hands every second! This is more poker than has been played by the entire human race.”
Tartanian7 (2014)
Plays: Heads-up no-limit hold’em
Created by: Carnegie Mellon computer science professor Tuomas Sandholm and his Ph.D. students Noam Brown and Sam Ganzfried
Credentials: Won both heads-up no-limit hold’em categories it entered at the Association for the Advancement of Artificial Intelligence (AAAI) Annual Computer Poker Competition. The first category was ‘total bankroll,’ in which the programs play against each other and the bot that wins the most virtual money takes the title. The other was ‘instant runoff,’ in which all of the bots play each other and, after each round, the bot that loses the most money is eliminated until only one remains.
Creation: Sandholm said that the process of creating his program required, “Four Ph.D students and myself working on this bot since 2005. In the last year we’ve worked full time, with super computing time spent on the program being somewhere between 1 and 2 million core hours.”
Prelude (2014)
Plays: Heads-up no-limit hold’em
Created by: Tim Reiff
Credentials: Runner-up in the no-limit hold’em heads-up ‘instant runoff’ category and third place in ‘total bankroll.’ Reiff noted on his website that his showing as a lone hobbyist was “…not bad considering the winning team from Carnegie Mellon University used a supercomputer in Pittsburgh.”
Creation: Reiff says, “Prelude is a carefully pre-computed table of 25.5 billion separate probabilities that determine its decision in every possible situation. All of these probabilities were generated over the course of weeks on a powerful computing cluster running the Pure CFR algorithm. I spent a great deal of time researching and implementing several game solving techniques, and ran hundreds of experiments before choosing the final settings that were used to construct Prelude.”
Polaris (2007)
Plays: Heads-up limit hold’em
Created by: University of Alberta Computer Poker Research Group
Credentials: Polaris is best known as the first poker bot to take on top human professionals successfully. In the first man vs. machine match, in the summer of 2007, Polaris played four rounds of duplicate poker, 500 hands per round, against poker pros Ali Eslami and Phil Laak. In duplicate poker, whatever cards are dealt to the human in one matchup are then dealt to the bot in the other, canceling out much of the luck of the deal. In the end, the bot drew the first round, won the second, and lost the final two.
In 2008, an updated version of Polaris came back and played six online limit hold’em poker pros in a similar format, winning three matches, losing two, and tying one. Across the entire 6,000-hand sample the second time around, Polaris won a net total of 195 big blinds. ♠