Poker Bot That Dominated Humans In Heads-Up Could Soon Win At Six-Max, Computer Scientist Says

Potential Is There For More Major Leaps In Abilities Of Poker AI

Late last month, a computer program developed by Carnegie Mellon University was able to beat a team of world-class poker pros in heads-up no-limit hold’em. To the surprise of perhaps everyone involved, the artificial intelligence crushed the poker pros by more than 14 big blinds per 100 hands.

They played a total of 120,000 hands and at the end the computer was up 1.7 million chips, or about 17,000 big blinds. That was nearly 90 buy-ins. Fortunately for the poker pros, it wasn’t real money, though they were paid for playing. Everyone on the team finished down chips to the bot.
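
For readers who want to check how those figures fit together, here is a quick back-of-the-envelope calculation using only numbers stated in this article: the 100-chip big blind implied by “1.7 million chips, or about 17,000 big blinds,” and the 200-big-blind buy-in discussed later in the interview.

```python
# Back-of-the-envelope check using only figures quoted in this article.
chips_won = 1_700_000     # Libratus' final chip lead
big_blind = 100           # chips; implied by "1.7 million chips, or about 17,000 big blinds"
hands_played = 120_000
buy_in_bb = 200           # big blinds per buy-in, per the 200-big-blind format discussed below

bb_won = chips_won / big_blind            # 17,000 big blinds
win_rate = bb_won / hands_played * 100    # ~14.2 big blinds per 100 hands
buy_ins = bb_won / buy_in_bb              # 85 buy-ins, i.e. "nearly 90"

print(f"{bb_won:,.0f} bb  |  {win_rate:.1f} bb/100  |  {buy_ins:.0f} buy-ins")
```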

The machine, dubbed Libratus, is “the holy grail of poker AI,” said CMU Ph.D. student Noam Brown. Brown and CMU professor Tuomas Sandholm developed the machine, which was just the latest in a series of poker bots from CMU. Never before had a machine beaten world-class human opponents at heads-up no-limit hold’em.

According to Brown, there could be a lot of room for improvement in subsequent versions of Libratus. An updated machine could theoretically crush Libratus by as many as 50 big blinds per 100.

Card Player spoke to Brown about the historic match and what’s next in poker AI research.

Brian Pempus: Were you in any way surprised by the results of the match?

Noam Brown: Yeah, I actually was surprised by how well the AI did. Going into the competition, we tested the AI against previous bots and we had a sense that it was beating […] Claudico [the previous bot] by 10-12 big blinds per 100, which is more than the humans beat it by [in 2015], but not by a huge margin. So, going into the competition, we thought we had a small edge over the humans. We weren’t sure how big it was. We were very impressed that the AI managed to do so well.

BP: So you didn’t think the AI was quite ready to beat the humans by 14 big blinds per 100 hands?

NB: Yeah, we didn’t appreciate how much of the humans’ edge over Claudico was due to exploitation. They found weaknesses in Claudico and were able to take advantage of them. For example, raising Claudico’s limps was pretty effective and was a big chunk of their margin of victory. Libratus wasn’t playing to exploit its opponent. The fact that Libratus was able to beat Claudico by 10 or 12 big blinds per 100 without exploiting it suggested that Libratus was much stronger than the humans in head-to-head play, provided it didn’t have any weaknesses of its own. The reason for Libratus’ victory was that it didn’t have any vulnerabilities that the humans were able to take advantage of.

BP: At the point in the match where the humans had brought it back close to even, did you think that they had maybe found a weakness in the game of the computer or were you still confident?

NB: Yeah, towards the end of the first week it was almost back to even. During the first week there was a lot of speculation among the human players as to how Libratus was adjusting and where it was strong. They didn’t tell me everything that they thought was going on, but from what I heard they were seeing patterns in the data, both weaknesses that were there and weaknesses that weren’t. So, for the most part I wasn’t too concerned. They were thinking the AI was flawed in ways that it wasn’t. For example, one day they tried three-betting the bot with 80 percent of their hands because, based on the data, they thought the AI was weak against a particular three-bet size. I don’t think that was an actual vulnerability. It was just noise in their data, in the hands that they had played so far, that led them to think that. But some of the patterns they saw were real. For example, they noticed that it wasn’t responding very well to particular opening sizes. Those were weaknesses in the AI going into the competition that we didn’t think would be a big deal, but they turned out to be a fairly substantial hole. Fortunately, the AI was prepared for this, and overnight while the humans were sleeping, it was constantly training to fill these gaps and prevent this exploitation from being a problem long-term. That’s why you saw things turn around after the first week.

BP: Was it pretty crucial to be able to fine-tune the bot after a session? Did this level the playing field because the humans were able to talk strategy among themselves?

NB: There was a lot of misconception about this; there wasn’t this fine-tuning of the AI. It wasn’t like we were telling it to four-bet more often or fold more. What was happening was that the humans were using different bet sizes preflop and on the flop. We had a bunch of bet sizes already programmed in, so it knew how to respond to an opening size of 2x, of 2.5x, of 3x. But if the humans started opening to 2.75x, for example, the AI would round that to 3x. So its response would be pretty good. It’s not an unreasonable thing to round 2.75x to 3x, but it would be better if it could respond to that size exactly without having to round to a nearby one. Overnight, it would train how to respond to 2.75x, and the sizes that it was training for were determined by an algorithm that prioritized the different bet sizes based on which ones the humans were using most frequently and how far away they were from an existing size that we had in the game tree. So, that was pretty much the only fine-tuning going on. It would learn how to respond to these different preflop and flop bet sizes better. This was a key part of the algorithm that allowed it to adapt over time to the humans’ playing style. It wasn’t exploiting the humans like they speculated. They were all playing the same AI throughout the competition. But it was simply learning how to respond to these off-tree bet sizes better over time.
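
To make that concrete, here is a minimal sketch in Python of the two mechanisms Brown describes: mapping an off-tree bet size to a nearby size in the pre-computed game tree, and ranking unseen sizes for overnight solving by how often the humans use them and how far they sit from the tree. The nearest-size rule and the frequency-times-distance score are illustrative assumptions, not Libratus’ actual code.

```python
# A minimal sketch (not Libratus's actual code) of off-tree bet-size handling.
from collections import Counter

tree_sizes = [2.0, 2.5, 3.0]   # opening sizes already in the game tree, in big blinds

def round_to_tree(bet: float) -> float:
    """Map an off-tree bet size to the closest size the blueprint already knows."""
    # Brown's 2.75x example sits exactly between two tree sizes; how Libratus
    # broke that tie isn't public, so 2.8x is used in the demo below instead.
    return min(tree_sizes, key=lambda s: abs(s - bet))

def training_priority(observed_bets: list[float]) -> list[tuple[float, float]]:
    """Score off-tree sizes: more frequent and farther from the tree = higher priority."""
    counts = Counter(b for b in observed_bets if b not in tree_sizes)
    scored = [(size, freq * min(abs(size - s) for s in tree_sizes))
              for size, freq in counts.items()]
    return sorted(scored, key=lambda x: x[1], reverse=True)

print(round_to_tree(2.8))                          # -> 3.0
print(training_priority([2.75, 2.75, 2.75, 2.2]))  # 2.75x ranked first for overnight solving
```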

BP: So is it fair to say that how the bot responded to action on the turn and the river was less important in terms of adapting during the match than it was for preflop and the flop?

NB: For the turn and the river, you may have noticed that the AI would take some time to think when it reached the turn, and again on subsequent actions on the turn and the river. Some people didn’t notice because it went so quickly, but it was recomputing its strategy every time the humans made a bet on the turn and the river. The reason for this was that it could compute a strategy that responds exactly to whatever bet size the humans used on the turn and the river. So the issue of having to pre-compute a bunch of different bet sizes to put in the game tree just wasn’t an issue for the turn and the river, because it was computing those strategies in real time.
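
As an illustration of that control flow, here is a minimal sketch in Python. The solve_subgame function is a hypothetical stand-in for a real-time equilibrium solver (Libratus’ actual endgame solver is far more involved); the point is only that from the turn onward the observed bet size is solved for exactly rather than rounded to a pre-computed one.

```python
# A minimal sketch, not Libratus's solver, of the control flow Brown describes.

def solve_subgame(street: str, pot: float, facing_bet: float) -> dict:
    """Hypothetical stand-in for a real-time equilibrium solve of the remaining subgame."""
    # A real solver would run an equilibrium-finding algorithm on the subgame
    # rooted at the current public state; here we return a dummy policy just
    # so the control flow runs.
    return {"fold": 0.25, "call": 0.50, "raise": 0.25}

def respond(street: str, pot: float, facing_bet: float) -> dict:
    if street in ("turn", "river"):
        # No bet-size abstraction here: solve for the observed size exactly.
        return solve_subgame(street, pot, facing_bet)
    # Preflop and flop: fall back to the pre-computed blueprint strategy,
    # rounding off-tree sizes to the nearest size in the game tree.
    raise NotImplementedError("blueprint lookup with bet-size rounding")

print(respond("river", pot=40.0, facing_bet=33.0))
```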

BP: Is that real-time ability something that Claudico didn’t have? Or was it just not perfected?

NB: Claudico had a real-time solver for the river, but it was weaker in several ways. First of all, it didn’t take into account blockers. In order for it to run quickly it had to group a bunch of hands together and treat them identically. So, for that reason, it would treat having the A♠ with three spades on the board the same as having the A♣ with three spades on the board, even though they should be treated differently. Claudico’s end-game solver would calculate in real time a bunch of different bet sizes, but it wouldn’t recompute every time the humans bet. I think this re-computation every time the humans made a bet was a key reason why our AI did so well this time. Also, this time we were able to scale it up starting on the turn, which is a much more intense computation because now you have to deal with almost 50 different rivers that could come out, and the number of actions that could occur before the end of the game [hand] also grows exponentially. So it’s a computation that’s about 1,000 times more expensive [than Claudico’s]. We were able to scale this new algorithm very effectively.
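
To illustrate the blocker problem Brown describes, here is a toy sketch in Python (an illustrative assumption, not Claudico’s actual card abstraction): a rank-only bucketing throws away the suit information that distinguishes a flush blocker, while a lossless key keeps the two hands apart.

```python
# Toy illustration of why grouping hands can lose blocker information.
board = ["Ts", "7s", "2s", "Kd", "9h"]   # three spades on board

def rank_only_bucket(hole):
    """Lossy abstraction: key on ranks only, discarding suits (and thus blockers)."""
    return tuple(sorted(card[0] for card in hole))

def exact_key(hole):
    """Lossless key: the precise hole cards, so flush blockers are preserved."""
    return tuple(sorted(hole))

print(rank_only_bucket(["As", "Qd"]) == rank_only_bucket(["Ac", "Qd"]))  # True  -> same bucket
print(exact_key(["As", "Qd"]) == exact_key(["Ac", "Qd"]))                # False -> kept distinct
```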

BP: How far away is Libratus from playing a perfect GTO strategy? How many more versions of this machine could you keep putting out?

NB: Nobody knows how far the AI is from a game theory optimal strategy. We have methods for calculating that, but they are extremely expensive to run, and we haven’t run them yet. It’s something that we might look into in the next year or so. If I had to speculate, my guess is that a perfect GTO bot might beat Libratus by 15 big blinds per 100. That’s my rough estimate. It could be anywhere between 5 and 50 big blinds per 100.
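
What those expensive methods measure is exploitability: how much a best-responding opponent could win against a fixed strategy, which drops to zero only at a game-theory-optimal strategy. The sketch below shows the idea at the scale of a tiny two-action zero-sum matrix game; it is only a toy illustration, since computing the same quantity for a full no-limit hold’em strategy is the step Brown calls extremely expensive.

```python
# Toy illustration of exploitability: the gain of a best-responding opponent
# against a fixed strategy. This game's value is zero, so that gain is the
# strategy's distance from game-theory-optimal play.
import numpy as np

# Row player's payoff matrix for a simple zero-sum game (column player gets the negation).
A = np.array([[ 1.0, -1.0],
              [-2.0,  2.0]])

def exploitability_of_row_strategy(p: np.ndarray) -> float:
    """Best-responding column player's gain against the fixed row strategy p."""
    column_payoffs = -(p @ A)           # column player's expected payoff for each pure column
    return float(column_payoffs.max())  # value of the column player's best response

print(exploitability_of_row_strategy(np.array([0.5, 0.5])))   # 0.5 -> exploitable
print(exploitability_of_row_strategy(np.array([2/3, 1/3])))   # 0.0 -> the equilibrium mix
```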

BP: Wow, so there is potentially a lot of room for improvement for this AI?

NB: It’s hard to say. One of the key weaknesses of past AIs is that they didn’t take blockers into account on the turn and river. This is really important in high-level play. Libratus doesn’t have this problem. It considers every hand uniquely on the turn and the river. This was a key advance that led to a huge jump in performance compared to previous bots. Now, there’s no more room for improvement on that front, on being able to distinguish blockers better. But there is perhaps room for improvement in how it chooses its bet sizes. It’s hard for me to speculate how much of an improvement you can get there, but my guess is probably about 15 big blinds per 100.

BP: There was talk of how aggressive Libratus was with its over-betting on the turn and the river. Was that something you felt like the AI had perfected or could there be room for improvement in how it balances its ranges in these spots?

NB: The over-bets were one of the things that really surprised us during the competition. Libratus was not trained with human data, it had never seen a human hand of poker. So it came to the competition with a unique strategy that it thought was optimal and was very different from what humans consider to be optimal play. The big over-bets were a big part of this strategy, as well as the donk bets. It was incredibly impressive and incredibly satisfying to see the AI do something that humans haven’t been able to pull off before. I think we saw this kind of aggression a bit with Claudico, which was infamous for making huge all-in shoves when the pot was very small. But I think it went about it the wrong way. It was imbalanced in how it was making those huge bets. I think with Libratus we saw balanced aggression, and that was key to its victory.

BP: A lot of people seem to be concerned about what this means for the future of online poker. Could you talk about how the AI you developed doesn’t jeopardize the integrity of the game, at least at this point in time?

NB: I can at least assure people that we’re not running Libratus online and there’s no plan to do so, ever. But, obviously that’s not going to stop other people from taking the technology that we publish and incorporating it into bots that might appear online. I’m not going to speculate too much about how bots might affect online poker. I don’t really know that world too well. I know that there are bots online being used, and poker sites put a lot of effort into catching those bots. I don’t know which side is winning that war.

BP: If the stack sizes in the match were lowered or raised, what effect would that have had on the results of the match? Could the machine handle a stack of 500-1,000 big blinds?

NB: 200 big blinds was chosen because that’s the format used in the annual computer poker competition. There’s an annual competition where AI researchers who work on poker meet and play their bots against each other. 200 big blinds was seen as posing a particularly difficult challenge to AI, because as the stacks grow deeper the AIs have more trouble dealing with the increasing number of options available. From my understanding, 200 big blinds is on the upper end of what humans play with. I think that it was an appropriate size to use to keep things fair but also to make it as challenging as possible for the AI. If the stacks were lower, for example 100 big blinds, the AI would have done just as well if not better. As for how it would have done if the stacks were 500 or 1,000 big blinds deep… honestly, I think it would have done just as well if not better also. Not because it’s easier for the AI but because it’s harder for the humans at that point. I don’t think the humans are used to playing 500 or 1,000 big blinds deep. At those stacks, the huge over-bets that Libratus really excels at would be much more important. I don’t know if the humans could pull off [over-betting] as well as Libratus would.

BP: Would another area for no-limit hold’em poker AI research be in having a bot that could handle additional players at the table?

NB: There has been some research on three-player poker. Generally speaking, the techniques that went into Libratus work really well even if you have more than two players. The problem is not with the techniques but with how you evaluate performance. Because when you have more than two players you can be playing a perfect GTO strategy and still lose money because the other players are colluding either implicitly or explicitly. So, it’s really hard to have a game where there’s one AI and five humans and try to establish if that AI is better than the humans. It’s not really feasible to measure that. That’s why this competition was a two-player competition and that’s why the results are most meaningful in the two-player format. I think that for now six-max is a little bit beyond the abilities of Libratus and similar AIs. That said, the annual computer poker competition is adding a six-player league going forward, so research on six-max poker is going to start to happen and I think that the field is going to develop very quickly. I think that with some minor improvements to Libratus, you’d be able to see it beating humans at six-max within two years. When you are playing with six players it’s not really clear if you want to play GTO, it could be better to focus on exploiting weak players. This is a discussion we are having in the community and there’s no answer yet. Humans still have an edge in exploiting weak players and taking advantage of them.

 
 
Tags: Poker Bot,   CMU,   Libratus,   AI,   Noam Brown