Team Polk’s Bryan Pellegrino Talks About His AI Research And How It Helped Formulate Strategies To Win $1.2 Million

Former Poker Pro Explains How Doug Polk Defeated Daniel Negreanu

Throughout the early days of the poker boom, Bryan Pellegrino was arguably one of the best heads-up sit-n-go players on the planet, terrorizing opponents under the name ‘PrimordialAA.’ Like many other poker pros of his generation, he dropped out of college to pursue poker full-time and made quite a good living beating some of the highest stakes available online.

He also made a small splash in the live tournament world, making three deep runs in the World Series of Poker $10,000 no-limit hold’em main event, finishing runner-up in the 2012 WSOP $1,500 pot-limit hold’em event, and adding two more deep runs in the WSOP $10,000 no-limit hold’em heads-up event.

Around 2015, however, Pellegrino decided to move on from poker. After a year-long vacation traveling the world with his wife and son, Pellegrino dove into the computer world. He created a machine learning model focused on pitch sequencing, which he sold to a Major League Baseball franchise, and then founded a cryptocurrency business in Silicon Valley.

In 2020, however, the computer world brought him back to poker. Last July, the New Hampshire native helped publish a research paper with Noam Brown from the Facebook artificial intelligence research department. The paper was about how artificial intelligence could use game theory to perfect poker strategy and use those same concepts to solve problems in the real world.

When Negreanu accepted the challenge, Polk immediately began putting a team together to perfect his overall heads-up game. He hired a couple of heads-up coaches to help him implement strategy in the best way possible, a group of people to log hands to create a database of information on Negreanu’s tendencies, and another team to help cement what Polk called his ‘preflop strategy.’

Pellegrino was brought on to help with the preflop work. He sat down with _Card Player_ to discuss what he was doing behind the scenes with Polk, how his AI was an improvement over other solvers available to the public, and how this technology can solve real world problems.

Steve Schult: Doug ended up reaching out to you recently to become a part of his team. Did you guys have a relationship while you were playing professionally? How did he find you?

Bryan Pellegrino: We both played heads-up. He played heads-up cash and I played heads-up sit-n-go’s. I ended up getting coaching from [Daniel Cates] and started working on heads-up cash, but I never really dove deeply into that scene. But that said, through the AI stuff, we ended up doing the research, and through Facebook AI research we ended up publishing an academic paper. The work that had been done around counterfactual regret minimization, specifically the ways it could be used outside of poker, was one of the areas that we found interesting. But in order to kind of prove it, we wanted academic benchmarks early on.

Doug reached out asking to see if I was still active in the game and the community. I think he was looking to get a varied opinion of the best studying resources and the best way to prepare for a match. He is unbelievably diligent, more than anybody else that I’ve ever known. I’ve played poker for 15 years and I don’t think I’ve seen anyone put in the work like Doug, in terms of the studying, the repetition and getting all the right materials together.

And Doug is very familiar with Noam Brown, one of the people who worked on the paper. Doug and his team were the guys that battled Claudico and Libratus (advanced AI poker bots), so he knew about Noam and his work. I told him that I had just published this paper with Noam and that the results were pretty phenomenal. He was interested in how we could leverage the research into study material.

SS: What exactly is counterfactual regret minimization? How does it relate to poker?

BP: The very simple way to explain it is that in the past, a lot of people would model decisions by maximizing your payoff. You want to try and win the most, right? But what people found was that what you actually want to try and do is minimize your regret.

That is going to lead you to Nash equilibrium. That is going to lead you to GTO [game theory optimal] strategy. Let’s say we are playing rock, paper, scissors and I was using counterfactual regret minimization. If I threw a rock and you threw a scissors, I would have a regret of -1, meaning I wouldn’t have any regret. I’d feel great. If you threw a rock, I’d be neutral. And if you threw a paper, I’d have 1. I’d have regret.

So what I would do is use those regrets on the three outcomes to change my strategy. So now instead of throwing rock 100% of the time, I’m going to throw it less, according to my overall regret. And if you do that trillions of times, you will get a GTO rock, paper, scissors strategy.

The same thing works for poker. Except rather than a simple three options, you have a giant tree with every bet size people can use and every action they can take on them. And the goal is to take that tree and minimize the regret. If you do that, you’ll come up with GTO strategy. A strategy that will never regret anything. There is nothing your opponent can do to exploit you that is going to make you regret too heavily.
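For readers who want to see the rock, paper, scissors idea in motion, here is a minimal sketch of regret matching, the update rule at the core of counterfactual regret minimization. It is an illustration only, not the solver described in the paper: two copies of the same learner play each other, one of them starts out biased toward rock, and each tracks how much better every throw would have done than the mix it actually used. All names (`PAYOFF`, `strategy_from_regrets`, `train`) and the iteration count are assumptions made for this example.

```python
ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[a][b] is the payoff to a player who throws a against b.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        # No action is regretted yet, so mix evenly.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def train(iterations=50_000):
    """Run regret matching in self-play and return each player's average strategy."""
    # Player 0 starts with regret only for rock, so it opens by throwing rock 100%.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * 3, [0.0] * 3]

    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            opponent = strategies[1 - player]
            # Expected payoff of each pure throw against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * opponent[b] for b in range(3)) for a in range(3)
            ]
            # Expected payoff of the mix the player actually used.
            expected = sum(strategies[player][a] * action_values[a] for a in range(3))
            for a in range(3):
                # Regret: how much better throw a would have done than the mix.
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += strategies[player][a]

    return [[s / iterations for s in sums] for sums in strategy_sums]


AVERAGE_STRATEGIES = train()

if __name__ == "__main__":
    for player, strategy in enumerate(AVERAGE_STRATEGIES):
        print(player, dict(zip(ACTIONS, (round(p, 3) for p in strategy))))
```

The strategy on any single iteration keeps cycling; it is the average over all iterations that settles toward throwing each option about a third of the time. Poker solvers apply the same kind of update at every decision point of a much larger tree.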

SS: Can you break down what the research paper was about in layman’s terms?

BP: We published a paper called Unlocking the Potential of Deep Counterfactual Value Networks. The University of Alberta and Carnegie Mellon University had both done this research on essentially poker AI. They were using these techniques and basically we came up with a bunch of variants of these techniques. We created a novel DCFR+ variant, with something like a 5,000x overall speed improvement over prior top agents such as DeepStack, and we played the winner of the last ACPC [Annual Computer Poker Competition], which was Slumbot.

All the academics get together and they run a challenge. They have their newest research for poker and they all play them against each other. So, we took the winner of that and played it. And we beat it for 20 big blinds per 100 hands. We completely crushed it.

I’m a college dropout, so the fact that I’m publishing academic papers with the Facebook AI research team means that we did something pretty impressive here. The academic community has been awesome, and I think was really impressed with the results of our paper. And our paper had just been published right around the time that Doug was thinking about his challenge with Daniel.

SS: What did he say to you that made you want to be a part of his team?

BP: I don’t want to be too nitpicky against the academic community, but it’s really hard to benchmark against other famous AIs. We reached out to every other major AI and none of them were interested in benchmarking against us, especially since some of these agents cost upwards of millions per day to run. Slumbot happened to be public and very well respected.

But after we published it, we had nothing else to do. We are not going to continue down this road of research, and so we dove into many other fields… sort of the application of the technology. But when Doug reached out, it was this interesting opportunity to kind of see how someone who studies with this does out in the wild. Here’s a chance to have this integrated into a high-profile challenge. We had reached out to [Phil] Galfond in the past to see if he was interested in anything, but ultimately it was just a way to help Doug and potentially bring some attention to the research itself.

SS: You mentioned that this type of work can be used in other areas outside of poker. Can you elaborate where and how?

BP: This challenge was awesome and publishing with Noam Brown from Facebook AI research was a huge honor. Some of the things we explored were autonomous vehicles. We were working on routing problems within self-driving cars, and we have also looked at robotics in greenhouses. There are greenhouse technologies that can help create tens of billions of dollars’ worth of produce, and we looked at how AI technologies can impact this and make a difference. We are exploring drug discovery now. We are fascinated by the process and excited about what can be done there.

SS: How does counterfactual regret minimization apply to something like a self-driving car?

BP: If you’re trying to route through this huge network and there is traffic and all these other things that are going on, you can essentially model that problem of how to get to your destination with the least regret. Let’s say time is the regret and you want to minimize the amount of time it takes to get there. But it doesn’t have to be time. It can be time, it can be road conditions, or it could be tolls. You can find all these awesome real world applications.  

SS: Doug said that you were one of the guys that helped construct his preflop ranges. How did you do this?

BP: The paper is essentially a solver. We created a solver that just happens to be extremely good and fast. The modern way that most of these solvers work is that when they do preflop ranges, they have to heavily abstract what they are doing.

So, you can build a modestly-sized preflop tree. Not that many options and not that big or complex of a tree, but then you would be going to a huge number of flops and a huge number of turns. So these trees get very large… hundreds of terabytes big. More than you could fit on any computer. So, what they do is that they abstract them down. They only look at 10 flops or 56 flops, whatever the subset might be. And that comes with its own set of accuracy problems. You have to pick flops that you hope are representative of everything and give you a good picture.
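To put those flop subsets in perspective, the short sketch below counts how many flops exist in total and how many remain once flops that differ only by a relabeling of suits are treated as the same. It looks at the board alone and ignores the players' hole cards, and the card encoding and function names are assumptions made for this illustration, not part of any solver discussed here.

```python
from itertools import combinations, permutations

RANKS = range(13)  # 0 = deuce ... 12 = ace
SUITS = range(4)


def canonical(flop):
    """Return one representative for all suit relabelings of a flop."""
    return min(
        tuple(sorted((rank, perm[suit]) for rank, suit in flop))
        for perm in permutations(SUITS)
    )


def count_flops():
    """Count every three-card flop and the number of strategically distinct ones."""
    deck = [(rank, suit) for rank in RANKS for suit in SUITS]
    raw = 0
    distinct = set()
    for flop in combinations(deck, 3):
        raw += 1
        distinct.add(canonical(flop))
    return raw, len(distinct)


RAW_FLOPS, DISTINCT_FLOPS = count_flops()

if __name__ == "__main__":
    print(RAW_FLOPS, DISTINCT_FLOPS)  # 22100 1755
```

Against 1,755 distinct flops, a subset of 10 or 56 covers only a small slice of the boards a preflop strategy will actually run into, which is the accuracy concern raised above.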

With us, we don’t do that at all. We are using a neural net to query these things. So we can build as big and as complex a tree as is humanly possible. Things that would take 500 terabytes that no modern computer could solve, we could do in 30 seconds. This would allow Doug to say, “Hey, we want to figure out what the best sizing is at every stack size. So let’s run a 2x, a 2.1x, 2.2x, 2.3x” and so on, and he can do that at every stack size. It can get very granular.

Where is it practical to implement changing your size? What if Daniel… and you have to remember, this is before they had played any hands originally. What if Daniel opens to this size? What if he limps? Is he going to three-bet to this size? What is our optimal three-bet size? It was just a huge number of runs.

Doug would take these outputs and he would aggregate them and go through them with his coaches. It’s a balance between what is practical to implement in the real world, because you can’t have 57 different sizes and be able to remember all of it. So, you can kind of pick one or two sizes and figure out how complex of a strategy you want to implement and whether or not it’s worth it based on the EV (expected value).

Early on, it was a lot of that. Just a huge number of runs trying to figure out what were optimal sizings and how to toy with things, figuring out what ‘DNegs’ might do. But if you’re talking about one of these other solvers available on the market, it would take a week to do each of these runs and get these results, and that is on a small subset of flops.

We could run 150 of them overnight and just have a huge report for him in the morning. And that’s really what he did. He’d come back with another iteration and say “Hey, this was interesting. Let’s explore this more.” He was in the lab, man. He was definitely in the lab.

SS: What was the schedule like? Did he just come to you after every match with questions and meet with you on the off-days in between?

BP: That was more his coaches. I think he’s going through strategy and how well he did implementing those strategies with those coaches. And for us, it was like “Hey, we want to explore this.” We would ask what kind of trees he wanted us to run and figure out what he was trying to get out of this. And then we would go back and run all of these things and just give him a huge report to try and go through.

He wasn’t coming back and talking about specific implementation details in his game. That was mainly with his coaching team. For us, it was more about why something was happening. There were times where he had constructed a tree wrong or he thought something was kind of funky. For us, it was really about getting him as much data as humanly possible.

SS: Negreanu was very open about making changes to his game as the match progressed. Did you have to run data specific to those changes? What was it like seeing Negreanu’s game evolve from your perspective?

BP: We definitely noticed some of his tendencies. He was doing some things that were just things you should never do. He was flatting pocket kings and pocket queens from out of position, for example. There were all these plays that couldn’t even be considered a mixed strategy. They were just things that should be at stone zero.

We had to figure out what world his strategy was being pulled from. Where was he getting these things? I was kind of questioning reality for a little bit. I know this shouldn’t be a thing, but it was a thing, especially because he had an early heater. There were some things that had us asking questions, but we just had to go back through it.

He started mixing in other sizings, and after they started playing, you saw where he would change his sizing and when he didn’t change his size at all. Or we thought he’d be using a certain three-bet size, but he was actually using another. It was a continuous process and there were a lot of ranges being dumped every day throughout the entire challenge. Doug was just an animal. He wanted to learn more, he wanted to dive in more.

SS: Hearing you talk about this stuff is remarkably interesting, but do you think the average poker player is scared away from playing heads-up poker after hearing about how in-depth some of this stuff goes?

BP: It’s daunting in a sense, but nobody should be under any illusions about what it takes to become the best in the world. You look at an NBA player and you probably want to believe that they are so naturally talented that all they do is step on the court and crush, but in reality they have huge teams of help like dietitians, free-throw shooting coaches, and specific coaches for everything they do.

Everyone who is elite in something as competitive as poker knows that it takes more and more work. When I started in 2002, it was just kind of smart guys that were trying to outwit one another. There weren’t even solvers. You were just talking theory with your friends. I’m sure that’s what basketball was like back in the ’70s, but things evolve as they get more competitive.

Ultimately, that’s just what it takes to become one of the very best in the world. Because the measure of the best in the world now is so much better than what it was 10 years ago. The same way Steph Curry and LeBron James are better at basketball than anyone was generations ago.

Most people are just going to watch poker and they’re just going to see these people’s minds working the same way you watch an athlete on TV. You don’t see the crazy amount of work that goes into getting those skills and being able to compete at those levels. ♠