History of Poker: LH and AI
by James McManus | Published: Dec 26, 2008
In the 1990s, researchers, including Daphne Koller at Stanford and Darse Billings at the University of Alberta, began using limit hold'em as a model for computerized decision-making under incomplete information. At 17, Koller had already taught a database course at Hebrew University before earning her M.A. and enlisting in the Israeli Army; with no personal interest in poker, her goal as a Ph.D. student at Stanford was to extend the frontiers of game theory and artificial intelligence. Billings was a poker pro who went back to grad school to study his vocation more formally.
One of the first programs Billings worked on, under the direction of professor Jonathan Schaeffer, was called Loki, after the Norse god of mischief and chaos. Loki won money in low-limit Internet games but didn't stand a chance against experts. Strategically, it followed David Sklansky's The Theory of Poker. Able to consider billions of possible hands in a flash, the program made a probabilistic estimate of what hand it was up against, then played its own hand correctly according to Sklansky. Loki's advantages over most other programs at the time included an ability to make tactical adjustments based on an opponent's previous patterns. It bluffed with optimal frequency and learned from its mistakes. It even told pre-scripted jokes, quoted comedian Steven Wright, and responded to conversation on the Internet server, where it ranked in the top 5 percent of limit hold'em players in ring games.
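That probabilistic estimate can be made concrete. The heart of it, as described in the Alberta group's published papers, is a "hand strength" number: enumerate every two-card holding the opponent could have and count the fraction you beat. Here is a minimal Python sketch (the function names are mine, and the real Loki also weighted each opponent holding by how the opponent had actually bet):

```python
from collections import Counter
from itertools import combinations

def rank5(cards):
    """Score a 5-card hand as a comparable tuple; higher wins.
    Cards are (rank, suit) with rank 2..14 (14 = ace)."""
    ranks = sorted((r for r, _ in cards), reverse=True)
    counts = Counter(ranks)
    groups = sorted(counts.items(), key=lambda x: (-x[1], -x[0]))
    ordered = [r for r, _ in groups]                  # ranks by count, then rank
    flush = len({s for _, s in cards}) == 1
    straight = len(counts) == 5 and ranks[0] - ranks[4] == 4
    if set(ranks) == {14, 5, 4, 3, 2}:                # the wheel, A-2-3-4-5
        straight, ranks = True, [5, 4, 3, 2, 1]
    if straight and flush:                      return (8, ranks)
    if groups[0][1] == 4:                       return (7, ordered)
    if groups[0][1] == 3 and groups[1][1] == 2: return (6, ordered)
    if flush:                                   return (5, ranks)
    if straight:                                return (4, ranks)
    if groups[0][1] == 3:                       return (3, ordered)
    if groups[0][1] == 2 and groups[1][1] == 2: return (2, ordered)
    if groups[0][1] == 2:                       return (1, ordered)
    return (0, ranks)

def best7(cards):
    """Best 5-card hand makeable from 7 cards."""
    return max(rank5(combo) for combo in combinations(cards, 5))

def hand_strength(hole, board, deck):
    """Fraction of possible opponent holdings we beat (ties count half)."""
    ours = best7(hole + board)
    ahead = tied = total = 0
    for opp in combinations(deck, 2):
        theirs = best7(list(opp) + board)
        ahead += ours > theirs
        tied += ours == theirs
        total += 1
    return (ahead + tied / 2) / total

full_deck = [(r, s) for r in range(2, 15) for s in "shdc"]
hole = [(14, "s"), (13, "s")]                         # A-K of spades
board = [(14, "h"), (9, "s"), (7, "d"), (4, "h"), (2, "c")]
deck = [c for c in full_deck if c not in hole + board]
print(f"hand strength: {hand_strength(hole, board, deck):.3f}")
```

With the full board dealt, there are only 990 possible opponent holdings, so a complete enumeration like this is nearly instantaneous, which is what let Loki "consider billions of possible hands in a flash" over the course of a session.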
Heads-up against a live expert, however, Loki got its lunch handed to it. Even though its math was flawless, it couldn't make logical leaps or have insights. When a live expert raised preflop with 7-2, it was either a mistake or a move designed to pay off 25 hands later. Loki couldn't tell which it was; nor, without being told to, could it make moves like this.
Another problem faced by Billings and Schaeffer was that poker is relatively meaningless without money, but no person or institution had volunteered to put up Loki's bankroll to compete with stronger players – though it must have been fun to imagine the funding debate in Alberta's provincial legislature. Billings also conceded that Loki had to better account for opponents' unpredictability and generate some of its own; it needed to learn to think for itself. It never slow-played a strong hand, for example. Even average human players would be quick to pick up on this pattern and refuse to give Loki much action. "Computers are very dumb," he admitted.
As early as 1979, Doyle Brunson had predicted: "A computer could play fair-to-middling poker. But no computer could ever stand face-to-face with a table full of people it had never met before and make quality, high-profit decisions based on psychology." What human players have always had in spades is the capacity to learn strategic flexibility – to "playfully" randomize tactics. Pros call such tricks changing gears, suddenly playing much more passively or aggressively to keep their opponents off balance. The most talented players do this by feel, making shrewd leaps of faith about what move will work best in a particular situation as it comes up, often flying directly in the face of the odds.
But Billings and others insist that computers are catching up fast. What their machines already have, of course, are vast and perfect memories. IBM's Deep Blue could analyze two hundred million chess positions per second, and in 1997, it used this brute computational force to overwhelm world champion Garry Kasparov. That was chess, though, a game of complete, undisguised information; poker is much less straightforward. Yet if Billings and Schaeffer could somehow blend computational firepower with creative flexibility, they'd have an invincible program. "Somehow" and "if" are big caveats, though. "When it comes to imperfect information," Billings wondered plaintively, "how do you get around that? How do you deal with information that is possibly in error, or is deliberately deceptive?" So far at least, his software was unable to account for human insight and creative duplicity.
Koller and her colleagues in Stanford's robotics program de-emphasized opponent modeling in favor of classical game theory, the branch of mathematics invented by John von Neumann and Oskar Morgenstern in the early 1940s to maximize gains and minimize losses in contests of incomplete information. Opponent modeling was important in chess, she admitted, but "chess-playing computers don't do that, and they do very well despite that limitation." She called her program Gala, short for game language. Her goal was "to solve the general problem of finding game-theoretically optimal strategies in large games of imperfect information." She and her team developed an efficient search algorithm for determining the best possible play in each of the four hold'em betting rounds: preflop, flop, turn, and river. Even bluffing, often assumed to be the most innately human and least programmable of tactics, emerges naturally from game theory in her algorithm.
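Her algorithm itself is too involved to summarize here, but the flavor of bluffing emerging from the math can be seen in a standard indifference calculation (a textbook toy example, not anything from Gala): a bettor should bluff just often enough that the opponent gains nothing by calling.

```python
def bluff_fraction(pot, bet):
    """Fraction of a betting range that should be bluffs so the caller
    is indifferent: solve x*(pot + bet) - (1 - x)*bet = 0 for x, where
    x is the probability that the bet is a bluff."""
    return bet / (pot + 2 * bet)

# With a pot-sized bet, exactly one bet in three should be a bluff;
# the opponent's call and fold then have identical expected value.
print(bluff_fraction(pot=100, bet=100))   # 0.333...
```

Bluff any more often than that and the opponent profits by always calling; any less often, by always folding. In this toy game the optimal bluffing frequency is forced by the arithmetic rather than programmed in as a "human" flourish.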
In the July 1997 issue of Artificial Intelligence, Koller reported that Gala was exponentially faster than the standard algorithm, and that its speed would eventually allow for the solution of games, such as no-limit hold'em, that were orders of magnitude larger than previously possible. (An example of a "smaller" game is limit hold'em, with its preordained bet sizes, the only version of poker that Loki could play.)
Gala was based on a concise declarative language for representing hold'em by its rules. Sevens beat sixes. A flush beats a straight. Players act in clockwise order. The probability of each path through its game tree could then be "re-expressed as the product of the realization weights of all the players' sequences on that path, times the probability of all the chance moves on the path."
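In the sequence-form notation that quote compresses, the probability of reaching any leaf $z$ of the game tree factors as

$$\Pr(z) \;=\; \Big(\prod_{i\,\in\,\text{players}} r_i\big(\sigma_i(z)\big)\Big)\cdot p_c(z),$$

where $r_i(\sigma_i(z))$ is player $i$'s realization weight for the sequence of moves $\sigma_i(z)$ that brings him to $z$, and $p_c(z)$ collects the probabilities of the chance moves, the card deals, along the path. (The symbols here summarize the standard sequence-form presentation rather than quote Koller's paper.) The payoff is that each player's weights enter the computation linearly, so finding an optimal strategy becomes a linear program whose size grows with the game tree instead of exponentially in it.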
Confusing to nonspecialists, maybe, but like every algorithm, Koller's was simply a set of rules for finding an optimal strategy in the smallest number of steps. And it was around this time that mathematician David Berlinski further humanized algorithmic logic by describing it as "a recipe," as well as "an ambidextrous artifact, residing at the heart of both human and artificial intelligence." Cutting through the intimidating formulae, Berlinski helped laymen see that algorithms "belong to the world of memory and meaning, desire and design," things that a chef, a hold'em player, or (in the future) WALL.E might employ as naturally as would a Silicon Valley wonk. It also made Koller's working assumption – that a deep mathematical understanding of a game's rules reveals the best tactics for beating it – shine through even more clearly.
Her work on Gala also assumed that the more we know about one game, the better we understand all of them, including social, financial, and military contests. "For me," said Koller, "this is more of an exercise in pushing the boundaries of game theory." In Gala's largest possible application, she said, one could extrapolate from poker to what she called "an automated game-theoretic analysis of complex real-world situations."
In practical terms, poker takes nerve, smarts, art, and good luck, yet a grounding in pointy-headed academic theory apparently doesn't hurt, either. Game theory, probability, and artificial intelligence were the subjects of Chris "Jesus" Ferguson's dissertation at UCLA. On May 3, 2000, a few days after being awarded his Ph.D., Ferguson won the $2,500 seven-card stud event at the World Series of Poker. Fifteen days later, he won the no-limit hold'em world championship. Pure serendipity? Probably not.
Seven years later, on July 23 and 24, poker pros Phil Laak and Ali Eslami played four heads-up matches against the University of Alberta's latest limit hold'em program, called Polaris, for which Billings and Schaeffer had been the chief architects. The cards dealt in one session to the humans would be dealt to the computer in the other, and vice versa. The result would be the sum of the two humans' scores versus the sum of Polaris'. The format, inspired by duplicate bridge, significantly increased the chances that skill would be the dominant factor.
The contest was held at the conference of the Association for the Advancement of Artificial Intelligence (AAAI) in Vancouver. Four 500-hand sessions of $10-$20 limit hold'em would be played. If the humans won a total of 25 small bets or more from Polaris in a session, they would split $5,000. If neither the computer nor the humans managed to win 25 small bets, the session was considered a draw and the humans would split $2,500. (Smaller wins were not considered significant because of statistical variation.) If the humans swept all four sessions, they would split $50,000.
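Under those rules, scoring a session reduces to a few lines of arithmetic. Here is a sketch using the match's $10 small bet and 25-bet threshold (the function name and the dollar figures in the examples are invented for illustration):

```python
def session_result(human_a_dollars, human_b_dollars, small_bet=10, threshold=25):
    """Score one duplicate session: the two humans' results on mirrored
    cards are summed and converted to small bets, and swings below the
    threshold are written off as statistical noise."""
    net_bets = (human_a_dollars + human_b_dollars) / small_bet
    if net_bets >= threshold:
        return "humans win"
    if net_bets <= -threshold:
        return "Polaris wins"
    return "draw"

# Invented examples: a 7-small-bet swing is a draw; 85 bets is a win.
print(session_result(300, -230))   # "draw"
print(session_result(900, -50))    # "humans win"
```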
Laak, who wore sunglasses and often had his trademark "Unabomber" sweatshirt hood pulled up, said he studied every post-flop decision for about five minutes in an effort to make the right move. Even so, the first session ended in a tie: Eslami won $465, but Laak lost $395, a combined profit of only $70, or seven small bets, well short of the 25 required for a win. The second session was won outright by Polaris, which beat the humans for a combined sum of $950 and left them visibly demoralized. "Polaris was beating me like a drum," said Eslami. "They just spent all their freaking time perfecting heads-up limit poker," said Laak, who had crushed a less advanced program, Vexbot, two years earlier, though he admitted he was dealt better cards. Speaking of Polaris, he said, "That thing just beat us."
According to reporters for Card Player, Polaris had been successfully designed "to bob and weave like a real player, adjusting to a player's styles and recognizing weakness. Polaris plays such perfect heads-up poker that its main strategy is the same as that of many pros: It tries to play basic solid poker and wait until its opponent makes a mistake."
After recalibrating their tactics, however, the humans mounted a comeback on day two. In the third session, Laak won $1,455 while Eslami lost $635, for a net profit of 82 small bets. In the final match, Eslami won $460 and Laak won $110, netting 57 more bets. The final score of the match was humans, two wins and a tie; Polaris, one win. Laak and Eslami split $12,500 for the wins and the draw, but they both admitted that Polaris had challenged them far more than typical human opponents. Both play limit and no-limit hold'em professionally, but even after narrowly defeating Polaris at limit, Laak acknowledged, "The bots are closing in."
Updating what von Neumann and Morgenstern first made clear in the '40s, Schaeffer, a national master in chess, said that "poker is harder than chess for computers, and the research results that come out of the work on poker will be much more generally applicable than what came out of the chess research." In other words, scrutinizing the math, logic, and psychology of poker will generate more of what we call "killer apps" – from nuclear deterrence to computers that think, see, and talk – than studying chess will. A decade after Deep Blue humbled Kasparov, one of the greatest players in chess history, the most advanced poker program still lost to above-average humans, even at the relatively simple variant of limit hold'em. If nothing else, it seemed to confirm that those who see no-limit hold'em as less complex than chess were mistaken.
Then it happened. Over the July Fourth weekend in 2008, an updated version of Polaris was brought to the Rio during the WSOP, and it defeated a team of online professionals. The victims were Victor Acosta, Nick Grudzien, Matt Hawrilenko, Kyle Hendon, Mitch McRoberts, IJay Palansky, and Bryce Paradis. All work as coaches on Grudzien's website, stoxpoker.com, which charges about $30 a month for access to its instructional videos. In six heads-up matches, with limits of $1,000-$2,000 and duplicate hands dealt to each team, Polaris 2.0 chalked up three wins, two losses, and a tie, netting $195,000. Members of Team Polaris were modest in victory, admitting they still had numerous kinks to work out. Critiquing his own play, Hawrilenko quipped at one point, "My navigation system could have played that hand better."
Because Darse Billings had completed his dissertation, professor Michael Bowling supervised the Alberta graduate students who reprogrammed Polaris, though Billings and Schaeffer remained integral parts of the team. "There are two really big changes," said Bowling. "First of all, our poker model is much expanded over last year – it's much harder for humans to exploit weaknesses. And secondly, we have added an element of learning, where Polaris identifies which common poker strategy a human is using and switches its own strategy to counter. This complicated the human players' ability to compare notes, since Polaris chose a different strategy to use against each of the humans it played."
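Bowling's description suggests a portfolio of fixed strategies plus a selector that shifts weight toward whichever counter-strategy is performing best against the current opponent. A minimal sketch of that idea (the strategy names and the softmax update rule here are illustrative, not Polaris's actual learner):

```python
import math
import random

class StrategySelector:
    """Keep a running score for each counter-strategy and favor the
    ones that have done best against the current opponent."""

    def __init__(self, strategy_names):
        self.scores = {name: 0.0 for name in strategy_names}

    def choose(self, temperature=50.0):
        # Softmax weighting: strategies with higher cumulative winnings
        # are chosen more often, but no strategy is abandoned entirely.
        weights = {n: math.exp(s / temperature) for n, s in self.scores.items()}
        total = sum(weights.values())
        r, running = random.uniform(0, total), 0.0
        for name, w in weights.items():
            running += w
            if r <= running:
                return name
        return name  # guard against floating-point rounding

    def record(self, name, winnings):
        self.scores[name] += winnings

selector = StrategySelector(["solid", "aggressive", "trappy"])
strategy = selector.choose()        # pick a strategy for the next stretch
selector.record(strategy, +12.0)    # e.g., it won 12 small bets; favor it
```

Because each human opponent pushes the scores in a different direction, each one ends up facing a different mixture, which is why comparing notes between sessions did the Stoxpoker team so little good.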
How Polaris might fare against the likes of Todd Brunson, Howard Lederer, Jennifer Harman, Phil Ivey, or Andy Beal has not been determined. It is widely assumed that it would have almost no chance against top-level pros if the format were changed to no-limit.
Ian Ayres, the author of Super Crunchers, sees a much darker side to all this. A year before Polaris 2.0 beat the humans, he was calling computerized poker "one Darwinian struggle where the unaided human mind is definitely not the fittest." While claiming to be agnostic about whether online poker should be legal, he made a dire prediction in November 2007: "In the very near future, online poker may become a suckers' game that humans won't have a chance to win. Bots are quite scalable and it will be virtually impossible to prohibit computer or computer-assisted online playing." What about the online sites' commitment to identifying and banning the bots? "Unlike the statistical trail left by crude poker cheats at Absolute Poker," says Ayres, "it is possible for bots to randomize their strategies and even hire individual humans to run them," making it harder for sites to detect them.
One reason the best bots are so tough to beat is that without the possibility of visual tells, they are better at predicting an online rival's hand from his or her previous action. They never get tired or intoxicated, never need a bathroom break, never go on tilt. They are also much better, says Ayres, "at confounding the expectations of their human opponents. Computers can play randomized strategies much better than we can. Our brains are so hardwired to see patterns, it's devilishly hard for most of us to generate random behavior." Our biggest tells, he says, "aren't facial tics, but that we just can't stop ourselves from playing non-randomly. With training, we can get better, but we shouldn't fool ourselves. The handwriting is on the wall. High-quality bots are an online gambler's worst nightmare."
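The randomization Ayres is describing costs a machine nothing. Given a mixed strategy, sampling an action at exactly the prescribed frequencies is a single library call (the frequencies below are invented for illustration):

```python
import random

# A mixed strategy over actions: raise 20%, call 50%, fold 30%.
actions = ["raise", "call", "fold"]
probabilities = [0.2, 0.5, 0.3]

# Unlike a human, the program never drifts from these frequencies,
# no matter how the previous hands happened to play out.
action = random.choices(actions, weights=probabilities, k=1)[0]
print(action)
```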
The good news, according to Ayres, is that bots won't kill poker, they'll just drive it offline; live, "humans-only" action will continue to thrive. Even if he's being unduly pessimistic about Internet poker, it's clear that for the game to keep thriving online, not only must it be exempted from the UIGEA (or the entire act must be repealed), but the sites must figure out better ways to make virtual action more bot-proof.