
Online Poker Data Overload

by Daniel Kimberg |  Published: Aug 15, 2003


Playing poker online presents some interesting challenges for record keeping and statistical analysis. Because the games are played on computers, the physical record keeping itself is a lot easier. You can get a complete transcript of all the hands you've played – your "hand histories" – automatically from most online poker sites, freeing you up to concentrate on poker.

Well, almost automatically. On the sites I've played (admittedly, just a handful), there's no way to have the hand histories stored on your computer automatically as you play. Although this would be a trivial addition to the client software and, if anything, would reduce the burden on a site's servers, for some reason it's not a standard feature. So, you have to remember to request hand histories when you want them. The good news is that you don't have to request each hand separately; you can request them in batches. The bad news is that the histories go back only so far. So, if you forget to request hand histories for a while, you could easily lose important data.

If you do have data from all of your online sessions, the next question is what to do about data analysis. This is a complicated subject that deserves detailed treatment. For today, I'd like to write about one issue, which is the tradeoff between specificity and power in statistical analysis.

With live poker (I'm using "live" here to mean "not online," even though online games are also live in a sense), most players keep track of things like the date, location, game, limits, time, and result. So, a typical poker journal entry might be "3/15/03 at The Mirage, two hours of $6-$12 hold'em, +$65." Once you have enough data, you can calculate statistics from different combinations of data. You can combine all games and limits to get a sense for how you do overall. You can select just the hold'em games to find out how good a hold'em player you are, or just the $5-$10 games to see how well you do at that limit. There are pitfalls to be avoided in all of these analyses, but in general the range of possibilities is fairly narrow.
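To make the idea of combining journal entries concrete, here is a minimal Python sketch. The record layout and the helper name hourly_rate are assumptions made for illustration; only the first entry comes from the example above, and the other two are invented.

```python
# Hypothetical journal entries: (date, location, game, limit, hours, result in dollars).
# Only the first row is the example from the text; the others are made up.
journal = [
    ("2003-03-15", "The Mirage", "hold'em", "$6-$12", 2.0, 65.0),
    ("2003-03-16", "The Mirage", "hold'em", "$6-$12", 3.0, -40.0),
    ("2003-03-20", "Home game", "stud", "$5-$10", 4.0, 120.0),
]

def hourly_rate(entries, game=None, limit=None):
    """Return dollars won per hour over the entries matching the optional filters."""
    selected = [
        e for e in entries
        if (game is None or e[2] == game) and (limit is None or e[3] == limit)
    ]
    hours = sum(e[4] for e in selected)
    if hours == 0:
        return None  # no sessions recorded for this combination
    return sum(e[5] for e in selected) / hours

overall = hourly_rate(journal)                        # all games and limits combined
holdem_only = hourly_rate(journal, game="hold'em")    # just the hold'em sessions
five_ten_only = hourly_rate(journal, limit="$5-$10")  # just the $5-$10 sessions
```

With only a handful of fields per session, there are only a handful of ways to slice the data, which is why the range of possible analyses stays narrow.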

With online poker, the density of data that accompanies the hand histories is much greater than what most people would collect on their own. You get detailed data from each hand, not just each hour or session. You can break down your hold'em hands by the number of players at the table and your position. You can chart how well you do when you're up versus down, or at the ends of long sessions. If you want to know how well you do when you're dealt medium pocket pairs in the big blind at a table with five or fewer players on a Thursday when you've already played at least three hours, that's no problem. If you're so inclined, you can even try to figure out which seat is your "lucky" seat (of course it's nonsense, but aren't you curious?). As long as you have enough data, there's no reason in principle that you can't extract the information from the data files. (There may be practical reasons if you don't have the software to help you, but that's an issue for another day.)
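As a rough illustration of that kind of query, the Python sketch below filters a list of hands that have already been parsed into records. Everything about the record layout is hypothetical (real hand-history formats vary by site), the sample hands are invented, and treating sevens through tens as "medium" pairs is an assumption.

```python
from datetime import datetime

# Hypothetical, already-parsed hand records; real hand histories need site-specific parsing.
hands = [
    {"time": datetime(2003, 8, 14, 21, 30), "hole": ("8h", "8d"), "position": "BB",
     "players": 5, "session_hours": 3.5, "result": 24.0},
    {"time": datetime(2003, 8, 14, 18, 5), "hole": ("8s", "8c"), "position": "BB",
     "players": 9, "session_hours": 0.1, "result": -6.0},
    {"time": datetime(2003, 8, 15, 22, 0), "hole": ("Ah", "Kd"), "position": "button",
     "players": 4, "session_hours": 4.0, "result": 12.0},
]

MEDIUM_RANKS = {"7", "8", "9", "T"}  # assumption: what counts as a "medium" pair
THURSDAY = 3                         # datetime.weekday() numbers Monday as 0

def is_medium_pair(hole):
    """True if both hole cards share a rank and that rank is in MEDIUM_RANKS."""
    first, second = hole
    return first[0] == second[0] and first[0] in MEDIUM_RANKS

def matches(hand):
    """Medium pair, big blind, five or fewer players, Thursday, three or more hours in."""
    return (
        is_medium_pair(hand["hole"])
        and hand["position"] == "BB"
        and hand["players"] <= 5
        and hand["time"].weekday() == THURSDAY
        and hand["session_hours"] >= 3.0
    )

selected = [h for h in hands if matches(h)]
total_result = sum(h["result"] for h in selected)
```

Each extra condition in matches is easy to add, and each one shrinks the list of hands that survive the filter.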

Unfortunately, "enough data" turns out to be harder to come by the more specific you get. Suppose you've played 100,000 hands of online poker, a fairly large number. If you want to find out what your hourly earning (or loss) rate is, that's a pretty good batch of data. There may be some minor issues in interpreting the data if you tend to play multiple tables at the same time, but in general you should have enough information to produce a reasonable estimate.

But the more specific the situation you want to examine, the less data you have. Maybe only half your hands were limit hold'em ring games, only about 2 percent of those were medium pocket pairs, only a quarter of those were shorthanded, and a fifth of those were in the big blind. That leaves you with 50 hands, which may not be enough to support the kinds of statistical analyses you'd like. In statistical terms, your analyses will have poor power. Even if your true long-term earning rate on those hands is substantial, there might be only a small chance of detecting a statistically significant profit in a sample of 50 hands. If you're lucky enough to find an effect anyway, you can be happy. But if you're not, there's no way to tell if it's because you don't have enough data or because you really aren't playing those hands profitably. Correspondingly, your estimate of your earning rate is liable to be very imprecise. Even though your best guess might be a healthy profit, when you don't have enough data you can't place much confidence in that estimate.
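The arithmetic and the power problem can both be sketched in a few lines of Python. The fractions are the ones given above. The true earning rate (0.5 big bets per hand) and the standard deviation (3 big bets per hand) are hypothetical numbers chosen only for illustration, and the power figure uses a one-sided normal approximation, which is one simple way to estimate power rather than the only one.

```python
from statistics import NormalDist

def funnel(total_hands, fractions):
    """Apply each successive fraction to the total and return the hands remaining."""
    n = total_hands
    for f in fractions:
        n *= f
    return round(n)

def power_one_sided(n, true_mean, sd, alpha=0.05):
    """Approximate chance of detecting a profit (one-sided z-test, mean > 0) in n hands."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cutoff for "statistically significant"
    shift = true_mean * (n ** 0.5) / sd       # true mean measured in standard errors
    return 1 - NormalDist().cdf(z_crit - shift)

# Half are limit hold'em ring games, 2 percent of those are medium pairs,
# a quarter of those are shorthanded, and a fifth of those are in the big blind.
remaining = funnel(100_000, [0.5, 0.02, 0.25, 0.2])

# Hypothetical values: a true profit of 0.5 big bets per hand, standard deviation of 3.
chance_of_detection = power_one_sided(remaining, true_mean=0.5, sd=3.0)
```

Under those assumed numbers, the chance of detecting the profit comes out to roughly one in three, even though the true earning rate is healthy.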

So, while online poker data are certainly detailed, making use of all that detail requires some careful decisions. Making those decisions in a principled way requires a formal analysis of power, which is a difficult process in itself. It depends in part on the variability of outcome for the situation you're interested in, so it's not as simple as a single one-size-fits-all number (although it's worth bearing in mind that the magic number for overall results in live play is probably at least 10,000 hands). But power estimates can often be calculated alongside the actual analyses, a process that most statistical software can do for you. Although I realize that most poker players don't examine their data using general-purpose statistical packages, it's good to know the option is available, and it's something to keep in mind if you want to try to get as much as you can out of the data you have.
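To show how the required sample depends on variability, here is a minimal Python sketch of a sample-size calculation. It assumes a one-sided test under a normal approximation, and the helper name hands_needed, the earning rates, and the standard deviations are all hypothetical. A statistical package would do something similar with more care.

```python
from math import ceil
from statistics import NormalDist

def hands_needed(true_mean, sd, alpha=0.05, power=0.8):
    """Approximate hands needed to detect a true profit per hand with the given power."""
    if true_mean <= 0:
        raise ValueError("true_mean must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # significance cutoff for a one-sided test
    z_power = z.inv_cdf(power)      # extra margin needed to reach the target power
    return ceil(((z_alpha + z_power) * sd / true_mean) ** 2)

# Hypothetical situations, all in big bets per hand.
base_case = hands_needed(true_mean=0.5, sd=3.0)
noisier_case = hands_needed(true_mean=0.5, sd=6.0)      # twice the variability
smaller_edge_case = hands_needed(true_mean=0.25, sd=3.0)  # half the edge
```

Doubling the standard deviation, or halving the edge you are trying to detect, roughly quadruples the number of hands needed, which is why no single number fits every situation.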

Daniel Kimberg is the author of Serious Poker and maintains a web site for serious poker players at www.seriouspoker.com.