Shuffling and Randomization

by Daniel Kimberg | Published: Jan 03, 2003

I'd like to devote this column to some topics that come up in the context of computers and poker that I think have been the source of some confusion. The integrity of poker, of course, depends critically on the randomness of the cards. This is true whether it's played in a brick-and-mortar cardroom, online, or against a computer simulator. We call the randomization procedure "shuffling" because that's a reasonable description of how people do it. Of course, when a computer is doing the shuffling, the issues are somewhat different. Cards aren't gathered together in patterns in the center of the table. When two players both turn up pocket queens, they don't slide their hands together to make quads (bunching all four queens) while the dealer splits the pot. At the same time, a human is unlikely to shuffle so poorly that the entire deal would be predictable, while a computer program with a simple bug could easily produce a predictable deal. Understanding how computers produce randomly ordered cards can be helpful in evaluating the many contradictory things you're liable to hear about poker and computers. So, in no particular order, here are four things you should want to know.

It's not about shuffling, it's about randomization. Shuffling is only a tool by which we try to randomize cards. Names aside, an appropriate algorithm for computerized poker should aspire to random ordering of the cards, not to an ordering that mimics what human shufflers would produce. Some online poker sites make a big deal about their shuffling algorithm (typically one proposed by Donald Knuth, whose influential volumes on computer programming are the standard reference for many widely used algorithms). And well they should, because it's very important. But what they should really want to sell us on is not how well they simulate human shuffling (which is often nothing to write home about), but how (un)predictable the cards are after a deal.

Computers should do a better job of randomizing cards than people. This isn't immediately obvious, especially to those of us who grew up viewing computers as the ultimate deterministic machines. But computers are a lot better at behaving randomly now than they used to be, and can afford to be a lot more thorough than people in applying that randomness to cards. In practice, there are lots of ways in which this process can go wrong. Good algorithms can minimize the impact of some of these potential flaws. Used well, shuffling algorithms should effectively randomize the cards. But it's worth bearing in mind that shuffling is only a means to an end, and that end is randomness. In fact, if we could generate completely random numbers, randomizing our cards would be as simple as assigning a different random number to each card and then just sorting them by number - which brings us to our next issue.

Yes, there is randomness in the world. If you want to start a very painful argument, get a computer scientist, a philosopher, and a physicist into a room and ask for opinions on randomness. Catch them on a different day and they'll all argue different sides. I don't want to argue with any of them, so I plead that I'm only using the word "random" for pragmatic reasons. And I'm going to use the pragmatist's definition of random, which for purposes of computerized poker (which depends on random numbers in many ways, not just for shuffling), is fairly broad. Numbers are random if no one could take advantage of information at their disposal to do any better than chance at predicting them.

The apparent contradiction in trying to get computers to generate random numbers has occasionally made the whole exercise seem futile. And, indeed, you would be hard-pressed to generate a useful random-number generator on an older home computer without a network connection or any special hardware. Fortunately, thanks to modern technology and a mix of mathematics and computer science, today we can do a reasonably good job of generating random numbers that meet our goals. It's done, typically, with a two-stage procedure.

The first is gathering entropy, or sources of data that may be partly predictable but also have some less predictable components. It doesn't matter if these sources are almost completely predictable, as long as each one contributes a useful amount of entropy. Entropy is measured in bits, binary digits that can be either 0 or 1. Bits are the basic unit of information, and the goal of gathering entropy is to find as many random bits as possible.

Finding entropy is more than half the battle, and although it's a little tricky, it's not impossible. A good example of a useful source might be the outside temperature in Las Vegas. The hundreds digit is almost always a 0 or 1, which might seem like it ought to be good for one bit of entropy. But it doesn't tend to change either rapidly or unpredictably enough to be useful for random-number generation. Anyone who knew the last temperature measurement would do pretty well at guessing the next one. The tens and units digits also change too predictably and slowly to be useful. But if we had a very accurate thermometer, and measured the temperature in thousandths of a degree, we'd have a number that almost no one could predict better than guessing, even if they lived right next door, or if they had access to the previous measurement. That single digit would give us about 3.3 bits of entropy. While a superaccurate thermometer seems a bit impractical, most new computers have temperature sensors that work more or less like this, measuring the unpredictably fluctuating temperature on the motherboard.

Of course, that's just one example. There are plenty of other useful sources of entropy in the world, and an internet-connected computer has access to many of them. Other sources include the precise timings of network communications and keyboard and mouse events, the contents of frequently changing web pages, and in a pinch, special hardware that can generate truly random numbers by sampling radioactive substances (well shielded, of course) or noise in electronic components. In fact, there are even web sites you can visit that will provide random numbers derived from radioactive sources. Hit reload and get a different random number.

The second nifty tool that helps us build usefully random numbers is the hashing algorithm, which is designed to take an arbitrary amount of data and distill it down to a fixed size by a series of convoluted mathematical operations. Hashing algorithms are useful for lots of things, but in this context they provide a way of combining different sources of entropy into a single random number, which can then be used to randomize cards. Well-designed hashing algorithms benefit from all the randomness in their input, but don't suffer from any of the predictability. Because of this property, we can throw into the entropy pool lots of predictable data, or even data intended to compromise the random-number generation, and not suffer for it, as long as we also throw in enough very unpredictable data. If you combine 50 bits of truly random data with a million bits of completely predictable data, you still have 50 bits of entropy. Even better, the entropy tends to be evenly spread over the hash value. So, if you're creating a 512-bit hash, as long as the total entropy in your input exceeds 512 bits, your random number should be completely unpredictable.

Predictable doesn't mean knowing anything with absolute certainty. It's easy to get caught up in the language, but what's important in randomization of cards is that the probability of each undealt card being the next one dealt be equal. It's easy to pronounce a randomization scheme adequate because no one can predict, with a magician's flair, the next card off the deck. But in poker, we should be concerned by much subtler biases. If you know that a certain card should be dealt on the turn one time in 47, but it's actually 1.1 times in 47, you won't impress anyone with your clairvoyance, but you should certainly be concerned about the error. It doesn't sound like much, but we can imagine some very consequential biases. Imagine a tight game in which the more skilled players tend to be playing made hands and the less skilled players tend to be chasing draws. Suppose the nominal rate at which those draws come in is 35 percent, but due to some bias (perhaps suited cards tending to be closer in the deck than you'd expect by chance), they actually come in at 36 percent. Although a difference like that wouldn't affect anyone's strategy, it would have a substantial effect on the flow of money in the game.

Obscurity isn't security. Computer security experts refer derisively to the belief that secret algorithms are more secure than open ones as "security by obscurity." The idea is that if you don't tell people all the details of your random-number generation scheme, that should make the numbers even harder to guess. Although security by obscurity has its place (despite the knee-jerk negative reaction many security experts have developed), it's generally best considered a red flag. A well-designed and well-implemented random-number generation scheme can't be exploited. A bad one is ultimately no more secure for being secret. This principle has long been recognized as an important characteristic of useful data encryption techniques, and is well demonstrated by the successes of open source software. While there are ways in which obscurity could be helpful (for example, an otherwise reasonable random-number generator depends on input that anyone could access but no one would ever guess), it should never come up in the context of poker. When you hear someone claim they have a top-secret shuffling algorithm, random-number generation scheme, encryption method, or security system, it's an excellent bet that their system has some serious weaknesses.

There's a lot more to know about randomization, computers, and poker, but these points are important and essential. While it's easiest to see the relevance in the context of online poker, where there can be money at stake, it's at least equally relevant to computerized poker and poker simulation. Outside the gambling community, card shuffling may be of less interest, but random-number generation is of critical importance to computer security. So, it should still be of interest, even if you don't tend to mix cards with computers.