Passwords Security

A poster on 2p2 recently exposed a major security vulnerability on Lock Poker. The poster found that his password was included in plaintext in the source code of Lock’s Casino app.

I’m not particularly interested in discussing the specifics of Lock’s implementation, but based on my reading of the thread and some PMs/IMs that I got, a lot of people in the poker community could use a basic run-down of how basic password security works. Indeed, it seems that many players (and perhaps some poker site employees) don’t understand what the heart of the issue is here: A password should not exist in plaintext for longer than it needs to, and it doesn’t need to for very long.

The fact that people seem to not know this is slightly worrisome. So, I thought I’d outline the basics of a standard password implementation in a quick post. This certainly won’t be perfect (No implementation is, of course). But, it’s roughly what your bank uses, and your poker site should probably use many of the same ideas, if not the exact same implementation.

Cryptography is extremely counter-intuitive for the uninitiated, so I’m going to dumb stuff down a lot:

[toc]

Step 1: SSL

When your computer is sending secure information (e.g., passwords, credit card numbers, etc.) to anyone, there are two important things that you need to do right away:

  1. Make sure that you’re actually talking to the right person/computer.
  2. Figure out a way to talk to the person/computer secretly (i.e. without anybody eavesdropping).

SSL (which is now apparently called TLS, though I’ve never actually heard someone say that) handles both these problems together. It works roughly like this:

  1. Alice (e.g., a poker player) wants to send some private information to Bob (e.g., a poker site).
  2. Alice obtains instructions from Bob, which describe a method that allows Alice to encrypt messages in such a way that only Bob can read them. Alice must somehow be confident that these instructions originated from Bob.

    The actual way that this type of encryption works in practice is a bit too complicated for this blog, but you might imagine Bob sending Alice a padlock that only he has the key to. Alice can seal a message using Bob’s padlock and send it through the mail. Even though the message will change hands many times before it reaches Bob, only Bob will be able to open the lock and read the message.
    In reality, all Alice needs is a number, which is called Bob’s public key. For poker sites, this is often coded directly into the software, so a player can trust that the key is correct as long as she trusts that the software was downloaded from the right place. For other purposes, like when you visit your bank’s website, the public key is often verified by a trusted third party.
  3. Alice then comes up with a secret code. You can think of this like the type of code you might have had when you were a little kid–e.g., Alice might tell Bob “When I say ‘banana’, I mean the letter c.” In reality, of course, the code is obviously much fancier–it’s a secret key to be used for symmetric-key cryptography.
  4. Alice uses Bob’s instructions (his padlock or his public key) to send her secret code securely to Bob.
  5. If Alice is talking to the right person, Bob now knows Alice’s secret code, and nobody else does. Bob and Alice can then use the secret code to communicate securely.
    If Alice is talking to the wrong person, then that person will not be able to figure out what was in Alice’s message because only Bob can do that. So, that person won’t have the secret code, and there’s no risk of any information being compromised.

All of this happens before you send any message containing real content, and if it’s done properly, you can be pretty confident that you’re talking to the right person with nobody else able to listen in.

When you’re browsing websites, most browsers provide a simple way to verify that this is happening. It’s certainly not perfect, but it’s good enough for most. You simply look to see if the web address begins with “https://” as opposed to “http://”, and look for a padlock symbol that your browser should display fairly blatantly. Here’s how Google Chrome shows pages that use SSL and pages that don’t:

You should get in the habit of looking for things like that when you’re entering sensitive information. If you’re putting your credit card number into a website that doesn’t even bother to use SSL, that should worry you. It’s possible that they use some other form of encryption, but most likely the site is simply insecure.

When you’re using the internet outside of a browser (e.g., connecting to a poker client), I don’t know of a simple way to check that SSL is being used. And, even when you do see the iconic padlock in your browser, there’s no guarantee that the site has set up SSL properly. In both these cases, you might want to ask your local nerd to check for you. (Remember that both Cereus and Cake did not use SSL before the community found out about it. It wasn’t until the curious nerds at PTR uncovered this that anybody knew about it.)

Step 2: Use a Hash Function

So, Alice has established that she’s indeed talking to Bob, and Alice has set up a secure method of communication. The naive next step would be for Alice to set up an account by sending a username and password to Bob. Then, Bob can store those things on his computer, and in the future, Bob can verify Alice’s identity by asking for her username and password.

However, this naive method breaks the first rule of passwords: A password should not exist in plaintext for longer than it needs to. There are good reasons for this: Plaintext passwords can be used to easily log in to the site in question (i.e. by typing it in); people reuse passwords a lot; and users think of passwords as private, so they should be private.

Fortunately, there’s a really easy way to avoid storing plaintext passwords: Instead of storing the password, compute some function from the password and store that. For example, instead of storing the actual password, Bob can convert all the characters in the password to numbers (something like ‘1’ -> 1, ‘2’ -> 2, …, ‘a’ -> 100, ‘b’ -> 101, etc) and then multiply all of these numbers together. We call this value the “hash” of the password. Bob can then store the hash in his database instead of storing Alice’s password. When someone tries to log in claiming to be Alice, Bob computes the hash of the password that this person enters, and if that matches the stored hash, the password is probably correct.

Obviously, this idea has a few problems:

  1. The notion of “probably correct” is a bit unsettling. Indeed, if we use this particular hash function, then “terrace” would match “caterer” (They’re anagrams of each other), and any adding sequence of ones to any password wouldn’t change the hash. It’s even possible to imagine someone who has no clue what Alice’s password is just happening to guess a different password with the same hash.
  2. Given any hash, it’s really really easy to come up with a password that matches that hash. (All you have to do is factor a number that is the product of many small factors.) That means that if some employee of Bob’s checks Bob’s database, then he can easily log in as Alice, even if he can’t figure out the password itself.
  3. Following that line of reasoning, it’s not too hard to guess the actual password from the hash.
So, that makes this specific hash function pretty useless. It’s still probably better than storing the password directly, but only because it makes a potential eavesdropper do a bit of extra work. Plus, it has the serious drawback that a lot of passwords have the same hash.

In practice, computer scientists have developed a lot of different cryptographic hash functions that have very nice properties:

  1. It’s very unlikely that two passwords have the same hash. For modern hash functions, the odds of two strings having the same hash value are typically less than one in 10^38.  That way, it becomes extremely unlikely (effectively impossible) for anyone to happen to guess a password that has the same hash as Alice’s.
  2. Similar strings tend to have completely dissimilar hashes. For example, here are the SHA-1 hashes of terrace and Terrace: 2d5f4f207948db497f428beb0e1fce10f13df1e0 and daf288704dd9a538884b308d2d39b1e9fa19337e. That way, an eavesdropper can’t look on Bob’s computer and learn that two accounts use similar passwords.
  3. Even if I know the hash of Alice’s password, it is effectively impossible to figure out ANY password that has the same hash as Alice’s let alone Alice’s actual password. That way, even if someone has access to Bob’s computer, he still can’t just enter Alice’s username and some password and get access to her account.

So, Bob takes the hash of Alice’s password. That way, he doesn’t have to store the real password, so people with access to his computer can’t figure it out. But, he still gets the security benefits of using a password scheme.

(Hash functions are actually really useful for lots of fun things. For example, here’s the hash of a secret that I know: 54af0a292ec28dfebb5f5142b3afc187a410c11a. In the future, I might want to prove that I knew this now. So, I can post that publicly on my blog without fear that anyone will figure out what the secret is, and if I later want to prove that I knew it, I can just dig up the original message and show that it has the same hash as the message I posted. Cool, huh?)

Step 3: Needs Some (Randomized) Salt

There’s still a problem with this scheme, though.

Suppose Bob runs a poker site, and his computer stores a lot more than just one password. Say he’s got two million or so. Some of these passwords are likely to be really common, like ‘12345’ or ‘password123’ or whatever. If a hacker or site employee gains access to this information (which happens a lot…), he can simply calculate the hashes of a long list of common passwords (say the million most common passwords) and compare his calculated hashes with the information on Bob’s server. In that way, he’d likely end up getting a lot of passwords. Even worse, if Bob used a common hash function (which he’d probably want to do to make sure he’s using something that’s secure), then he can actually look up extremely large precomputed tables of hashed common passwords online, called rainbow tables.

There’s a very simple partial solution to this: Simply take a bunch of characters, called a salt, like “Hi, my name is Bob, and this is my poker site!” and add them to be beginning of everybody’s password before hashing it. (In practice, you would probably use a very long number instead of a sentence.) That will make precomputed rainbow tables completely useless, since it’s unlikely that anyone happened to create a rainbow table for passwords starting with “Hi, my name is Bob…” And, if a hacker doesn’t know the salt, then he won’t be able to generate a useful table either.

But, this is only a partial solution because it still doesn’t stop a hacker who knows the salt. If a hacker managed to get access to Bob’s list of passwords, it’s not much of a stretch to think that he might get access to the salt as well. Then, all he’d have to do is generate one big rainbow table with the million most common passwords, hashed using Bob’s salt. It’s pretty likely that somebody on the server will have used one of the million most common passwords, so the hacker could still learn people’s passwords.

The solution is to randomize the salt. Every time someone creates an account with Bob, Bob generates a random number, adds it to the beginning of the new user’s password, and takes the hash of the combination of the two. Bob then stores on his computer the random number and the hash. So, when Alice logs in, she sends her password (encrypted via the SSL protocol, of course) to Bob, Bob looks up the random number that he has stored for Alice, adds this number to the beginning of the password that Alice sent him, hashes the combination, and checks if it matches the hash in the system. If it does, then Alice has almost certainly entered the right password, and if it doesn’t, then she has not.

Now, if a hacker gets access to the list of hashed passwords, he needs access to the randomized salts as well to do anything useful with that. If he gets this, the hacker has to generate a distinct rainbow table for each salt. In other words, if my randomized salt is “Hi, my name is Noah” and Alice’s is “Alice is awesome,” then, if the hacker wants any shot at figuring out my password, he’d have to generate a bunch of hashes of strings that begin with “Hi, my name is Noah.” If he wants any shot at figuring out Alice’s password, he’d have to generate a bunch of hashes of strings that begin with “Alice is awesome.”

So, he now has to generate a new table for each user instead of one table for everybody. This makes it much much much less likely that he’ll be able to figure out anybody’s password, so this implementation is pretty damn secure. (Of course, it’s not perfect.)

Fin

So, that’s how basic password security works. Hopefully, some poker sites who don’t currenly implement this (or something similar) will implement it in the near future. And, hopefully the community will start to understand better how their passwords should be handled and demand that they be handled reasonably. Frankly, this stuff is quite trivial to anyone who knows anything about computer security, so the fact that a number of sites have been shown to mess it up (which likely means that even more sites mess it up in practice) is extremely worrisome, both because it compromises people’s passwords and because it suggests that poker sites are not hiring the right people to do security.

As always, if you want to read more stuff by me, you can check out my more nerdy other blog, Solipsist’s Log; follow me on Twitter; and/or subscribe to this blog’s RSS.

Oh, and One More Thing for the Nerds: Client-Side vs. Server-Side

If you’re a nerd, you may have noticed something strange about the above implementation: Bob still gets to see Alice’s password in plaintext at some point. In particular, he gets it sent to him encrypted, and then he decrypts it.

So, why don’t we have Alice hash the password before we send it?

Well, if we do that, then any hacker/employee who gets access to the list of hashed passwords (which, again, happens a lot) could log into anyone’s account trivially. They simply send the server the username and the hashed password. By hashing the password on Bob’s side, a hacker who’s found Alice’s hash still has to find a password that matches the hash (or to somehow trick Bob into accepting a pre-hashed password). That’s a big part of the idea behind hash functions.

But, letting Bob see Alice’s password in plaintext is pretty ugly. If Alice used different passwords for all sites, it wouldn’t be a big deal. But, Alice probably doesn’t do that because very few people do. Plus, there’s a bit of a weird privacy issue, since people think of their passwords as very private things–It’s not too hard to imagine Alice’s password being “I’m secretly in love with Bob,” for example. (This is actually a relatively common practice.)

Fortunately, there’s a simple solution: The password should be hashed twice. First, Alice hashes her password and sends it to Bob. Then, Bob hashes the password using a random salt that he stores locally. (If Alice is always going to log on from the same computer, she can use a locally stored random salt as well.) That way, only Alice knows her original password and Bob still does not store the information needed to login.

Of course, if Alice uses the same password for a lot of different sites, and she uses the same hash for them as well, then this method is still insecure. But, if she uses a different salt for different sites (maybe the salt is the name of the website that she’s connecting to, for example), then this problem goes away. In practice, the website or client software can implement the hash itself and use a reasonably unique salt, so Alice need not worry about these details.

This method is sort of a cheap hack in that it’s trivially equivalent to Alice just using secure passwords. In other words, Alice should use a different, strong password for every site; perhaps she should generate her passwords by using a hash function in the first place. In practice, of course, very few users do this, so this is a nice solution that more sites should implement.

5 Comments.

  1. slight nitpick: you obviously wouldn’t create a table just to see if someone’s password is in the list of often used passwords. you could just compute the hash of every password with the salt and compare it (brute force attack).
    So to increase the security even further you can hash the hash of the password, and so on, a specified number of times (500 is used a lot these days, more is possible) to slow down brute force attacks since hashing a password 500 times takes quite a while (probably to a point where brute forcing even the simplest passwords becomes too time consuming).

    • Noah Stephens-Davidowitz

      x,
      Nice point. I deliberately avoided any discussion of run-time or complexity in general in this post because it got pretty long without it, and I actually want people to read and understand the thing. I suppose I could add another point to my vague definition of a hash function, but meh.

      FWIW, as a theorist, I’m morally opposed to the idea of hashing something 500 times in a row. If your hash function doesn’t provide a few orders of magnitude more security than you need, then you’re using the wrong hash function. A nice slow and unbroken hash function like anything based on the Blowfish cipher is effectively unbreakable with today’s hardware with basic password requirements under standard complexity assumptions. In practice, I suppose if you want to run SHA 512 times, that seems fine, though I’m not immediately aware of the literature about the composability of SHA (though I’m sure it exists).

      Since we’re being nitty, I should mention that the best way to deal with passwords is with a zero knowledge proof scheme. That way, the password never leaves the user’s computer in any form.

  2. Hello,
    sorry for offtopic, but i have to ask u if u could do a article about “Life as a NLHE 9-max Cash Game Pro by the Numbers”.
    I can’t find on the internet better research then u did (about 6-max, mtt, sng).
    Best Regards!

    • Noah Stephens-Davidowitz

      satriani,
      You’re not the only one who’s asked for that. It’s still technically on my to-do list, but frankly, it seems like a rather boring task to me right now, so I don’t know that I’ll ever get to it. I’m aware that people want it, though.

  3. Noah,
    i just wanna know the number about chance of loss and chance of 0,5 equity depend of your winrate. Thats all.
    It would be very very helpfull. If you couldnt do that, can u just say how u make previous estimation – 6max ?

    Best Regards