Store your users' passwords

tbowan
December 23, 2019

Spoiler: As soon as you need to control user access, you have to authenticate your users, and even though alternatives exist, the username + password pair remains a simple and relatively effective method. Today, we are going to see how to store these credentials securely (by salting them, hashing them and taking your time; in short, what password_hash() does in PHP).

Today, you have decided to develop your own application and need to verify the identity of your users… We have all been there. And of all the possible solutions, you opted for password authentication (no one is judging you).

To prevent eavesdroppers from reading the passwords, you have therefore obtained certificates and set up a secure HTTPS connection. To make passwords hard to guess, you have even set up a password policy (length, required characters, etc.) and anti-brute-force protections.

And now you're wondering what else to do… You've come to the right place 😉.

Why do more?

Indeed, your perimeter is protected: an outside attacker can neither intercept nor guess the passwords.

The problem, generally speaking, is that you can never be 100% sure that no vulnerability exists elsewhere in your system. The potential flaws are numerous, and even if you put all your energy into it, there is always a small chance that one remains (if only a malicious insider).

The news is full of such data leaks in companies of all sizes. Whatever defenses were in place, a (small) door was found that allowed the database to be exported. If your passwords are stored in clear text, you expose your users to identity theft.

Even if you consider such an exploit unlikely, you must still assume that an attacker will get inside the perimeter and, ultimately, read the contents of your database, including the passwords.

By adding a second layer of protection inside your perimeter, you are practicing what is called defense in depth, and when it comes to security, that is always a good idea.

Making them unreadable…

So the idea is to make passwords unreadable and if you're thinking about cryptography, you're on the right track.

Why not encrypt?

Counterintuitively, encrypting passwords is not going to help us. Of course they will be unreadable, but encryption means a decryption key that must be stored somewhere… If you encrypt that key, you need a second key, which must also be protected… The system becomes more and more complex, and therefore fragile.

This kind of construction is found in applications that need to read data back after storing it, including some credit card payment applications (cf. requirements 3.4 to 3.6 of the PCI DSS standard). In this case, we speak of DEK and KEK:

DEK for Data Encryption Key, a key to encrypt data,

KEK for Key Encryption Key, a key to encrypt keys.

And if you go down this road, you will have to protect the KEK, either with another KEK (back to square one) or on dedicated hardware (e.g. HSMs). It keeps getting more complex, you will end up holding key ceremonies, and a substantial part of the budget will go to candles, incense and pointy hats.

Fortunately, since we never need to read the passwords back, another cryptographic tool exists: the hash function. But, as we will see, you have to choose it wisely and add salt.

Hashing

The first pitfall lies in the choice of hash function. It must resist preimage attacks: in other words, an attacker who has a password's hash should not be able to recover the password more easily than by exhaustively testing all possible passwords (also called brute forcing).

Forget MD5 and SHA-1 forever! These functions are broken (practical collision attacks exist), and have been for far too long.

When registering your users (or when they change their password), you will compute a cryptographic fingerprint and it is this fingerprint that will be stored. For example, here is a first draft of a function to make the password unreadable:

function transform($password) {
     return hash("sha512", $password);
}

When authenticating your users, you compute the fingerprint of the password they provide and compare it with the one stored in the database. Here's a first draft of what it would look like:

function verify($password, $transformed) {
     // strict comparison (== may compare "0e…" hex strings as numbers), but not constant-time
     return transform($password) === $transformed;
}

Here comes a new pitfall: string comparison. To save computation time, a comparison usually stops at the first character that differs between the two strings (there is no need to compare the rest). In our case, this is a problem: by measuring the response time, an attacker learns the length of the common prefix between the hash stored in the database and the hash of the password they submit. By trial and error, they can gradually recover the stored hash (and from there, your user's password).
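To make the leak concrete, here is a sketch contrasting an early-exit comparison with a constant-time one. Both naiveEquals() and constantTimeEquals() are hypothetical illustrations, not PHP's actual implementation; even the XOR loop gives no hard guarantee in an interpreted language, which is one more reason to rely on the built-in hash_equals().

```php
<?php
// Hypothetical early-exit comparison: returns as soon as a character differs,
// so its running time reveals the length of the common prefix.
function naiveEquals(string $a, string $b): bool
{
    if (strlen($a) !== strlen($b)) {
        return false;
    }
    for ($i = 0; $i < strlen($a); $i++) {
        if ($a[$i] !== $b[$i]) {
            return false; // early exit: leaks where the strings differ
        }
    }
    return true;
}

// Hypothetical constant-time variant: always walks the whole string and
// accumulates differences with XOR, so the work depends only on the length.
function constantTimeEquals(string $known, string $user): bool
{
    if (strlen($known) !== strlen($user)) {
        return false;
    }
    $diff = 0;
    for ($i = 0; $i < strlen($known); $i++) {
        $diff |= ord($known[$i]) ^ ord($user[$i]);
    }
    return $diff === 0;
}
```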

To do things right, we therefore use hash_equals() to compare the hashes: it is specifically designed to take the same time no matter where the fingerprints differ. It is slower, but that is the price of security.

function verify($password, $transformed) {
     // hash_equals() expects the known (stored) string first, the user-derived one second
     return hash_equals($transformed, transform($password));
}

Salt

If we stop here, with a plain hash function, all your passwords are transformed the same way: two users with the same password will have the same hash. And that is going to be a problem.

Why? Dictionaries

Rather than cracking passwords by exhaustively trying every possibility (which can take a long time), an attacker can use a dictionary. By storing candidate passwords alongside their fingerprints, they just have to find a user's fingerprint in the dictionary and read the corresponding password.
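As a minimal sketch of such a dictionary, assuming unsalted SHA-512 as in the first transform() draft, an associative array can map each fingerprint to its password. The candidate list and the helper names buildDictionary() and lookupFingerprint() are hypothetical.

```php
<?php
// Build a tiny dictionary mapping SHA-512 fingerprints to candidate passwords.
function buildDictionary(array $candidates): array
{
    $dictionary = [];
    foreach ($candidates as $candidate) {
        $dictionary[hash("sha512", $candidate)] = $candidate;
    }
    return $dictionary;
}

// Look up a stolen (unsalted) fingerprint; null if it is not in the dictionary.
function lookupFingerprint(array $dictionary, string $stolenHash): ?string
{
    return $dictionary[$stolenHash] ?? null;
}

// Hypothetical sample of common passwords.
$dictionary = buildDictionary(["123456", "password", "azerty", "aze123", "qwerty"]);
$stolen = hash("sha512", "aze123"); // what a leaked database would contain
echo lookupFingerprint($dictionary, $stolen) ?? "not found", PHP_EOL; // prints "aze123"
```

Building the array costs one hash per candidate, once; every later lookup is then nearly free.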

For example, if we compute the SHA-256 hashes of all 6-character alphanumeric passwords, our dictionary will require $62^6$ (the number of possible passwords) times 32 bytes (the size of a hash), i.e. about 1.8 TB, which is easily manageable with modern storage.
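Spelled out, the arithmetic behind that estimate (counting only the 32-byte hashes, not the passwords stored alongside them) is:

$$62^6 \times 32\ \text{bytes} = 56\,800\,235\,584 \times 32\ \text{bytes} = 1\,817\,607\,538\,688\ \text{bytes} \approx 1.8\ \text{TB}$$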

While an exhaustive search over all passwords has complexity $O(n)$, a lookup in a sorted dictionary drops to $O(\log n)$. The initial investment of building the dictionary therefore pays off from the second password cracked onward.
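Here is a sketch of where the $O(\log n)$ comes from: sort the (fingerprint, password) pairs once by fingerprint, then use binary search, which halves the search interval at each step. The function names are hypothetical, and the 2-digit PINs are a tiny stand-in for the $62^6$ space.

```php
<?php
// Sorted dictionary: [fingerprint, password] pairs ordered by fingerprint.
function buildSortedDictionary(array $candidates): array
{
    $entries = [];
    foreach ($candidates as $candidate) {
        $entries[] = [hash("sha256", $candidate), $candidate];
    }
    // Sort once by fingerprint: this is the one-off investment.
    usort($entries, fn(array $x, array $y): int => strcmp($x[0], $y[0]));
    return $entries;
}

// Binary search over the sorted fingerprints: O(log n) comparisons.
function findPassword(array $entries, string $hash): ?string
{
    $low = 0;
    $high = count($entries) - 1;
    while ($low <= $high) {
        $mid = intdiv($low + $high, 2);
        $cmp = strcmp($entries[$mid][0], $hash);
        if ($cmp === 0) {
            return $entries[$mid][1];
        }
        if ($cmp < 0) {
            $low = $mid + 1;
        } else {
            $high = $mid - 1;
        }
    }
    return null;
}

// Every 2-digit PIN, "00" to "99".
$candidates = array_map(fn(int $n): string => sprintf("%02d", $n), range(0, 99));
$entries = buildSortedDictionary($candidates);
echo findPassword($entries, hash("sha256", "42")), PHP_EOL; // prints "42"
```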

Some might think that imposing constraints on passwords is enough to make these dictionaries impossible to store, but that is a mistake: tricks have been found to get around these storage limits.

Frequent passwords. Most users lack originality and end up using the same passwords again and again. An attacker can therefore build a dictionary of just these passwords and save disk space. Of course, it won't crack the really complicated ones, but the success rate on a complete database should be good.

Besides, sometimes you don't even have to build the dictionary or run the search yourself, because the hashes may already be indexed (e.g. by Google).

Rainbow tables. Without going into details, this is a trick for compressing a dictionary. Rather than storing the fingerprints of all the passwords, you can discard a large proportion of them and recompute them when needed.

This is an example of a time/memory trade-off. Compared to dictionaries, tables take up less space but require more time to find a password. Compared to an exhaustive attack, the tables take up more memory but remain faster.
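For the curious, here is a toy rainbow table illustrating that trade-off. It is a minimal sketch under assumptions of my own: the password space is 4-digit PINs, the hash is SHA-256, and the reduction function reduceHash(), the chain length and the start points are all hypothetical choices. Only the start and end of each chain are stored; the passwords in between are recomputed during a lookup.

```php
<?php
// Toy rainbow table over 4-digit PINs ("0000" to "9999"), hashed with SHA-256.
const CHAIN_LENGTH = 50;

// Reduction function R_i: maps a hash back into the password space.
// Mixing in the column index $i is what makes it a *rainbow* table.
function reduceHash(string $hash, int $i): string
{
    return sprintf("%04d", (hexdec(substr($hash, 0, 7)) + $i) % 10000);
}

// Each chain is p0 -> R_0(H(p0)) -> R_1(H(p1)) -> ...;
// only its end point (key) and start point (value) are kept.
function buildRainbowTable(array $starts): array
{
    $table = [];
    foreach ($starts as $start) {
        $p = $start;
        for ($i = 0; $i < CHAIN_LENGTH; $i++) {
            $p = reduceHash(hash("sha256", $p), $i);
        }
        $table[$p][] = $start; // several chains may end on the same password
    }
    return $table;
}

// Guess the column of the fingerprint, walk to the end of the chain,
// and if that end is stored, recompute the chain from its start.
function rainbowLookup(array $table, string $target): ?string
{
    for ($col = CHAIN_LENGTH - 1; $col >= 0; $col--) {
        $p = reduceHash($target, $col);
        for ($i = $col + 1; $i < CHAIN_LENGTH; $i++) {
            $p = reduceHash(hash("sha256", $p), $i);
        }
        foreach ($table[$p] ?? [] as $start) {
            $q = $start;
            for ($i = 0; $i < CHAIN_LENGTH; $i++) {
                if (hash("sha256", $q) === $target) {
                    return $q; // the password behind the fingerprint
                }
                $q = reduceHash(hash("sha256", $q), $i);
            }
            // No match: a false alarm caused by merging chains, keep looking.
        }
    }
    return null;
}

// 300 start/end pairs kept in memory instead of 10,000 fingerprints.
$starts = array_map(fn(int $j): string => sprintf("%04d", ($j * 37) % 10000), range(0, 299));
$table = buildRainbowTable($starts);
var_dump(rainbowLookup($table, hash("sha256", "0000"))); // "0000" starts a chain, so it is found
```

Coverage is incomplete (chains merge, and some PINs appear in no chain), which is part of the price paid for the smaller table.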

To counter these dictionary attacks, we therefore need a way to turn two identical passwords into two distinct fingerprints…

How? Add randomness

To make the hash vary from one user to another, we add a salt to the password: random characters, different for each user.

Indeed, if the salt were shared by the whole database, or derived from the password, an attacker could build a dictionary specific to your storage scheme, and the salt would be at most an inconvenience.

When registering your users (or when they change their password), you therefore generate a random string (the salt), append it to the password, then compute a cryptographic fingerprint of the whole. Both the salt and the fingerprint are stored.

function transform($password) {
     $salt = bin2hex(random_bytes(16)); // 128 random bits
     $hash = hash("sha512", $password . $salt);
     return $salt . "." . $hash; // stored as a single "salt.hash" string
}

When authenticating your users, you extract the stored salt, append it to the password they provide, hash the whole thing and compare the result with the stored fingerprint.

function verify($password, $transformed) {
     list($salt, $hash) = explode(".", $transformed, 2);
     return hash_equals($hash, hash("sha512", $password . $salt));
}
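Putting the salted pair together, here is a sketch that checks the whole point of salting: two users with the same password end up with different stored values, yet both still verify. It assumes salt and hash are stored as one "salt.hash" string, which is what verify() splits on; the typed signatures and the malformed-input guard are additions for this example.

```php
<?php
// Salted SHA-512 (still far too fast for real use).
function transform(string $password): string
{
    $salt = bin2hex(random_bytes(16)); // 128 random bits, hex-encoded
    $hash = hash("sha512", $password . $salt);
    return $salt . "." . $hash; // stored as "salt.hash"
}

function verify(string $password, string $transformed): bool
{
    $parts = explode(".", $transformed, 2);
    if (count($parts) !== 2) {
        return false; // malformed stored value
    }
    [$salt, $hash] = $parts;
    return hash_equals($hash, hash("sha512", $password . $salt));
}

// Two users choosing the same password get different stored values...
$alice = transform("aze123");
$bob = transform("aze123");
var_dump($alice === $bob);          // bool(false), with overwhelming probability
// ...yet each one still verifies.
var_dump(verify("aze123", $alice)); // bool(true)
var_dump(verify("wrong", $bob));    // bool(false)
```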

To slow down

So far, we have ensured that an attacker can only attempt an exhaustive (brute-force) attack. That is very good, but one last problem remains: the speed at which fingerprints are computed.

Why? GPUs

Most hash functions were built to be fast. They were designed for integrity checks, where we want the computation to finish quickly so the processor can move on to something else.

To make matters worse, some clever folks found a way to compute these fingerprints on graphics cards (GPUs), which, for this particular kind of computation, break all records because they can run a huge number of them in parallel.

In our case, this speed turns against us: if we can compute a fingerprint quickly (with our processor), attackers can go even faster (with their graphics cards)…

To give an idea, on our GeForce GTX 1050 Ti graphics card, released in October 2016, hashcat computes 130 million SHA-512 hashes per second. Testing every 6-character alphanumeric password takes it only about 7 minutes.
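The 7 minutes follow directly from the size of the password space divided by the measured rate:

$$\frac{62^6}{1.3 \times 10^{8}\ \text{hashes/s}} = \frac{56\,800\,235\,584}{130\,000\,000}\ \text{s} \approx 437\ \text{s} \approx 7.3\ \text{min}$$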

In fact, after "protecting" the aze123 password with the previous transform() function, it only took 2 seconds for hashcat to break the fingerprint.

Screenshot: after 2 seconds, hashcat has cracked the password.

For the curious, the command line to type is a bit technical…

hashcat64.exe -a 3 -w 3 -m 1710 -p . -1 ?l?u?d <hash>.<salt> ?1?1?1?1?1?1

-a 3 to request a brute-force attack (not a dictionary attack),
-w 3 to ask it to go faster (at the cost of higher power consumption),
-m 1710 to tell it that the hash is computed over the password followed by the salt, i.e. sha512($pass.$salt) (with the order reversed, it would be 1720),
-p . to tell it that the hash and the salt are separated by a . (note that hashcat expects the hash first and then the salt, so I had to adapt the output of my transform() function),
-1 ?l?u?d to define a custom charset containing lowercase letters (?l), uppercase letters (?u) and digits (?d),
<hash>.<salt> should be replaced with the actual hash and salt,
?1?1?1?1?1?1 to tell it that the password consists of 6 characters from that custom charset.
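The adaptation mentioned for -p can be as simple as swapping the two fields. This is a sketch assuming the value is stored as "salt.hash" (the format verify() splits on); toHashcatLine() is a hypothetical helper.

```php
<?php
// Hypothetical helper: turns a stored "salt.hash" string into the
// "hash.salt" line expected by hashcat with -m 1710 -p .
function toHashcatLine(string $transformed): string
{
    $parts = explode(".", $transformed, 2);
    if (count($parts) !== 2) {
        throw new InvalidArgumentException("expected salt.hash");
    }
    [$salt, $hash] = $parts;
    return $hash . "." . $salt;
}

$salt = bin2hex(random_bytes(16));
$stored = $salt . "." . hash("sha512", "aze123" . $salt);
echo toHashcatLine($stored), PHP_EOL; // the <hash>.<salt> argument for hashcat
```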

How to slow down

We therefore need to slow these attacks down by using slower hashing algorithms. Being slower for us is not a big deal, since we only hash a password at registration or login. The attacker, on the other hand, is heavily penalized, since hashing is all they do to find the passwords.
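To get a feel for the gap on your own machine, here is a rough micro-benchmark comparing one SHA-512 with one bcrypt hash via password_hash(). The numbers depend entirely on the hardware, and cost 10 is just an example value.

```php
<?php
$password = "aze123";
$salt = bin2hex(random_bytes(16));

// Average the fast hash over many runs, since a single call is too quick to time.
$start = hrtime(true);
for ($i = 0; $i < 1000; $i++) {
    hash("sha512", $password . $salt);
}
$fast = (hrtime(true) - $start) / 1e9 / 1000; // seconds per SHA-512

// One bcrypt hash with an example cost of 10.
$start = hrtime(true);
password_hash($password, PASSWORD_BCRYPT, ["cost" => 10]);
$slow = (hrtime(true) - $start) / 1e9; // seconds per bcrypt

printf("sha512: %.7f s, bcrypt: %.4f s, about %.0fx slower\n", $fast, $slow, $slow / $fast);
```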

Rather than SHA-512, it is more relevant to use functions such as bcrypt, scrypt, Argon2 or even PBKDF2, which are designed to take their time and provide a parameter to set the computation cost (scrypt and Argon2 are also memory-hard, which makes them even less GPU-friendly). Note that, just like the salt, the cost must be stored next to the fingerprint so that the hash can be recomputed later.
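As an illustration of storing the cost next to the salt and fingerprint, here is a sketch using PBKDF2 through PHP's built-in hash_pbkdf2(). The "iterations.salt.hash" format, the function names and the default of 210,000 iterations are assumptions for this example; tune the count to your hardware.

```php
<?php
// Hypothetical storage format "iterations.salt.hash".
function transformPbkdf2(string $password, int $iterations = 210000): string
{
    $salt = bin2hex(random_bytes(16));
    $hash = hash_pbkdf2("sha512", $password, $salt, $iterations); // hex output
    return $iterations . "." . $salt . "." . $hash;
}

function verifyPbkdf2(string $password, string $transformed): bool
{
    $parts = explode(".", $transformed);
    if (count($parts) !== 3) {
        return false;
    }
    [$iterations, $salt, $hash] = $parts;
    // Recompute with the *stored* cost, so old hashes keep working if the default changes.
    $candidate = hash_pbkdf2("sha512", $password, $salt, (int) $iterations);
    return hash_equals($hash, $candidate);
}
```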

In PHP, one could define a bcrypt-based hash function on top of crypt() as follows:

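function mySlowHash($password, $salt, $cost) {
     // bcrypt setting: "$2y$", two-digit cost, "$", then the salt
     // (crypt() reads its first 22 characters, which must be in ./0-9A-Za-z)
     $options = sprintf('$2y$%02d$%s$', $cost, $salt);
     return crypt($password, $options);
}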

At registration, the method does not change: we generate a salt and then compute the fingerprint.

function transform($password) {
     $salt = bin2hex(random_bytes(16));
     $hash = mySlowHash($password, $salt, 10);
     return $hash;
}

When authenticating your users, you can pass the stored string as the salt argument of crypt() to recompute the hash: since it starts with the algorithm, cost and salt, crypt() reuses them.

function verify($password, $transformed) {
     // the stored string begins with the algorithm, cost and salt, which crypt() reuses
     $hash = crypt($password, $transformed);
     return hash_equals($transformed, $hash);
}
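To see what crypt() actually stores, here is a short sketch of a bcrypt string: "$2y$", a two-digit cost, "$", 22 salt characters and 31 hash characters, 60 in total. Using hex digits for the salt is an assumption that works because they belong to bcrypt's salt alphabet, though 22 hex digits carry only 88 bits of randomness.

```php
<?php
// 22 hex digits: valid bcrypt salt characters.
$salt = substr(bin2hex(random_bytes(16)), 0, 22);
$stored = crypt("aze123", sprintf('$2y$%02d$%s$', 10, $salt));

echo $stored, PHP_EOL;
echo strlen($stored), PHP_EOL;       // 60
echo substr($stored, 0, 7), PHP_EOL; // $2y$10$ : algorithm and cost travel with the hash

// Verification: the stored string is reused as the crypt() setting.
var_dump(hash_equals($stored, crypt("aze123", $stored))); // bool(true)
```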

And now?

If you're developing an app, frankly, I wouldn't recommend reinventing the wheel like I did in this article. It was just to show you why and how.

Not only is doing it yourself an easy way to make mistakes when it comes to cryptography, but (mainly?) you would also have to handle algorithm upgrades (what would you do if you needed to increase the cost of the hash? the length of the salt? or change the algorithm?).

Because PHP is a serious and practical language, we have the two functions password_hash() and password_verify() which do exactly what we need:

They hash passwords using the most suitable algorithm of the moment (PASSWORD_DEFAULT),
They salt passwords, using a secure random generator,
They perform their comparisons in constant time to avoid timing attacks,
They take care of the storage format (algorithm, cost and salt are encoded in the resulting string).

I should actually replace the two functions transform() and verify() with these two:

function transform($password) {
     return password_hash($password, PASSWORD_DEFAULT);
}

function verify($password, $transformed) {
     return password_verify($password, $transformed);
}
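The upgrade question (what to do when the cost or the algorithm must change) is handled by a third function of the same family, password_needs_rehash(). Here is a minimal sketch of a login routine using it. The login() helper, its by-reference $upgraded parameter and the target of bcrypt with cost 12 are assumptions for illustration, and saving the new hash is left to your storage layer.

```php
<?php
// Hypothetical login helper: verifies the password and, if the stored hash
// uses an outdated algorithm or cost, hands back a fresh hash to save.
function login(string $password, string $stored, ?string &$upgraded = null): bool
{
    $upgraded = null;
    if (!password_verify($password, $stored)) {
        return false;
    }
    // The plaintext is only available right now: the moment to upgrade.
    if (password_needs_rehash($stored, PASSWORD_BCRYPT, ["cost" => 12])) {
        $upgraded = password_hash($password, PASSWORD_BCRYPT, ["cost" => 12]);
    }
    return true;
}

$old = password_hash("aze123", PASSWORD_BCRYPT, ["cost" => 10]); // e.g. created years ago
var_dump(login("aze123", $old, $new));   // bool(true)
var_dump($new !== null);                 // bool(true): cost 10 is below the target
var_dump(login("aze123", $new, $newer)); // bool(true)
var_dump($newer);                        // NULL: already up to date
```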

For other languages, you may not have similar functions natively. I would tell you to switch to PHP but I know it's not always possible 😉. So here are some tips...

C/C++: you can go straight to the OpenBSD sources or the Openwall version, but nothing will ever be simple…
Python: the bcrypt library offers the equivalent functions bcrypt.hashpw() and bcrypt.checkpw(),
Java: the Spring Security library lets you use bcrypt to hash passwords,
bash: the htpasswd command can generate such hashes (e.g. with the options -bnBC; see also -i to read the password from stdin and keep it out of the shell history),
Node.js: the bcrypt library on npm provides similar hash() and compare() functions.