Exyr.org Subscribe

Hashing passwords the Right Way

Simon Sapin,

Short answer: I chose PBKDF2, but bcrypt and scrypt are good too. In any case, use a constant-time comparison to avoid timing attacks. Find the code below.

So you’re writing some application where users log in with passwords. As you already know (or so I hope!), you should not store passwords in plain text. There is plenty already on the why, so I’ll focus on the how.

The precautions I’m taking here may seem overkill, but they’re really easy to implement so there is no reason not to.

The scenario we want to protect from is an attacker getting access to the database. Of course you should try to prevent that in the first place, but it has happened before. Traditional, reversible cryptography does not help: if the database was stolen, the key may have been stolen too. Thus, we need a one-way hash.

SHA-256 is not known to be broken at the time of this writing, while MD5 and SHA-1 are.

The basic idea is to store hash = sha256(password) instead of the password itself. However this method vulnerable to Rainbow Table attacks where a hash is compared against a big table of pre-computed hashes. Creating such a table is expensive (takes time), but after that it can be used for all entries in the database. A salt protects against this: hash = salt + sha256(salt + password). Salts should be 8 bytes or longer, unique and random.

Salting is good but it does not protect against dictionary and brute force attacks. This is because SHA-256 and similar hashing functions are designed to be fast.

To make it slower we could just apply SHA-256 to its own result many times. This number of application is the work factor. When computers become faster next year we can increase the work factor to keep up with Moore’s law. However, the first rule of designing a crypto-system is: Don’t.

PBKDF2, bcrypt and scrypt are all reputable hashing functions with a work factor. See this answer to compare them. I chose PBKDF2 since it is easiest to implement in Python and I judged it good enough based on what I read about it. There are several implementations of PBKDF2 for Python out there. I picked Armin Ronacher’s for its simplicity (15 lines of actual code).

One more thing where we need to be careful is timing attacks. If we use the == operator to check a hash, it will compare byte-by-byte and return on the first inequality. This timing leaks information to the attacker who could guess the correct hash without even getting access to the database. This attack is more plausible than one might first think. But even if it is not, a constant-time comparison is easy (See the the code below.) so there is no reason not to do it.

To wrap up: PBKDF2 with random, unique salts and constant-time comparison.

Finally, it is useful to store the algorithm parameters such as the work factor next to the hash and its salt. That way, if we later want to change these parameters we don’t need to throw away old hashes and reset everyone’s password. Old hashes can stay valid and be re-created on next log in.

We need some code to tie all this together and it’s pretty generic. The whole point of this article is actually to provide it. This is the code I use in production:

import hashlib
from os import urandom
from base64 import b64encode, b64decode
from itertools import izip

# From https://github.com/mitsuhiko/python-pbkdf2
from pbkdf2 import pbkdf2_bin

# Parameters to PBKDF2. Only affect new passwords.
HASH_FUNCTION = 'sha256'  # Must be in hashlib.
# Linear to the hashing time. Adjust to be high but take a reasonable
# amount of time on your server. Measure with:
# python -m timeit -s 'import passwords as p' 'p.make_hash("something")'

def make_hash(password):
    """Generate a random salt and return a new hash for the password."""
    if isinstance(password, unicode):
        password = password.encode('utf-8')
    salt = b64encode(urandom(SALT_LENGTH))
    return 'PBKDF2${}${}${}${}'.format(
        b64encode(pbkdf2_bin(password, salt, COST_FACTOR, KEY_LENGTH,
                             getattr(hashlib, HASH_FUNCTION))))

def check_hash(password, hash_):
    """Check a password against an existing hash."""
    if isinstance(password, unicode):
        password = password.encode('utf-8')
    algorithm, hash_function, cost_factor, salt, hash_a = hash_.split('$')
    assert algorithm == 'PBKDF2'
    hash_a = b64decode(hash_a)
    hash_b = pbkdf2_bin(password, salt, int(cost_factor), len(hash_a),
                        getattr(hashlib, hash_function))
    assert len(hash_a) == len(hash_b)  # we requested this from pbkdf2_bin()
    # Same as "return hash_a == hash_b" but takes a constant time.
    # See http://carlos.bueno.org/2011/10/timing.html
    diff = 0
    for char_a, char_b in izip(hash_a, hash_b):
        diff |= ord(char_a) ^ ord(char_b)
    return diff == 0

As usual, the code is BSD licensed (as is Armin’s PBKDF2 implementation) and available by itself.