Random pronounceable passwords − Exyr.org

Random pronounceable passwords

Simon Sapin, 2011-02-11

Update 2015-10-12: These days I have one password that I memorize, generated with Markov chains as described below, used for the disk encryption and login of my laptop. Everything else has strong unique passwords saved in my browser’s password manager (I don’t even try to memorize them), generated with:

$ </dev/urandom tr -d -c 'a-zA-Z0-9' | head -c 32; echo
g3liM01F2XYVJBD5bp2q1QWv3zzncZge

That’s 190 bits of entropy, and adding more is really easy when there’s no need to memorize the whole thing!
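The same idea can be sketched in Python, together with the entropy arithmetic. This is only an illustration of the shell pipeline above, not a replacement for it: it assumes Python 3.6+ for the `secrets` module, and the name `random_alnum` is made up for this example.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits  # 62 characters: a-z, A-Z, 0-9

def random_alnum(length=32):
    # secrets.choice draws from the operating system's CSPRNG,
    # much like reading /dev/urandom in the shell version.
    return ''.join(secrets.choice(ALPHABET) for _ in range(length))

password = random_alnum()
# Each character carries log2(62) bits, so 32 of them give about 190.5 bits.
entropy_bits = len(password) * math.log2(len(ALPHABET))
print(password, round(entropy_bits, 1))
```

Each extra character adds a little under 6 bits.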

It is often advised that passwords should be long (8 characters is considered good) and contain various kinds of characters (not just lower-case letters). Such a password is stronger against dictionary or brute-force attacks.

The strongest password would be a completely random one. Generating one is quite easy:

$ head -c 12 /dev/random | base64
RU0aq07R9ZVK8LR1
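A rough Python counterpart of that command, as a sketch assuming Python 3.6+: `secrets.token_bytes` asks the operating system for random bytes, which is close to (but not literally) reading `/dev/random`. Twelve bytes encode to exactly 16 base64 characters with no `=` padding, because 12 is a multiple of 3.

```python
import base64
import secrets

raw = secrets.token_bytes(12)  # 12 bytes = 96 random bits
password = base64.b64encode(raw).decode('ascii')
print(password)
```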

However, such a password is very hard to memorize (at least for me.) It is also not so easy to type.

I find that I remember words (and people’s names!) much better if I know how to pronounce them. They do not have to be pronounced out loud; I just remember the sound they would make better than each individual letter. This means that a “pronounceable” password would be much easier to memorize (at least for me, again). Most words in most languages are easy to pronounce so we could just pick one, but that’s a very weak password against dictionary attacks. We need something random.

So, what is pronounceable? Real words that are hard to pronounce often have many consecutive consonants. We could just alternate consonants and vowels, that’s easy enough:

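import itertools
import random

def pronounceable_password():
    # I omitted some letters I don’t like
    vowels = 'aiueo'
    consonants = 'bdfgjklmnprstvwxz'
    # Endless generator: consonant, vowel, consonant, vowel, ...
    while True:
        yield random.choice(consonants)
        yield random.choice(vowels)

# Take the first 14 characters of the endless stream
print(''.join(itertools.islice(pronounceable_password(), 14)))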

And a few results:

bonugevazisibe
wobumubidigato
wuxarewuvidiri
zixizuzugurete
mejevefibawuso
figosotufixaza

Not bad, but we can do better (and more interesting!)

The Japanese language is made of a well-known set of syllables (sounds), most of which consist of a consonant followed by a vowel when romanized (written in the Latin alphabet). This is why Japanese is mostly easy to pronounce for westerners, but many foreign words are distorted in Japanese. For example, they use the international word “taxi”, but it’s pronounced more like ta-ku-shi.

Anyway. Using Markov chains, we can generate text that “sounds” Japanese. Markov chains have many interesting mathematical properties, but the basic idea is that they represent a system that moves between states, where the next state depends only on the current state and not on the past. In other words, for text, each character has a probability of being chosen that depends on the previous character. To determine these probabilities, we look at pairs of consecutive characters in a sample text.
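To make the “pairs of consecutive characters” step concrete, here is a small sketch that turns a sample into transition probabilities. The function name and the toy sample `'sakurasaku'` are made up for illustration; a real sample would be much longer.

```python
from collections import Counter, defaultdict

def transition_probabilities(sample):
    # counts[a][b] = number of times character b directly follows character a
    counts = defaultdict(Counter)
    for current, following in zip(sample, sample[1:]):
        counts[current][following] += 1
    # Normalize each row of counts into probabilities that sum to 1
    return {
        current: {ch: n / sum(nexts.values()) for ch, n in nexts.items()}
        for current, nexts in counts.items()
    }

probs = transition_probabilities('sakurasaku')
print(probs['a'])
```

In this toy sample, 'a' is followed by 'k' twice and by 's' once, so after an 'a' the chain picks 'k' two times out of three.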

The algorithm looks like this: (Also see the complete code.)

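import random
from collections import defaultdict

def pairwise(sequence):
    # Minimal stand-in for the helper used below: consecutive pairs,
    # e.g. 'abc' -> ('a', 'b'), ('b', 'c')
    return zip(sequence, sequence[1:])

class MarkovChain(object):
    def __init__(self, sample):
        # counts[a][b] is how many times b directly follows a in the sample
        self.counts = counts = defaultdict(lambda: defaultdict(int))
        for current, following in pairwise(sample):
            counts[current][following] += 1

        # totals[a] is the total number of transitions observed out of a
        self.totals = dict(
            (current, sum(next_counts.values()))
            for current, next_counts in counts.items()
        )

    def next(self, state):
        nexts = self.counts[state].items()
        # Like random.choice() but with a different weight for each element
        rand = random.randrange(0, self.totals[state])
        for next_state, weight in nexts:
            if rand < weight:
                return next_state
            rand -= weight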

Again, a few results:

odauarabarikoy
hitarikametata
imarotamenayan
abautiyosihere
ukihumetotarit
womitohinarego

This is subjective, but I like these better. (Could be because I’m learning Japanese.) Maybe considering the 2 or more previous characters instead of just one would yield better results. This is left as an exercise for the reader ;)
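One possible take on that exercise, as a sketch rather than a reference solution: make the state the last `order` characters instead of the last one. Everything here is an assumption made for illustration: the function names, the tiny romanized sample (far too small for real use), the use of `secrets` (Python 3.6+), and the shortcut of treating the sample as circular so that no state is ever a dead end.

```python
import secrets
from collections import defaultdict

def build_chain(sample, order=2):
    # Wrap around to the start of the sample so every state has a successor.
    padded = sample + sample[:order]
    chain = defaultdict(list)
    for i in range(len(sample)):
        # State: `order` consecutive characters. Value: the character after them.
        chain[padded[i:i + order]].append(padded[i + order])
    return dict(chain)

def generate(chain, length=14):
    # Start from a state picked uniformly among the distinct states.
    state = secrets.choice(sorted(chain))
    out = state
    while len(out) < length:
        # Successors seen several times appear several times in the list,
        # so a uniform pick is weighted by observed frequency.
        following = secrets.choice(chain[state])
        out += following
        state = state[1:] + following
    return out[:length]

# Placeholder sample; a real one should be a long romanized text.
SAMPLE = 'watashiwanihongoobenkyoushiteimasukyouwaiitenkidesune'
chain = build_chain(SAMPLE, order=2)
password = generate(chain)
print(password)
```

With `order=1` this behaves like the single-character chain described above. A higher order tends to sound more natural but follows the sample more closely, so it needs a larger sample to keep the output unpredictable.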

This algorithm produces passwords with only lower-case letters, which is generally considered a bad idea, but the extra length compensates for it. It also makes the password easier to type.

If we mix 26 lower-case letters, as many upper-case ones, ten digits and ten other symbols, that’s 72 possible characters. Picking 8 of them at random gives 72⁸ possible passwords, or about 49 bits of entropy. It is possible to calculate the exact entropy for a Markov chain, but the math is non-trivial. I guesstimated that this pseudo-Japanese has about the same entropy as alternating 15-something consonants with 5 vowels. So for 14-character passwords, that’s 15⁷ × 5⁷ possible passwords or about 43.6 bits of entropy, which I decided was good enough for me.
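The arithmetic above is easy to check, and a rough per-character estimate for a first-order chain is not much harder. The sketch below is an approximation under stated assumptions, not an exact result: it weights each state’s next-character entropy by how often that state occurs in the sample, and it ignores the choice of the first character. The function names are made up for this example.

```python
import math
from collections import Counter, defaultdict

def uniform_bits(alphabet_size, length):
    # Entropy of `length` independent, uniform picks among `alphabet_size` symbols
    return length * math.log2(alphabet_size)

def markov_bits_per_char(sample):
    # Count how often each character follows each other character.
    counts = defaultdict(Counter)
    for current, following in zip(sample, sample[1:]):
        counts[current][following] += 1
    total_pairs = sum(sum(nexts.values()) for nexts in counts.values())
    rate = 0.0
    for nexts in counts.values():
        state_total = sum(nexts.values())
        # Shannon entropy of the next-character distribution for this state
        state_entropy = -sum(
            (n / state_total) * math.log2(n / state_total)
            for n in nexts.values()
        )
        # Weight by the share of transitions that start in this state
        rate += (state_total / total_pairs) * state_entropy
    return rate

print(round(uniform_bits(72, 8), 1))    # 49.4
print(round(7 * math.log2(15 * 5), 1))  # 43.6
```

Multiplying `markov_bits_per_char(sample)` by the password length gives a ballpark figure to compare against these numbers; how meaningful it is depends on how large and representative the sample is.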

Now grab the code and go change all those weak passwords!

Note: If you want to use something like this in an automated system (rather than where you cherry-pick a few samples of the output), beware of The Automated Curse Generator.