Error Correcting Storage of Secrets in Human Memory
In the past a number of different schemes have been proposed for the storage of encryption secrets in human memory. The vast majority of these schemes rely on the idea of asking the user to memorize a passphrase, often a string of randomly generated words, and then running this passphrase though a key derivation formula, often a hashing algorithm, to deterministically generate an encryption key. This method has a number of weaknesses. Whenever any sort of error is introduced, the key is drastically changed. This has the benefit of preventing calculation of the passphrase from the encryption key. The problem with this is it causes a minor error in human memory to lead to a completely different encryption key. If one were to: forget a part of the passphrase, input the string of words in the incorrect order, or misspell one of the words the generation scheme would fail. All of these errors are extremely likely to occur when humans attempt to memorize a series of words.
A new RSG compatible passphrase can be generated by following this general list of steps:
Randomly generate a number between 0 and 5120
$randomNumber % 512
This is done to decrease the range of possible random numbers to 0 to 512. A random number in this range was not calculated originally since it allowed an attacker to figure out the state of python’s RNG from only part of the password.
Calculate what word from the PGP wordlist corresponds to that random number
Add that word to the passphrase variable.
Add a space to the passphrase variable
Repeat steps 1 through 5 until a passphrase with enough entropy has been generated.
The corresponding checksum for a given passphrase can be generated by following this general list of steps:
Split the passphrase into a list of strings, each of which is a single word from the PGP wordlist
Calculate the place of each word in the wordlist
Add this number to the sum variable
Repeat steps 1 through 3 until you have iterated through all words
Encode $sum % 512 into a word from the PGP wordlist, this is the checksum
Authentication and Key Generation
The Recoverable Secret Generator, or RSG for short, allows all of these errors to occur while still allowing the passphrases with high entropy. RSG allows for automatic correction of: forgotten words, incorrect ordering of the words, and misspelled words.
Incorrect ordering of the words is solved by simply requiring the passphrase to be alphabetized. This means that whenever the passphrase is supplied, the key generation algorithm must first alphabetize the input.
Misspelled words can be corrected through Levenshtein distance matching. The key generation algorithm first breaks the passphrase into a list of words. Each word is then checked against the list of words used in the generation of the passphrase. Any words that are not found in the list of possible words, are marked. Then the program computes the Levenshtein distance between the incorrect word and every word in the wordlist. The word that the incorrect word is closed to according to the Levenshtein distance is then stored in the place of the incorrect word.
Correcting for forgotten words must be done as the final step in the key generation algorithm. A checksum of the passphrase is generated (as described in the Passphrase Generation section) and compared to the user supplied passphrase checksum word. If the calculated checksum is the same as the user supplied checksum,then no words have been forgotten. If the checksums do not match, the difference between the user supplied checksum and the calculated checksum is calculated. The difference is then encoded as a word, which is the forgotten word.
See a python implementation of this for key generation and general authentication against a hash here. Implementations in other languages are welcome!