summaryrefslogtreecommitdiff
path: root/bip-0039.mediawiki
blob: 27a0499c7bcd138f4a0ba835000fe089bbd2acd7 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
<pre>
  BIP:     BIP-0039
  Title:   Mnemonic code for generating deterministic keys
  Author:  Pavol Rusnak <stick@gk2.sk>
           Marek Palatinus <info@bitcoin.cz>
           Aaron Voisine <voisine@gmail.com>
  Status:  Draft
  Type:    Standards Track
  Created: 10-09-2013
</pre>

==Abstract==

This BIP proposes a scheme for translating binary data (usually master seeds
for deterministic keys, but it can be applied to any binary data) into a group
of easy to remember words also known as mnemonic code or mnemonic sentence.

==Motivation==

Such mnemonic code or mnemonic sentence is much easier to work with than working
with the binary data directly (or its hexadecimal interpretation). The sentence
could be writen down on paper (e.g. for storing in a secure location such as
safe), told over telephone or other voice communication method, or memorized
in ones memory (this method is called brainwallet).

==Backwards Compatibility==

As this BIP is written, only one Bitcoin client (Electrum) implements mnemonic
codes, but it uses a different wordlist than the proposed one.

For compatibility reasons we propose adding a checkbox to Electrum, which will
allow user to indicate if the legacy code is being entered during import or
it is a new one that is BIP-0039 compatible. For exporting, only the new format
will be used, so this is not an issue.

==Rationale==

Our proposal is inspired by implementation used in Electrum, but we enhanced
the wordlist and algorithm so it meets the following criteria:

a) smart selection of words
   - wordlist is created in such way that it's enough to type just first four
     letters to unambiguously identify the word

b) similar words avoided
   - words as "build" and "built", "woman" and "women" or "quick" or "quickly"
     not only make remembering the sentence difficult, but are also more error
     prone and more difficult to guess (see point below)
   - we avoid these words by carefully selecting them during addition

c) sorted wordlists
   - wordlist is sorted which allow more efficient lookup of the code words
     (i.e. implementation can use binary search instead of linear search)
   - this also allows trie (prefix tree) to be used, e.g. for better compression

d) localized wordlists
   - we would like to allow localized wordlists, so it is easier for users
     to remember the code in their native language
   - by using wordlists with no colliding words among languages, it's easy to
     determine which language was used just by checking the first word of
     the sentence

e) mnemonic checksum
   - this leads to better user experience, because user can be notified
     if the mnemonic sequence is wrong, instead of showing the confusing
     data generated from the wrong sequence.

f) seed stretching
   - before the encoding and after the decoding the input binary sequence is
     stretched using a symmetric cipher (Blowfish) in order to prevent
     brute-force attacks in case some of the mnemonic words are leaked

==Specification==

<pre>
Our proposal implements two methods - "encode" and "decode".

The first method takes a binary data which have to length (L) in bytes divisable
by four and returns a sentence that consists of (L/4*3) words from the wordlist.

The second method takes sentences generated by first method (number of words in
the sentence has to be divisable by 3) and reconstructs the original binary data.

Words can repeat in the sentence more than one time.

Wordlist contains 2048 words (instead of 1626 words in Electrum), allowing
the code to compute the checksum of the whole mnemonic sequence.
Each 32 bits of input data add 1 bit of checksum.

See the following table for relation between input lengths, output lengths and
checksum sizes for the most common usecases:

+--------+---------+---------+----------+
| input  |  input  | output  | checksum |
| (bits) | (bytes) | (words) |  (bits)  |
+--------+---------+---------+----------+
|   128  |    16   |    12   |     4    |
|   192  |    24   |    18   |     6    |
|   256  |    32   |    24   |     8    |
+--------+---------+---------+----------+
</pre>

===Algorithm:===

<pre>
Encoding:
1. Read input data (I).
2. Make sure its length (L) is divisable by 64 bits.
3. Encrypt input data 1000x with Blowfish (ECB) using the word "mnemonic" as key.
4. Compute the length of the checkum (LC). LC = L/32
5. Split I into chunks of LC bits (I1, I2, I3, ...).
6. XOR them altogether and produce the checksum C. C = I1 xor I2 xor I3 ... xor In.
7. Concatenate I and C into encoded data (E). Length of E is divisable by 33 bits.
8. Keep taking 11 bits from E until there are none left.
9. Treat them as integer W, add word with index W to the output.

Decoding:
1. Read input mnemonic (M).
2. Make sure its wordcount is divisable by 6.
3. Figure out word indexes in a dictionary and output them as binary stream E.
4. Length of E (L) is divisable by 33 bits.
5. Split E into two parts: B and C, where B are first L/33*32 bits, C are last L/33 bits.
6. Make sure C is the checksum of B (using the step 5 from the above paragraph).
7. If it's not we have invalid mnemonic code.
8. Treat B as binary data.
9. Decrypt this data 1000x with Blowfish (ECB) using the word "mnemonic" as key.
10. Return the result as output.
</pre>

==Test vectors==

See https://github.com/trezor/python-mnemonic/blob/master/vectors.json

==Reference Implementation==

Reference implementation including wordlists is available from

http://github.com/trezor/python-mnemonic