sharing secrets

In some situations you have a need to share secret/information between people, but you require that it should only be known to them based on some event. Putting aside all the non-technical means of doing this, most common way people try to solve this problem is using encryption. One person gets the key the other person receives the encrypted data. Problem with this is that in theory it is possible that the person who got the encrypted data will be able to brute force the key. Even if you say you trust somebody today, people do change, people like to know secrets. It would be much nicer to split the data in such a way that there will be not enough information given to a single person to even in theory recover the secret. 

There are couples of ways to do it, here I will show two. First will be a simple one time pad, second will be Shamir's Secret Sharing.

One time pad

First let's see our secret message:
> cat secret.txt
this is the most important secret on this planet,
keep is secret, keep it safe
> ls -l secret.txt
-rw------- 1 user user 80 2012-07-10 10:00 secret.txt
the size of this file is 80 bytes, we need this to generate the pad:
> dd if=/dev/urandom of=pad.dat bs=1 count=80
80+0 records in
80+0 records out
80 bytes (80 B) copied, 0.000284124 s, 282 kB/s
> ls -l pad.dat
-rw------- 1 user user 80 2012-07-10 10:06 pad.dat
just to confirm that we got some kind of random stuff inside this file:
> od -c pad.dat
0000000   [   h   "   q  \b   í   /   z   Ű 231   ˙   a   Ă 202   V   Ş
0000020   š  \n   9   ¤   f   G 002   n   ?   Ş   ) 211   C 230   Ř 022
0000040   ú 226   Q 036   ç   ý 004 226 203   S   °   O 026   ů 025   s
0000060 216   Ň 211   ü 177   q 235 023   \   ŕ   Ş   ß   w   e   ¨   ĺ
0000100   o   9   C   Ý   ř   g   x   ň 005   r   6   Q 001   D   Ź   ~
0000120
now, we run a very simple script that will xor data from both files with each other and output result to STDOUT. I will redirect the result to a file named output.dat:
> ./filexor.py secret.txt pad.dat > output.dat
> ls -l output.dat
-rw------- 1 user user 80 2012-07-10 10:13 output.dat
> od -c output.dat
0000000   /  \0   K 002   ( 204   \   Z   Ż   ń 232   A   Ž   í   %   Ţ
0000020 231   c   T   Ô  \t   5   v 017   Q   Ţ  \t   ú   &   ű   Ş   w
0000040 216   ś   >   p   Ç 211   l   ˙   đ   s   Ŕ   #   w 227   p  \a
0000060   ˘   ň 203 227 032 024   í   3   5 223 212   Ź 022 006   Ú 200
0000100 033 025   c   ś 235 002  \b   Ň   l 006 026   "   `   "   É   t
0000120
just to confirm that we used the correct secret.txt file:
> od -c secret.txt
0000000   t   h   i   s       i   s       t   h   e       m   o   s   t
0000020       i   m   p   o   r   t   a   n   t       s   e   c   r   e
0000040   t       o   n       t   h   i   s       p   l   a   n   e   t
0000060   ,      \n   k   e   e   p       i   s       s   e   c   r   e
0000100   t   ,       k   e   e   p       i   t       s   a   f   e  \n
0000120
OK, so right now we have two files pad.dat and output.dat, you can give one file to one person the other file to another and there is no way for them to guess the secret. How to recover the secret message? Simple: xor the two files together again:

> ./filexor.py output.dat pad.dat
this is the most important secret on this planet,
keep is secret, keep it safe

OK, this method is nice if you only have two people that you would like to share secret with. It gets a bit problematic when you would like to allow n people out of m to be able to recover the secret. It is still possible to do this, but it requires a lot of juggling with the files. The good thing about this method is that there is virtually no limit on the size of the secret message.

Shamir's Secret Sharing

If you require something more flexible then the one time pad, fortunately for you a guy called Adi Shamir realised that you could use polynomials to do exactly what you want. The idea is simple (like most ideas that are already discovered) - to find the equation for polynomial of degree n you need n+1 points. If you like to share a secret between m people and from those n people (n<=m) should be enough to recover the secret you need to create a polynomial of a degree n-1 and generate m points on it. Where do we "hide" the secret in the polynomial? The secret is the free coefficient. To recover the secret we need to calculate Lagrange polynomial based on the given points and x=0. That is the theory, now practice:

Converting text to number is easy (python cli):

>>> int("this is a very simple text".encode('hex'),16)
187060217333970177770122667038844020393030967291702967972231284L
>>> hex(187060217333970177770122667038844020393030967291702967972231284).replace("0x","").replace("L","").decode('hex')
'this is a very simple text'
>>>

The rest is a bit too big to fit here, but the script works like this:

First some help to know what options we will be using.
 ./sss.py <options>

  Usage:

  This program implements in a very simple and basic way
  Shamir's secret-sharing scheme:
  https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing

  -h      - this screen,
  -e msg  - generate secrets for msg,
  -d      - recover secret based on the data in file (-f),
  -f file - file to either write secrets or read them,
  -a num  - overall number of secrets (not less then -r),
  -r num  - required number of secrets (min 2),


let's share a secret message between ten people but any three of them should be enough to recover it:

> ./sss.py -e "this is the secret message" -a 10 -r 3 -f output.dat
> cat output.dat
HE2DENJSGQ2TINZXGA4DUNJSHAZTANZSHEYDAMBVGI2DIMBVHA4DAMBYGUZDMNBSHAZTONRXGA2DINRQGMZTQMRQGE4TONBTGE3TQMBYGM4TINBZGI4TMMRRGIYTINJQGAYDAOBXHAZDGMJZGY3TGMBTG43TMNJXGA2A====
GIZTEMZTGEYDSNBRGAZTUMZSGEYDAOBQGYYTCOBVGI3TSNZXGQ4TQMJQHE4TKMBQGU4DMNJYGE2TSMRYHE2DIOBRGQ4TGNRRHEZTGMRYGE3DENJQGU4TQMRQGAZDQMZTHEZTGMZWGU2TSOJQG43DKMBSG4YDCMJTHE======
G4ZDMOJYG42TQMJXHA2DUMZRGQZTANZYGY2DMOBRGYYDGMJXHE3DKMRXGAZDOMZRGE2DMNJQGY4DMMBWGIYDONJVGIZTMOJWHAYDANJUGE2TGNRSHEZTIMRSGE4TEOJQGYZTGMJYGQYTAMJYHA4DAMZSHEZTCNZVHA2A====
G4ZDGNBRGY3DCMRSGI4DUMZRGEZDENZWHAZDCMJZGEYTMMZQHA3DCOBWGE4TKNJYG43DAMJWHAYTGMBWGE4TQNBZGI3TGMBVGM2DEOJVGY3TONBYG43DMMJZGI4DANBUG4ZTEMZUG4YDMOBQG44DGNBTHAYDONBSGY2A====
GQYDSNJRGM2TCMZTGM2DUOJZG4ZTENZTG4YTAMJXGM3DIMBZGA2TONBTGYYTSNRSHE3TANZUGMZTONJQG4ZTSMRSGIYTSNJXHA2TGNZXGI2DOMZYGAZTAOBYHEYTKMJRHE2DQOBRGU3TKMBYG44DQOJSGIZTSMJYGQ======
G4ZTKMRQGAZDOMZYGI4TUMZSGE2DIOJTG4ZTINJUGQZDQMBQGY2TGMBTHEYTANRUGY4TKOJQHA2DCOBWGI2TKMBTGU2TEMRTGEZDMNJXGU3TAOJRGM2TSMZSGM3DANRQGQZTSMJSG4ZTKNBYHE2DKNRQHA3TAMBZHE4Q====
GQ2TSMJZGQ4DONRRG4YDUMJSGUZTSOJTGYZDSNBXGYYDEMJYGA3TOMRRGE2DEMZUGQZDMNZZGEZTGMRRGM2TAMZSGU2DCMJYGI4TCNBTHA2DCMRTHEYDKNZSGU3TKNRYHEZDEOBXGA4DOMJXGY3DOMZTGQ4TANRQGA4A====
GQ3TIMBSGM2TKNJSGU3DUMJTGM3DEOJRGIYDIMZRHE3TCNRSHE4TAMJUHEYTOMRWGUYTOMJWHE4TSNJWGU3DSNZTGIZTMNJVGUZDSMBUGUZDKMRWGIYDONJQGYYDKNBUGUYDONBXHE3DSOJTGY3DGMBYGQ3TINRYHE3A====
GY3TENBTG4ZDGNZTGA4DUMRWHA4TAOBWGUYDQMJQHA4DQNBYG4YDIOJVGE2DSOBZGIZTOOJVG43DONZXGQYTAMJUGY3DQMZVGM2TANRRGYYTSNBWGE4DKNRWGQ2TAMZRHE3DQMRWGEYTMOJRG42TMOJRGYZDSMBVGA2A====
GMYTINZQGQ3TIMRSGAZDUNJYHA4TQOJYHA3DONBZGM2DAMRVGM4DIMBVGYZTSMBRG43DSMJWGM2DSMBYGUZDGOBRGQYTMNBWGI4DMMBVGI2DMMBVGE2DGMBUGU4DSOJZHEYDENZTGIZTCNRVGE2DIMJRGE4TEOBYHA======

output is base32 encoded for ease of moving, this is what you get if you decode any of them:
>>> from base64 import *
>>> b32decode("HE2DENJSGQ2TINZXGA4DUNJSHAZTANZSHEYDAMBVGI2DIMBVHA4DAMBYGUZDMNBSHAZTONRXGA2DINRQGMZTQMRQGE4TONBTGE3TQMBYGM4TINBZGI4TMMRRGIYTINJQGAYDAOBXHAZDGMJZGY3TGMBTG43TMNJXGA2A====")
'942524547708:52830729000524405880085264283767044603382019743178083944929621214500008782319673037765704'
>>>

there are two values separated by colon, the first one is x the other is y.

now let's recover it using all of the points:

> ./sss.py -d -f output.dat
this is the secret message

that was easy, now let's see what happens if we only use three points from this set:

> cat output.dat
GQ3TIMBSGM2TKNJSGU3DUMJTGM3DEOJRGIYDIMZRHE3TCNRSHE4TAMJUHEYTOMRWGUYTOMJWHE4TSNJWGU3DSNZTGIZTMNJVGUZDSMBUGUZDKMRWGIYDONJQGYYDKNBUGUYDONBXHE3DSOJTGY3DGMBYGQ3TINRYHE3A====
GY3TENBTG4ZDGNZTGA4DUMRWHA4TAOBWGUYDQMJQHA4DQNBYG4YDIOJVGE2DSOBZGIZTOOJVG43DONZXGQYTAMJUGY3DQMZVGM2TANRRGYYTSNBWGE4DKNRWGQ2TAMZRHE3DQMRWGEYTMOJRG42TMOJRGYZDSMBVGA2A====
GMYTINZQGQ3TIMRSGAZDUNJYHA4TQOJYHA3DONBZGM2DAMRVGM4DIMBVGYZTSMBRG43DSMJWGM2DSMBYGUZDGOBRGQYTMNBWGI4DMMBVGI2DMMBVGE2DGMBUGU4DSOJZHEYDENZTGIZTCNRVGE2DIMJRGE4TEOBYHA======
> ./sss.py -d -f output.dat
this is the secret message

if you leave only two points in the output.dat file:

> cat output.dat
GQ3TIMBSGM2TKNJSGU3DUMJTGM3DEOJRGIYDIMZRHE3TCNRSHE4TAMJUHEYTOMRWGUYTOMJWHE4TSNJWGU3DSNZTGIZTMNJVGUZDSMBUGUZDKMRWGIYDONJQGYYDKNBUGUYDONBXHE3DSOJTGY3DGMBYGQ3TINRYHE3A====
GMYTINZQGQ3TIMRSGAZDUNJYHA4TQOJYHA3DONBZGM2DAMRVGM4DIMBVGYZTSMBRG43DSMJWGM2DSMBYGUZDGOBRGQYTMNBWGI4DMMBVGI2DMMBVGE2DGMBUGU4DSOJZHEYDENZTGIZTCNRVGE2DIMJRGE4TEOBYHA======

we get something like this:
> ./sss.py -d -f output.dat
Traceback (most recent call last):
  File "./sss.py", line 156, in
    print out.decode('hex')
  File "/usr/lib/python2.7/encodings/hex_codec.py", line 42, in hex_decode
    output = binascii.a2b_hex(input)
TypeError: Odd-length string
Exit 1
which tells you that I need to add some error checking ;), but also that two points are not enough to produce any meaningful results in this case :). 

The good thing about this method is its flexibility in choosing how to split and recover the secret. The problem is that you can't split a very large secret using it. The mathematics that creates the secret is only done on integers but the recovery requires floating point operations.