Engineyard posted a programming
contest on July 14, 2009. The task was to find a phrase whose
SHA-1 hash has minimal Hamming distance from the SHA-1 hash of a
target phrase. To make things more interesting, the phrases were
restricted to words from a 1000-word dictionary, had to consist of 12
words with spacing, and could be followed by at most 5 (free)
printable ASCII characters. The words could have upper and lowercase
letters at any position. Time: 30 hours after start.
The contest started on July 20, 2009 when the target phrase and
contest dictionary were posted.
It's official - winners announced on Engineyard's blog and it's us!!
SHA-1
is a hash function that operates on blocks of 64 bytes at a time. The
output of hashing the first block is used as initialization vector
(IV) for the second block etc.
ASCII characters are
stored in 8 bits (=1 byte), i.e.
BuGStakes 4 bytes; the string
BuGS bugs bUgs BUGs BLAnK BlAnK blANK BlaNK BuGS BUgS buGS bugS FkCjVconsumes exactly 69 bytes (blanks count as characters!).
We are not aware of any weakness of SHA-1 that would help solving this
contest, so the only way is to hit it with computation power and that
means as many tests per second as possible. We reused the first block
as often as possible, i.e. for fixed first 64 bytes vary the end. We
have no chance of exhausting the search space so we went for the
cheapest choice: use 12 words and 12 spaces for the first block, have
the second one consist entirely of the 5 characters. Our choices were
8 words of length 4 and 5 of length 5. Each process would start with a
random choice of upper and lowercase letters. Scanning the dictionary
for 3 minutes after the start we settled on bugs and blanks for the
first runs and dirty and cows for the second.
Taking all combinations of upper and lower case letters for the 52
letters in the first part gives 252 different first
blocks. Chances of repeating any by random guesses are minimal for
only 30 hours. Of course we could have varied the entries more -
changed the order of words for each new process to start and choose
new words of 4 and 5 letters (and of course one can also take other
combinations of lengths) but when each second is worth
3*109 hashes any manual change reduces your chances. The
dirty cows didn't start before we burned a fuse in Taiwan and behemoth
and the angels and giants had to be recabled - lots of time to think
of other fun words.
Each of the first blocks was combined with all combinations of 5
letters for the second block and lots of zeros for padding (processed
faster than content). Actually we didn't use ALL combinations - the
CPU code did 325 (planning ahead for some extra
optimization that didn't happen before the deadline; we didn't start
before Sunday ...) while the GPUs did 323*622,
with each GPU thread varying only the last two characters. 64 would
have been nicer but would lead outside the standard character set
a..zA..Z0..9 and we were worried how well they would work on Twitter.
None less than Daniel
J. Bernstein (currently visiting TU/e) took the time to write
dedicated software for our cluster of Core2 Quad CPUs. The
SHA-1 code will be very useful for our future projects and
outperformed our first quick OpenSSL hack by a factor of 14 on
this.
Dan did also did our CUDA
implementation (good shot for a first time CUDA programmer!).
The code is posted here.
We started all cluster machines within minutes after the challenge came out (needed the dictionary words first). Had to take down hautbrion after a short while because its CPU was overheating. All other 9 machines stayed with us through the run and worked through about 47*106 hashes per second, working on all 4 cores.
Here is the table of the complete computation power we threw at this. Some updates on owners and grants to thank will follow shortly.
tesla: | 690822188 hashes/second: | 4 GT200 GPUs in a S1070 (two buses) |
behemoth: | 657871796 hashes/second: | 4 GT200b GPUs on 2 GTX 295 cards |
unicorn: | 337020845 hashes/second: | 4 G92 GPUs on 2 9800 GX2 cards |
dragon: | 220369116?hashes/second: | 2 GT200 GPUs on 2 GTX 280 cards |
centaur: | 179221757 hashes/second: | 2 G92 GPUs on 2 8800 GTS cards |
giant0: | 106109452 hashes/second: | 8 C2+ 2.66GHz cores |
giant1: | 106109452 hashes/second: | 8 C2+ 2.66GHz cores |
movebank: | 94117640 hashes/second: | 8 C2+ 2.8GHz cores |
angel0: | 69961977 hashes/second: | 8 K10+ 2.3GHz cores |
angel1: | 69961977 hashes/second: | 8 K10+ 2.3GHz cores |
colossus: | 68927875 hashes/second: | 16 K8 2.2GHz cores |
berlekamp: | 51981651 hashes/second: | 4 C2+ 2.83GHz cores |
lanczos: | 49740932 hashes/second: | 4 C2 2.4GHz cores |
giscours: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
moutonrothschild: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
latour: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
maucaillou: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
ausone: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
lafiterothschild: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
yquem: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
margaux: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
dauzac: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
utrecht: | 47603960 hashes/second: | 4 C2 2.4GHz cores |
montgomery: | 45636363 hashes/second: | 4 K10+ 3GHz cores |
ranger: | 29530201 hashes/second: | 4 K10 2.2GHz cores |
miranda: | 27178268 hashes/second: | 3 (out of 4) Ci7 2.66GHz cores running 2 threads each |
ebauche: | 19200000 hashes/second: | 2 C2 2.4GHz cores |
All but two machines run under Linux, the exceptions are two Macintoshes. Programs were written in C++ and CUDA.
With all the machines running we got up to a total of 3079431974
hashes/second, i.e. 332578653192000 hashes/30 hours (but not all
machines were up all the time).
Here are some statistics on the difficulty of finding hashes at
various Hamming distances.
Average number of hashes: | to find hash at distance: | Note: |
---|---|---|
51180834166 | 39 | |
160104147904 | 38 | |
518231847164 | 37 | |
1736777001308 | 36 | |
6030475698988 | 35 | |
21709712516358 | 34 | |
81092161458160 | 33 | (fairly good chance) |
314539292928624 | 32 | (some chance) |
1267986524618517 | 31 | (would have to be lucky) |
5317362845174426 | 30 |
Well, guess we got overly lucky with the distance 30 string. Tesla made the right internal guesses!
Some results of low weight:
34 | giscours | bUGs buGS buGS bUgS blAnK BlAnk blanK BlanK BUGS BUgS bugS BugS dwCpy |
34 | margaux | BugS BUgs buGs BUgs bLANK Blank bLanK BlAnk BuGs BuGs BUGS bUGs FcBkw |
34 | angel1 | cOWs cOws cOws Cows dirTY dIrTY DIRty DirtY cOWS COWS CoWS COWS qqhmD |
34 | behemoth | cOWs CoWs cowS COws diRTy diRTy diRty dIrtY cOWS COwS COwS CowS kicOo |
34 | behemoth | coWs cOWs coWS cOws dIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows FtEgn |
34 | behemoth | CoWs cOws cOWs COWS DirTY DirTY DirtY dIrty COWS CowS cOwS CoWS csr6B |
34 | behemoth | coWS cOWs cowS cOws dIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows byhia |
34 | berlekamp | BUGs BuGs bUGs BUGS BlAnk BlAnk BLAnK blAnK buGs bUGs bUGs bUGS icxxt |
34 | ebauche | rUBY rubY ruBY ruby tokyo ToKYo tOkYo tokYO rUBy RubY ruBy ruby ifloc |
34 | lanczos | BUgS bugS BUGs BUGs blAnK Blank BlAnk BlANK BuGs BUGs bUGs BuGs ckbBr |
34 | ranger | BuGs BUGS bugs bUgs BlAnk BlanK bLaNk blaNk BUGs BUGS BUGs BuGs banzb |
34 | unicorn | bugS BUGS BUGs bUgs blANk BlanK blAnk bLAnK buGS bugs BuGS bUgs xphKL |
34 | unicorn | bUgS bUGs bUGs BuGS bLank bLANK BLAnK BLaNK BuGS BuGs BuGs bUgs tmFgx |
33 | behemoth | COws cows COws COWs diRTy diRTy diRty dIrtY cOWS COwS COwS CowS omBdF |
33 | behemoth | CoWS coWS CowS cOWS DiRTY DIrtY DirTy DIRtY COWS COWS cOwS cOWS bvpDq |
33 | movebank | BUgs bUGS bUGs bugS BlaNK BLanK blANK bLaNK BugS BUgs BuGs bUgs liwcC |
33 | ranger | BuGs buGS BUgs BUgs BLanK BlanK bLanK blAnk BUGs bUgS bugS buGs CltAB |
32 | giscours | bUgs BuGS bUgS BUgS blAnK BlAnk blanK BlanK BUGS BUgS bugS BugS wrubc |
32 | behemoth | cOWs cOwS cOwS cowS DIRtY Dirty DiRTY DIrtY cOWS Cows CoWs Cows owjo0 |
31 | centaur | bUGs BUgs bUGS bUgS bLAnk BlAnK BlANK BLank buGS bugS bUGS bUGs dCao2 |
30 | tesla | BuGS bugs bUgs BUGs BLAnK BlAnK blANK BlaNK BuGS BUgS buGS bugS FkCjV |
Nothing is official yet, but from scanning the public postings it looks good for us. A very nice leader chart is by jazzychad who went through the effort of removing invalid or repeated entries and got us reloading continuously. Thanks for the great service of putting everything in humanly accessible and readable form! Tanja and Dan have been sending tweets to engineyard. Our best one is at Hamming distance 30 from the challenge text
BuGS bugs bUgs BUGs BLAnK BlAnK blANK BlaNK BuGS BUgS buGS bugS FkCjVand we got some other nice small distances, too.