Authors: Tom Mahon, James J. Gillogly
Tags: #Ireland, #General, #Politics: General & Reference, #Terrorism, #Cryptography - Ireland - History, #Political violence, #Europe, #Cryptography, #Ireland - History - 1922, #Europe - Ireland, #Guerrilla warfare - Ireland - History - 20th century, #History - General History, #Irish Republican Army - History, #Internal security, #Political violence - Ireland - History - 20th century, #Diaries; letters & journals, #History, #Ireland - History; Military, #20th century, #Ireland - History - 1922-, #History: World, #Northern Ireland, #Guerrilla warfare, #Revolutionary groups & movements
Having broken the fifty-eight-letter message from the first set that I described above, I wanted to find out what actual key had been used to encrypt it.
20
The recovered 'equivalent key' is LIAHKFDJBCGE, and the order of the letters determines which column of the message array must be read first. That is, column three (the 'A') is the first to be copied out, then column nine ('B') and so on. I assumed this column order was determined by a keyword or keyphrase. I assumed also that the keys were in English, since all the messages are in English. The third letter in the key must be the lowest letter in the alphabet that this key uses, and if that letter is used more than once, it would appear again as the 'B' in ninth place in LIAHKFDJBCGE. The 'C' after the 'B' will be a letter at least as far along in the alphabet as that represented by the 'B'. Finally, the L must represent the highest letter in the alphabet used in this key, and since there is no higher letter to its right, it must be the only occurrence of that letter.
Using these restrictions on the keyword, we can write the alphabet repeatedly on a series of twelve vertical strips of paper and slide them accordingly, keeping these restrictions in mind (that is, column two must start no higher than column one, and so on) until a word begins to appear across several lines of the strips. During the period when these ciphers were used this was the standard way to recover the key. Now, however, we have more efficient methods: we can programme a computer to check each word or phrase in a list in turn to see whether it matches this pattern. For a key this long, very few words and phrases will match the restrictions forced by the pattern. I used a wordlist from an unabridged dictionary of 308,081 English words, and of these only one matched the pattern for this key: TRANSFERABLE.
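The pattern test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the programme actually used: the names `equivalent_key` and `matching_keys` and the three-word candidate list are hypothetical, repeated letters are numbered from left to right as the restrictions above imply, and keys are assumed to be no longer than twenty-six letters.

```python
def equivalent_key(keyword):
    """Number the letters of a keyword in alphabetical order (ties left to right)
    and return that numbering as letters: A = first column read, B = second, ..."""
    letters = [c for c in keyword.upper() if "A" <= c <= "Z"]
    # Sort positions by letter; the index in the sort key makes the
    # left-to-right tie-break for repeated letters explicit.
    order = sorted(range(len(letters)), key=lambda i: (letters[i], i))
    ranks = [0] * len(letters)
    for rank, position in enumerate(order):
        ranks[position] = rank
    return "".join(chr(ord("A") + r) for r in ranks)


def matching_keys(pattern, candidates):
    """Return the candidates whose equivalent key equals the recovered pattern."""
    return [word for word in candidates if equivalent_key(word) == pattern]


# Hypothetical stand-in for a real wordlist.
wordlist = ["CRYPTOGRAPHY", "TRANSFERABLE", "PHOTOGRAPHIC"]
print(equivalent_key("TRANSFERABLE"))           # LIAHKFDJBCGE
print(matching_keys("LIAHKFDJBCGE", wordlist))  # ['TRANSFERABLE']
```

Comparing whole equivalent keys, rather than testing the restrictions one at a time, keeps the check to a single string comparison per candidate.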
This procedure allowed me to find many of the one-word keys used by the IRA, but many others did not appear in my unabridged dictionary list. I postulated that short phrases were being used. It is much more difficult to find adequate lists of phrases on the internet, so I produced my own, making lists of phrases of a specified length from digital books. This time-honoured process has been used to good effect by generations of cryptographers, who would painstakingly count hundreds of thousands of letters to get good statistical distributions and find common phrases. I downloaded books from Project Gutenberg, a public service effort that distributes digital copies of books in the public domain.
21
Their first set of twenty complete books was made available to the public in 1990 and 1991, and the three million words in them would make a fairly good start on any statistical project. However, my feeling is that if a thing is worth doing, it is worth overdoing. I downloaded Gutenberg's production from 1990 through 2006 to run my statistics: 10,607 books in all, comprising over 89 million lines, 730 million words, and 4.4 billion letters.
When I ran my key-finding programme on twelve-letter words and phrases from this collection, again the only matching key it found was TRANSFERABLE. To identify the source of more keys I wrote to Bill Mason, another member of the American Cryptogram Association and one of our top cryptographic programmers. He reminded me that Google has made available for purchase a huge list of words and phrases gleaned from the World Wide Web. I bought this collection and wrote programmes to extract more potential keys from all this data. The collection is derived from over one trillion phrases, and is distributed on data DVDs as 24 gigabytes of compressed data. This made me very glad that I use a fast computer!
22
With these tools in hand I was able to find likely English keys for nearly all of the recovered keys. As we read more of the archived messages and broken ciphers we found clues to the way keys were chosen. In a message on 5 May 1926 the IRA's Department of Intelligence sent the unencrypted message shown in Figure 8, a list of key phrases to be used from 6 May through 14 May.
23
Just to make the keys crystal clear to anyone who might obtain this page, a line was drawn through this list indicating the first twelve letters of each phrase: 'Isms go in wave', 'Speak of the co', and so on. These phrases may be from newspapers or magazines of the period. I was unable to find any of the sources in open literature. However, they were indeed used for some of the messages we decrypted: P69/48(50), sent on 6 May, used 'ISMSGOINWAVE' and P69/48(23), sent on 14 May, used 'STALESTTRICK'. This represents a serious blunder in communication security, and cryptanalysts always welcome entries of this sort. It also broadened my search for keys from complete words and phrases to checking for keys starting at a word boundary but going for as many letters as needed, without paying attention to whether the key ends on a word boundary.
Figure 8. Sending keys in clear text, 5 May 1926.
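The broadened search, with keys that start at a word boundary and run for as many letters as needed, can be sketched as follows. This is a minimal illustration, not the programme actually used: the function name `candidate_keys` is hypothetical, and the sketch assumes that a word is any unbroken run of the letters A to Z.

```python
import re


def candidate_keys(text, length=12):
    """Collect one candidate key per word start: the first `length` letters
    from that word onward, ignoring spaces and punctuation."""
    words = re.findall(r"[A-Za-z]+", text)
    keys = []
    for start in range(len(words)):
        letters = ""
        for word in words[start:]:
            letters += word.upper()
            if len(letters) >= length:
                # The key may stop in the middle of a word.
                keys.append(letters[:length])
                break
    return keys


print(candidate_keys("The Scarlet Letter by Hawthorne"))
# ['THESCARLETLE', 'SCARLETLETTE', 'LETTERBYHAWT']
```

Word starts too close to the end of the text to supply twelve letters are simply skipped. Each candidate can then be tested against a recovered key pattern in the same way as a dictionary word.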
One partially encrypted document included lists of keys used to communicate with each battalion and brigade, and with individuals.
24
Even without decryption this kind of information can be very valuable to an opponent: it allows the analyst to see the extent and command structure of the army. Another partially encrypted message gave keys that had changed.
25
Again, intercepting this message would be a great boon to the cryptanalyst, who might have lost contact with the keys but can continue reading the message traffic if this one is encrypted in a known key. To emphasise the importance of this message, the sender said in clear English 'The following Key-words are now in use' before giving the keywords in encrypted form.
As I broke more messages and found more keys using the Project Gutenberg book list, I found several keys that clearly came from Nathaniel Hawthorne's The Scarlet Letter, including one dated 2 March 1927 with key 'Surveyor Pue e'.
26
The header of this message includes the notation '(Cipher - New formula)'. Looking further, we found and broke a message dated 24 February 1927 saying:
27
Did courier at Xmas give you copy of Woolworth edition of novel The Scarlet Letter which was to be used for keys for cipher?
Using this clue I tried each possible starting point for keys in the Project Gutenberg online edition of The Scarlet Letter and found a number of keys that appeared in this book, many of them common phrases such as 'however had be' and 'on this side of', but some as distinctive as 'the scarlet le' and 'a writhing hor'.
However, a message dated 14 December 1926 detailed the method completely:
28
Herewith method for using a different keyword for each.
Dispatch bearer will give you book to be used for this purpose.
Take the date of dispatch you are about to send.
Multiply the month by ten and add the date.
This gives you a number.
Take the page in the book corresponding with this number.
The first twelve letters in the fourth line on this page will be your keyword for that date.
For example take the date of this dispatch.
The number found is one hundred and thirty four.
The first twelve letters in the fourth line on page one hundred and thirty four are lampandsomet.
This would be the key word for this dispatch.
Verify this with book.
Name of book is The Scarlet Letter by Hawthorne.
As the sender suggested, I did indeed verify this with the book. I was unable to find the correct edition, but by comparing the position of the phrase in the digitised book with the page number derived from the above method using the date of the message I found that the key locations did indeed line up very well. We attempted to find an edition of The Scarlet Letter that appeared with these keys in exactly the right place, but without success: for example, none of the dozen or so editions in the University of California at Los Angeles (UCLA) library came close to matching the data. If we could find the correct edition, then the keys for these messages could be found the same way the original recipient would have found them: by finding the page number from the date of the message and going directly to the fourth line of the text to read off the key.
I wrote to Jude Patterson, a fellow cryptanalyst who is good at finding 'hats' (the original English of the equivalent key) for transposition keys. Jude had spent many years as a typesetter, and she had an interesting thought: that by testing different fonts we might be able to reconstruct the Woolworth edition closely enough to find precisely where the keys should fit. I sent seventeen recovered keys with their associated page numbers, as well as photocopies of some pages from cheap editions of various books from Britain and America from that period to give an idea of contemporary standards, fonts and conventions, and she went to work. Six months later, near the end of 2007, the breakthrough appeared in my inbox: Jude found that using eleven-point type with Garamond font produced results that almost exactly matched our data points for the Woolworth edition, suggesting that this may indeed have been the typeface and size used in the original. She had needed to reconstruct by trial and error esoteric typographical conventions such as standards for dealing with widows (the last line of a paragraph at the top of a page) and orphans (the first line of a paragraph at the bottom of a page).
Jude Patterson wrote:
29
11 pt Garamond by 19¼ picas is the only trial where most keys fell spot on or with minimal adjustments in hyphenation.
Having found this so-called ideal setting, I proceeded to trials for page depth, and having established page depth, I took the whole slice of The Scarlet Letter from pages 29 to 134 and ran the final trial. It was amazing how little twiddling was needed to get the pages to fall beautifully. I put myself in the shoes of the typesetter, trying to keep all pages the same length, allowing orphans but disallowing widows, inserting hyphens to reduce big spaces between words, 'feathering' the type with extra letter-spacing where needed to gain a line to avoid a widow.
We now had a best-guess equivalent of the Woolworth edition used to produce many of the keys for this period! For example, here is the beginning of page fifty-six with these settings:
external matters are of little value and import, unless
they bear relation to something within his mind. Very
soon, however, his look became keen and penetrative.
A writhing horror twisted itself across his features,
The key on the fourth line was used to encrypt a message dated 6 May.
30
May is the fifth month, so we multiply it by ten and add the date: 50 + 6 = 56, the correct page number.
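The date rule from the dispatch of 14 December 1926 can be sketched as follows, using the reconstructed opening of page fifty-six given above. This is only an illustration: the function names and the `book` mapping (page number to a list of text lines) are hypothetical, and the sketch assumes that the fourth line holds at least twelve letters.

```python
def page_for_date(month, day):
    """Page number from the dispatch rule: ten times the month plus the day."""
    return month * 10 + day


def keyword_for_date(book, month, day, line_number=4, length=12):
    """Read the first `length` letters of the chosen line on the page for a date.

    `book` is a hypothetical mapping of page number -> list of text lines.
    """
    line = book[page_for_date(month, day)][line_number - 1]
    letters = [c for c in line.upper() if "A" <= c <= "Z"]
    return "".join(letters[:length])


# First four lines of page 56 in the reconstructed setting.
book = {
    56: [
        "external matters are of little value and import, unless",
        "they bear relation to something within his mind. Very",
        "soon, however, his look became keen and penetrative.",
        "A writhing horror twisted itself across his features,",
    ]
}
print(page_for_date(12, 14))         # 134
print(keyword_for_date(book, 5, 6))  # AWRITHINGHOR
```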
Decrypting the substitution ciphers
Nearly all the ciphers we encountered in these sets proved to use columnar transposition, either with or without the columns of dud letters. However, a substantial number of messages between GHQ and the outlying Irish battalions used a different system: mostly short fragments of cipher to encrypt the most sensitive parts of a message that was otherwise sent in clear English. We see a typical example of this in Figure 9:
31
Have you yet got X&OYC&UIJO&MN?
Did you look up that man FX&WA HKGKH / whom I spoke to you about. I am most anxious that this case be followed up. I would suggest that if necessary you put your Staff Officer entirely on it until it is carried through.
Figure 9. Short Vigenère-style substitution ciphers, 4 May 1926.
These ciphers are strikingly different from the columnar transposition ciphers that form the bulk of the encrypted traffic. They're very short, they are not broken up in five-letter groups, they include the symbol '&', and they are mixed freely with plain English text. Most importantly to a cryptanalyst, their statistics are quite different from normal English: a count of the individual letters shows no obvious correlation between the high and low frequency letters in the messages and the high and low frequency letters in English. This indicates that the cipher used for these messages is a substitution cipher.
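The statistical distinction can be illustrated with a small sketch. A transposition only rearranges the letters, so its letter counts are exactly those of the plain text, while a substitution changes them. Everything here is an assumption made for illustration: the function names, the key 'KEY', the six-column width, and the restriction to the letters A to Z (the IRA ciphers also used '&', so their alphabet evidently differed).

```python
from collections import Counter


def letters_only(text):
    """Upper-case the text and keep only the letters A-Z."""
    return [c for c in text.upper() if "A" <= c <= "Z"]


def columnar_transpose(text, width):
    """Write the letters in rows of `width` and read the columns left to right."""
    letters = letters_only(text)
    return "".join("".join(letters[i::width]) for i in range(width))


def vigenere(text, key):
    """Shift each letter by the matching key letter (plain A-Z Vigenere)."""
    out = []
    for i, c in enumerate(letters_only(text)):
        shift = ord(key[i % len(key)]) - ord("A")
        out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
    return "".join(out)


plain = "Did you look up that man whom I spoke to you about"
transposed = columnar_transpose(plain, 6)
substituted = vigenere(plain, "KEY")

# Transposition keeps every letter count; substitution does not.
print(Counter(transposed) == Counter(letters_only(plain)))   # True
print(Counter(substituted) == Counter(letters_only(plain)))  # False
```

In practice the analyst compares the counts with typical English frequencies rather than with a known plain text, but the principle is the same.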