## Cryptology-I: § 2.3: Vigenere-Based Systems.

#### Instructors: R.E. Newman-Wolfe and M.S. Schmalz

The production of ciphertext by the one-time pad and other such manual devices, while intuitively attractive and efficient in the field, does not lend itself to mechanization with the technologies that were available shortly after World War I. For example, consider what one could construct with components such as relays, electromagnets, primitive switching equipment, and sophisticated gearing and mechanical transmission devices (similar to miniature automobile transmissions). To mechanize the production of ciphertext, a family of devices called rotor machines was invented, which implement Vigenere ciphers with long periods. Two of the best-known instances of rotor machines are the Hagelin Machine, a commercial device, and the series of rotor machines generically called the Enigma Machine, which were employed by the German military in World War II. It is interesting to note that a similar predecessor of Enigma was invented in Germany by Arthur Scherbius and Arvid Damm in the 1920s, then later patented in the United States in 1928 [-].

The cracking of the Enigma code has been said to be the most important historical contribution of cryptanalysis [-]. It is well known that the efforts of the Bletchley Park cryptanalysis team (also called "crippies", who were led in part by Alan Turing) directly resulted in the saving of at least tens of thousands of lives and the shortening of World War II by perhaps several years. An excellent review of this period in cryptology is given in Reference [-], with supplemental material in References [-], [-], and [-].

### 2.3.1. General Concepts of Rotor Machines.

• Definition. A generalized rotor machine (GRM, as conceived by Scherbius) is an electromechanical device that has a keyboard, a series of rotors, and a set of lamps that are used as display devices. Given an alphabet $F$, each rotor is a wheel having $|F|$ possible positions that implements a bijection $T_j : P \times F \to F$, where $P$ denotes a set of rotor positions and $j = 1..n$. Let $a$ denote a plaintext message having $m$ symbols. If $p_i : P \to P$ denotes a position function of the $i$-th rotor, then the rotor machine's output is given by

$$T_R(a(i)) = T_n(p_{n-1}, T_{n-1}(p_{n-2}, T_{n-2}(\ldots T_2(p_1, T_1(p_0, a(i))) \ldots))), \qquad i \in \mathrm{domain}(a), \tag{I}$$

where $a(i)$ denotes a symbol input on the keyboard, $p_0$ denotes the initial position of one or more rotors, and $T_R(a(i))$ is displayed by the lamp that illuminates a given character on the displayed alphabet $F$, as shown in Figure 1.

Figure 1. Schematic rotor machine with n = 3 rotors and alphabet F = {A,B,C,D}.

Here, the keyboard emits an "A", which is transformed by the rotors into a "C" (Rotor #1), then a "B" (Rotor #2), then a "C" (Rotor #3). Because the output of the rotors is a "C", the corresponding lamp is activated to display the letter "C". At the next character of the message, Rotor #3 will move ahead one character, which may (or may not) trigger the movement of Rotor #2.
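The composition in Equation (I) can be sketched in a few lines of Python. This is a minimal illustration, not the machine of Figure 1: the wirings in `WIRINGS` are hypothetical (chosen only so that "A" follows the path A, C, B, C described above), and the way a rotor position shifts the entry and exit contacts is an assumed physical model.

```python
ALPHABET = "ABCD"  # the alphabet F used in Figure 1


def rotor_map(wiring, position, symbol):
    """T_j(p, x): pass symbol x through a rotor turned to position p.

    Assumption: turning the rotor by p places shifts both the entry
    contact and the exit contact by p (one common physical model).
    """
    size = len(ALPHABET)
    entry = (ALPHABET.index(symbol) + position) % size
    exit_contact = ALPHABET.index(wiring[entry])
    return ALPHABET[(exit_contact - position) % size]


def rotor_machine(wirings, positions, symbol):
    """T_R: apply the rotor maps in order, Rotor #1 first (Equation (I))."""
    for wiring, position in zip(wirings, positions):
        symbol = rotor_map(wiring, position, symbol)
    return symbol


# Hypothetical wirings: wiring[i] is the output for input ALPHABET[i].
WIRINGS = ["CADB", "DABC", "ACBD"]
```

Calling `rotor_machine(WIRINGS, [0, 0, 0], "A")` traces the same A, C, B, C path as the figure description; other position lists model the rotors after they have stepped.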

Remark. In the WWII era, rotors were typically constructed of Bakelite or similar dielectric material, with wires embedded inside the dielectric. A wire implemented a given map between two symbols, where the rotor map was a bijection.

Observation. If the rotors of a GRM rotate at the same speed and maintain constant angular offset between adjacent rotors, then the GRM implements a Caesar cipher. This can be proven by noting that

(a) each rotor implements a Caesar cipher, which is a bijection;
(b) the output of a rotor machine is the composition of the individual rotor shifts; and
(c) the composition of n shifts is itself a shift, and hence a bijection.
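As a worked step, assume that rotor $j$ acts as the shift $T_j(x) = (x + k_j) \bmod |F|$ on symbol indices, where the offsets $k_j$ are symbols introduced here for illustration only. Composing the $n$ rotors then gives

$$
(T_n \circ \cdots \circ T_1)(x) = \Big( x + \sum_{j=1}^{n} k_j \Big) \bmod |F| ,
$$

which is again a single shift, that is, a Caesar cipher whose key is the sum of the individual offsets taken modulo $|F|$.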

Remark. In order for the rotor machine to implement a long-period Vigenere cipher, the rotors must have different rotational speeds, such that an n-rotor machine has a maximal period $|F|^n$. That is, the position of the i-th rotor depends on the (i+1)-th rotor's position. This is the customary serial (odometer-like) gearing of most Enigma machines. Since there are no fixed rules for the way a rotor machine must be geared, many variations are possible, as discussed in the following section.
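The odometer-like gearing and its period can be checked with a short sketch. The function names are hypothetical, and the code assumes that the last rotor is the fast one and that a rotor steps exactly when the rotor after it completes a revolution.

```python
def odometer_positions(num_rotors, alphabet_size):
    """Yield successive rotor position tuples under odometer stepping.

    Assumption: the last rotor steps on every character; rotor i steps
    only when rotor i+1 wraps around to position 0.
    """
    positions = [0] * num_rotors
    while True:
        yield tuple(positions)
        for j in reversed(range(num_rotors)):
            positions[j] = (positions[j] + 1) % alphabet_size
            if positions[j] != 0:  # no wrap-around, so nothing carries
                break


def period(num_rotors, alphabet_size):
    """Count the steps until the initial position tuple recurs."""
    states = odometer_positions(num_rotors, alphabet_size)
    start = next(states)
    for count, state in enumerate(states, start=1):
        if state == start:
            return count
```

For example, `period(3, 4)` counts the distinct states of the three-rotor, four-symbol machine of Figure 1, and the count agrees with the maximal period stated above.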

### 2.3.2. The ENIGMA Machine.

The German Enigma apparently began as a more-or-less standard rotor machine [-] with three rotors. However, requirements of increased security brought on by early phases of the war in Europe (1939-1942) dictated an increased number of rotors. In order to increase the effective number of rotors without drastically increasing weight and power consumption (important considerations for field operations), the developers of Enigma added a reflector, which routed the rotor machine's output back through the rotors, but by a different path than that shown in Figure 1. When the rotor gearing was chosen properly, this effected a doubling of the number of rotors and a squaring of the size of the search space associated with cryptanalysis. That is, instead of a maximal period $|F|^n$, it was possible in certain circumstances to achieve a maximal effective period of $|F|^{2n+1}$.

A further addition was the Steckerboard, a manual plugboard not unlike a small telephone switchboard of the time. The Steckerboard first implemented a substitution, which Enigma's developers thought would render Enigma secure. Near the end of the war, there was an attempt to implement a transposition using the Steckerboard, which was a difficult goal due to the requirement of buffer memory (then available using only relays or mercury delay lines). The Enigma machine developers thought this would render the machine resistant to all cryptanalytic attacks. In the more usual Enigma machine configuration, with the reflector in place, not only was the number of rotors effectively doubled, but the Steckerboard transposition was inverted at the end of the encryption sequence. An Enigma-like rotor machine is shown in Figure 2.

• Algorithm. Using the notation developed in the preceding section (i.e., T for the rotor transform and p for the rotor position with px for the reflector position), and adding the reflector substitution R : F -> F, we can express Enigma's encryption function as

$$e_E(a(i)) = V_1(p_2, V_2(p_3, V_3(\ldots V_n(p_x, R(p_n, T_n(p_{n-1}, T_{n-1}(p_{n-2}, T_{n-2}(\ldots T_2(p_1, T_1(p_0, a(i))) \ldots))))) \ldots))),$$

where $i \in \mathrm{domain}(a)$ and $V_j$ inverts $T_j$ for a given rotor position, with $j = 1..n$. Note that the reflector must not map any symbol to itself (e.g., "A" |-> "A"), since that would cause retracing of the encryption path, which would result in no change to the symbol that was encrypted along the forward path.
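The forward pass, reflection, and return pass can be sketched as follows. This is a simplified illustration under stated assumptions: the wirings and reflector are hypothetical, the reflector is treated as fixed (it does not rotate), and each return map uses the same position as the corresponding forward map.

```python
ALPHABET = "ABCD"


def forward(wiring, position, symbol):
    """T_j(p, x): forward pass through a rotor at position p."""
    n = len(ALPHABET)
    contact = (ALPHABET.index(symbol) + position) % n
    return ALPHABET[(ALPHABET.index(wiring[contact]) - position) % n]


def backward(wiring, position, symbol):
    """V_j(p, x): the inverse of T_j at the same position p."""
    n = len(ALPHABET)
    contact = (ALPHABET.index(symbol) + position) % n
    return ALPHABET[(wiring.index(ALPHABET[contact]) - position) % n]


def enigma(wirings, reflector, positions, symbol):
    """Encrypt one symbol: rotors forward, reflector, rotors in reverse."""
    for wiring, p in zip(wirings, positions):
        symbol = forward(wiring, p, symbol)
    symbol = reflector[ALPHABET.index(symbol)]
    for wiring, p in reversed(list(zip(wirings, positions))):
        symbol = backward(wiring, p, symbol)
    return symbol


WIRINGS = ["CADB", "DABC", "ACBD"]  # hypothetical rotor wirings
REFLECTOR = "BADC"                  # hypothetical: A<->B, C<->D, no fixed point
```

Because the reflector swaps symbols in pairs and has no fixed point, this sketch never maps a symbol to itself, and applying `enigma` twice at the same positions returns the original symbol.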

Remark. Adding the Steckerboard substitution causes eE to be perturbed as

$$T_E(a(i)) = S^{-1}(e_E(S(a(i)))) ,$$

where $X = \mathrm{domain}(a)$ and $S : F \to F$ denotes the Steckerboard substitution. If the Steckerboard were to implement a transposition of form $S : X \to X$, then the preceding equation would become

$$T_E(a(i)) = e_E(a(S(i)))[S^{-1}(i)] ,$$

and encryption/decryption would be applied to blocks of |X| or fewer symbols.
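The substitution form of the Steckerboard, in which the inverse plugboard map is applied to the output of the core cipher, can be sketched as below. The plugboard `PLUG` completes the two mappings named in the Figure 2 caption into a full permutation by assumption, and `toy_core` is a hypothetical stand-in for the rotor-and-reflector cipher at one fixed rotor position.

```python
ALPHABET = "ABCD"


def invert(mapping):
    """Return the inverse of a bijection given as a dict."""
    return {value: key for key, value in mapping.items()}


def with_steckerboard(core, plug):
    """Wrap a per-symbol cipher `core` as S^{-1} o core o S."""
    unplug = invert(plug)
    return lambda symbol: unplug[core(plug[symbol])]


# Assumed completion of the caption's "B" -> "A" and "C" -> "D" mappings.
PLUG = {"A": "B", "B": "A", "C": "D", "D": "C"}


def toy_core(symbol):
    """Hypothetical core cipher at one fixed rotor position."""
    return {"A": "C", "C": "A", "B": "D", "D": "B"}[symbol]


stecker_cipher = with_steckerboard(toy_core, PLUG)
```

Since the wrapper only conjugates the core cipher by a fixed permutation, it preserves the core's structure (for instance, a reciprocal core stays reciprocal), which is consistent with the limited effect of the substitution Steckerboard noted in this section.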

Figure 2. Schematic Enigma machine with n = 3 rotors and alphabet F = {A,B,C,D},
where dotted lines denote the return path following reflection, and the Steckerboard implements a substitution that includes "B" |-> "A" and "C" |-> "D".

Observation. Enigma's keyspace was parameterized by

(a) Initial position of the rotors and reflector,
(b) Reflector substitution R, and
(c) Steckerboard permutation S .

However, the Steckerboard that was implemented as a substitution had minimal effect, since the inverse Steckerboard was applied to the rotors' output. Thus, it was the period of the cipher as generated by the rotors that caused the cryptanalytic search space complexity to be high.

Remark. If the rotors do not move in relation to each other as the wheels of an odometer (e.g., rotor n moves once per input character, rotor n-1 moves once per rotation of rotor n, etc.), then the effective period of the Vigenere cipher implemented in the rotor machine can be less than the theoretical maximum. In such cases, the pattern of application of the Caesar cipher implemented in a given rotor may be less regular. This is particularly true when the gear ratios between adjacent rotors are composed of prime numbers. Such facts are important in cryptanalysis, as follows.

### 2.3.3. Cryptanalysis of Rotor Machines.

The preceding discussion could lead one to surmise that cryptanalysis of the GRM or Enigma machine may not be as difficult as the mechanical complexity of the machine may indicate. In order to understand the associated techniques, let us recall some concepts from group theory.

• Definition. Let $A$ be a set with $n$ members, and let $S = \{ \sigma : A \to A \}$ denote the set of all permutations on $A$.

Lemma. $G = (S, \circ)$ is a group, where $\circ$ denotes functional composition.

Proof. We prove the group properties of G, as follows:

1. Closure. A permutation is a bijection. The composition of bijections is a bijection.
2. Identity. Let the identity permutation be $\iota(a) = a$, where $a$ denotes an element of $A$. Given $\sigma_1 \in S$, $\iota \circ \sigma_1 = \sigma_1 \circ \iota = \sigma_1$, which is easily verified by inspection.
3. Associativity. This property follows from the fact that composition is associative.
4. Inverse. Each permutation $\sigma_1 \in S$ has an inverse $(\sigma_1)^{-1} \in S$ such that $\sigma_1 \circ (\sigma_1)^{-1} = \iota$, the identity permutation.
Therefore, G is a group.

Question. Is G an Abelian group? Answer: No, not in general, because composition of permutations is not commutative when A has three or more members.
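A small check on a three-element set illustrates the group properties and the failure of commutativity. The permutations `SWAP_AB` and `CYCLE` are examples chosen here for illustration.

```python
def compose(f, g):
    """Return f o g, the permutation x -> f[g[x]]."""
    return {x: f[g[x]] for x in g}


def inverse(f):
    """Return the inverse permutation of f."""
    return {y: x for x, y in f.items()}


A = "ABC"
IDENTITY = {x: x for x in A}
SWAP_AB = {"A": "B", "B": "A", "C": "C"}  # exchanges A and B
CYCLE = {"A": "B", "B": "C", "C": "A"}    # rotates A -> B -> C -> A

# The two orders of composition give different permutations.
swap_after_cycle = compose(SWAP_AB, CYCLE)
cycle_after_swap = compose(CYCLE, SWAP_AB)
```

Here `swap_after_cycle` fixes "A" while `cycle_after_swap` fixes "B", so the order in which two rotors are applied matters.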

Remark. The fact that G is not an Abelian group is important in practice, since this means that different rotors cannot be interchanged while preserving a given encryption transform. Additionally, the rotor initial position and current position become nontrivial considerations.

• Observation. Feasible techniques for cryptanalysis of rotor machines include (a) brute-force methods, (b) the Kasiski attack, and (c) maximum-likelihood estimation (MLE) of the rotor configuration.

• Brute-force attacks utilize multiple rotor machines that are connected to effect parallel decryption of ciphertext using different rotor settings per machine. The Polish cryptanalysts who successfully attacked the early Enigma machines used this method in a configuration called a Bombe, because the clicking of the rotors sounded like a ticking time bomb. The output of each candidate decryption is scanned for well-known words or for groups of symbols that are expected to be in the plaintext (as determined from traffic analysis, semantic analysis, or n-gram based statistical analysis). For example, one of Hitler's officers would start his daily code transmission with a standard political greeting. Additionally, by comparing results from each day's candidate decryptions, key changes and rotor initial positions could be predicted a priori.
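The idea of trying every rotor setting and scanning each candidate decryption for an expected word (a crib) can be sketched as follows. This is a sequential toy version, not a model of the Bombe: the rotor wirings are pseudo-random and hypothetical, the machine has no reflector or Steckerboard, the wirings are assumed known to the attacker, and the function names are invented for this example.

```python
import random
import string
from itertools import product

ALPHABET = string.ascii_uppercase
N = len(ALPHABET)


def make_wirings(num_rotors, seed=0):
    """Hypothetical rotor wirings: seeded random permutations of A-Z."""
    rng = random.Random(seed)
    return ["".join(rng.sample(ALPHABET, N)) for _ in range(num_rotors)]


def step(positions):
    """Odometer stepping, last rotor fastest."""
    positions = list(positions)
    for j in reversed(range(len(positions))):
        positions[j] = (positions[j] + 1) % N
        if positions[j]:
            break
    return positions


def crypt(text, wirings, start, decrypt=False):
    """Encrypt (or decrypt) text, stepping the rotors after each symbol."""
    positions, out = list(start), []
    for ch in text:
        x = ALPHABET.index(ch)
        if decrypt:
            for w, p in reversed(list(zip(wirings, positions))):
                x = (w.index(ALPHABET[(x + p) % N]) - p) % N
        else:
            for w, p in zip(wirings, positions):
                x = (ALPHABET.index(w[(x + p) % N]) - p) % N
        out.append(ALPHABET[x])
        positions = step(positions)
    return "".join(out)


def brute_force(ciphertext, wirings, crib):
    """Return every start position whose trial decryption contains crib."""
    return [start for start in product(range(N), repeat=len(wirings))
            if crib in crypt(ciphertext, wirings, start, decrypt=True)]
```

With two rotors there are only 676 starting positions to try; each additional rotor multiplies the work by the alphabet size, which is why parallel hardware was attractive.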

• The Kasiski attack is useful for determining the period of rotor configurations whose method of interaction (e.g., gearing) is unknown. This technique is not required for odometer-like (hierarchically-geared) rotor drives, since the period of such machines is $|F|^n$. However, the Kasiski attack is not useful when the cipher period is long in relation to the plaintext (input) size. In such cases, the n-grams used as markers for the Kasiski test do not repeat sufficiently to furnish useful information.
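A minimal sketch of the Kasiski test follows: it records the distances between repeated n-grams in the ciphertext and takes their greatest common divisor as a period estimate. The function names are hypothetical, and the plain GCD is a simplification, since a single accidental repeat can spoil it; in practice one would inspect the most common factors of the distances instead.

```python
from collections import defaultdict
from functools import reduce
from math import gcd


def repeated_ngram_distances(ciphertext, n=3):
    """Distances between successive occurrences of each repeated n-gram."""
    seen = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        seen[ciphertext[i:i + n]].append(i)
    return [b - a for pos in seen.values() for a, b in zip(pos, pos[1:])]


def kasiski_period_estimate(ciphertext, n=3):
    """GCD of the repeat distances, or None if no n-gram repeats."""
    distances = repeated_ngram_distances(ciphertext, n)
    return reduce(gcd, distances) if distances else None
```

The estimate is generally a multiple of the true period. When the period is long relative to the ciphertext, `repeated_ngram_distances` returns few or no values, which is the failure mode described above.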

• Semi-automatic cryptanalysis via MLE techniques is based on the following three steps:

1) Depict a rotor transform as an adjacency matrix of equiprobable outcomes. For example, if |F| = 4, then each rotor (including the reflector) would be represented by a 4x4 matrix, where each element would be valued at 0.25 = 1/|F|.

2) Assume that the plaintext to be determined from given ciphertext c has statistics Pr(b) similar to statistics Pr(a) of a given plaintext corpus. One can then perturb the adjacency matrix representations of the rotors such that Pr(b), where b is the rotor machine output (in decryption mode), approximates Pr(a), and b contains recognizable words or phrases.

3) The output of the preceding step is augmented manually to fill in missing letters in recognizable words, until the complete message emerges. In practical applications, one may only need to complete a portion of the message to obtain the required information.

The goal of this process is to produce rotor machine adjacency matrices that describe the transform which the rotor machine implements. The following theory is illustrative.

Assumption. Let the structure of a rotor over an alphabet F = {A,B,C,D,E} be as shown in Figure 3, below. The rotor transform Tr: F -> F can be expressed in terms of the graph

G(Tr) = {(A,C),(B,A),(C,E),(D,D),(E,B)}.

Figure 3. Schematic diagram of a rotor.

• Definition. Given an alphabet $F$ indexed by the function $h : F \to \{1, \ldots, |F|\}$, the rotor transform $T_r : F \to F$ has an adjacency matrix representation $M$ denoted by

$$M_{G(T_r)} = \{ ((i,j), M(i,j)) : M(i,j) = 1 \text{ if } (h^{-1}(i), h^{-1}(j)) \in G(T_r), \text{ and zero otherwise, where } i, j \in \{1, \ldots, |F|\} \} .$$

Example. The adjacency matrix MG of the rotor transform illustrated schematically in Figure 3 is shown in tabular form in Figure 4.

Figure 4. Adjacency matrix of the rotor in Figure 3.
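The adjacency matrix for the graph G(Tr) given above can be built as in the following sketch. It assumes that rows index the input symbol and columns index the output symbol, and it uses 0-based indices for the function h; the orientation used in Figure 4 may differ.

```python
ALPHABET = "ABCDE"
# G(Tr) as given in the text: (input symbol, output symbol) pairs.
GRAPH = {("A", "C"), ("B", "A"), ("C", "E"), ("D", "D"), ("E", "B")}


def adjacency_matrix(graph, alphabet):
    """M[i][j] = 1 if the rotor maps alphabet[i] to alphabet[j], else 0."""
    index = {s: i for i, s in enumerate(alphabet)}  # indexing function h
    matrix = [[0] * len(alphabet) for _ in alphabet]
    for src, dst in graph:
        matrix[index[src]][index[dst]] = 1
    return matrix


def is_bijection(matrix):
    """A 0/1 matrix encodes a bijection iff each row and column sums to 1."""
    return (all(sum(row) == 1 for row in matrix)
            and all(sum(col) == 1 for col in zip(*matrix)))


M = adjacency_matrix(GRAPH, ALPHABET)
```

Because the rotor map is a bijection, `M` is a permutation matrix, which `is_bijection(M)` confirms.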

Observation. If MG(Tr) is converted to a real-valued matrix, then we have a basis for an optimization of MG to yield a Boolean matrix similar to that shown in Figure 4. We begin by starting with the assumption of equiprobable outcomes, then perturb the associated numerical representation by small random values to seed the optimization process. In the preceding example, the matrix MG would have weights of value 1/|F| = 1/5 = 0.2, perturbed by a small random value. For example, if single precision arithmetic is employed, then the random value would be in the range $[10^{-6}, 10^{-4}]$.
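The seeding described in this observation might look like the following sketch. The function name and the use of a uniform distribution for the perturbation are assumptions; the text specifies only the size of the random values.

```python
import random


def initial_matrix(size, low=1e-6, high=1e-4, seed=None):
    """Equiprobable weights 1/size, each nudged by a small random value.

    Assumption: the perturbation is drawn uniformly from [low, high].
    """
    rng = random.Random(seed)
    base = 1.0 / size
    return [[base + rng.uniform(low, high) for _ in range(size)]
            for _ in range(size)]


M0 = initial_matrix(5, seed=1)  # 5x5 matrix for the alphabet {A,B,C,D,E}
```

Passing a fixed `seed` makes a run repeatable, which helps when comparing different objective functions on the same starting point.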

Algorithm. Given the preceding theory and observation, we are now able to address the problem of semi-automatically determining the rotor machine's adjacency matrices, and thereby guessing the rotor configuration. The following steps pertain.

Step 1. Construct a plaintext corpus (50k to 100k characters of text). For example, you could save this document to an ASCII file, then read in the file, convert it to uppercase, and discard all characters not in the alphabet {A-Z}. This would be a useful technique for alphabet A only. Other methods would be required to filter the characters in alphabets B-E. Choose a subset of the plaintext corpus and encrypt it to form the "unknown" ciphertext.

Step 2. Compute the n-gram (symbol, digram, trigram, etc.) probability distributions from histograms that you compute given the plaintext corpus you constructed in Step 1. From the probability distributions, you can compute statistical measures associated with each histogram (e.g., mean, mode, median, and standard deviation).
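Steps 1 and 2 can be sketched as below for the alphabet {A-Z}. The function names are hypothetical, and n-grams are counted over overlapping windows, which is one of several reasonable conventions.

```python
from collections import Counter


def clean(text):
    """Uppercase the text and keep only the letters A-Z (Step 1)."""
    return "".join(ch for ch in text.upper() if "A" <= ch <= "Z")


def ngram_distribution(text, n=1):
    """Relative frequency of each n-gram, using overlapping windows."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}
```

For example, `ngram_distribution(clean(corpus_text), n=2)` would give the digram distribution of a corpus held in a string `corpus_text` (a placeholder name for your own data).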

Step 3. Construct each rotor's transform in order to specify the rotor machine, then formulate the adjacency matrix for each rotor transform as shown above. Thus, if you have n=3 rotors, there will be three adjacency matrices. Initialize the matrices to the values discussed in the preceding observation. Additionally, you will want to assume odometer gearing only, i.e., each rotor advances one symbol or position for each complete revolution of the next-less-significant rotor. And, you will need to specify the rotor initial position as a known quantity. Otherwise, you will have to guess the correct position given $|F|^n$ possible alternatives. For purposes of efficiency, start with a simple rotor machine (i.e., |F| < 5 and n < 3), with no Steckerboard or reflector.

Step 4. In order to constrain the optimization process, you need a merit function, also called an objective function in optimization theory. This function (which we denote as f) tells you how close you are to satisfying the constraint that directs your MLE-based optimization. For example, in this case, the objective function could compute the norm of the difference between the probability distributions of the candidate (decrypted) plaintext Pr(b) and the known (corpus) plaintext Pr(a) as

$$f(a,b) = \| \Pr(a) - \Pr(b) \| = \left[ \sum \left( \Pr(a) - \Pr(b) \right)^2 \right]^{1/2} ,$$

where the sum runs over the symbols (or n-grams) on which the two distributions are defined.

Clearly, one would want to minimize this difference. Note that the objective function can also be formulated from statistical parameters such as the mean, mode, standard deviation, etc., as well as from the histograms (or probability distributions) of various n-grams (e.g., digrams, trigrams, etc.).
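One way to compute this objective function is sketched below, assuming each distribution is stored as a dictionary from n-gram to probability and that an n-gram missing from a dictionary has probability zero. The function name is hypothetical.

```python
from math import sqrt


def objective(pr_a, pr_b):
    """Euclidean norm of the difference between two n-gram distributions.

    pr_a, pr_b: dicts mapping n-gram -> probability. An n-gram absent
    from one dict is treated as having probability 0 in it.
    """
    grams = set(pr_a) | set(pr_b)
    return sqrt(sum((pr_a.get(g, 0.0) - pr_b.get(g, 0.0)) ** 2
                    for g in grams))
```

The value is zero when the two distributions agree exactly and grows as the trial decryption's statistics move away from those of the corpus.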

Step 5. Apply the MLE approach discussed in class as follows:

a) Perturb one or more coefficients of the adjacency matrix. Then, threshold the adjacency matrix by setting to unity only those values larger than a given threshold value T. For example, in the initial perturbation step, you might increase one value in each row to 0.6, to form the adjacency matrix of a bijection. A threshold value of 0.5 would then suffice to produce a Boolean-valued adjacency matrix.

b) Configure your rotor machine according to the transforms described by the adjacency matrices you computed in Step 5a), and apply the rotor machine decryption to the ciphertext obtained in Step 1. This yields a trial decryption b.

c) Apply your objective function to b to obtain a difference score between statistical measures derived from b and the plaintext corpus.
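The perturb-and-threshold operation of Step 5a) can be sketched as follows. The helper names are hypothetical, and how the column to boost in each row is chosen (the `choice` argument) is left open here, as it is in the text.

```python
def boost(matrix, choice, value=0.6):
    """Raise one entry per row: row i gets `value` in column choice[i]."""
    return [[value if j == choice[i] else v for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]


def threshold(matrix, t=0.5):
    """Set entries larger than t to 1 and all other entries to 0."""
    return [[1 if v > t else 0 for v in row] for row in matrix]


def to_wiring(boolean_matrix, alphabet):
    """Read a rotor wiring off a Boolean matrix; None if not a bijection."""
    rows_ok = all(sum(row) == 1 for row in boolean_matrix)
    cols_ok = all(sum(col) == 1 for col in zip(*boolean_matrix))
    if not (rows_ok and cols_ok):
        return None
    return "".join(alphabet[row.index(1)] for row in boolean_matrix)
```

A wiring returned by `to_wiring` would then configure the rotor for the trial decryption of Step 5b), and a result of `None` signals that the perturbation did not yield a bijection and should be redrawn.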

The goal of the first few iterations of steps 5a) through 5c) is to obtain a large decrease in the output of your objective function. Since you want to approach a near-zero difference in the objective function output as quickly as possible, this implies fast convergence in the initial iterations of the MLE optimization process, as shown in Region A of Figure 5. In subsequent iterations, you will need to time average the output of the objective function f by averaging f's output over the last K iterations. Averaging is essential to remove the oscillations shown in Region B of Figure 5. Without averaging, you will not be able to reach a minimum in f's output.

Figure 5. Hypothetical objective function output that schematically illustrates
zones of convergence in a constraint-based optimization problem.

Step 6. After you pass through Region B (slow, oscillatory decrease in f's output), you will usually encounter smaller oscillations in Region C, where the average output of f decreases very slowly and the rate of convergence approaches zero. (In order to compute the rate of convergence, take the first derivative with respect to time of the time average of f's output. You may even want to average these derivatives over a few samples, to remove unwanted noise.)

When the rate of convergence brings the average of f's output to within some limit ε of zero, then it is time to stop the MLE process. In practice, the choice of ε depends largely upon the quantization error inherent in the n-gram histograms that are employed in computing f. Although this is not usually a problem for large plaintext corpora and ciphertext samples, you may have to choose your ciphertext to be several thousand characters, in order to obtain an average quantization error that is within, say, two percent of full scale (which equals an error of 0.02 in a probability distribution). It would be helpful for you to recall the discussions we had in class about quantization error and analysis of error. Then, use that theory to predict the limiting error with which various symbols can be determined from histogram data.
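The averaging, rate-of-convergence, and stopping computations of Steps 5 and 6 can be sketched as below. The names are hypothetical: `history` is assumed to be the list of objective function outputs so far, `k` the averaging window, and `eps` the stopping limit. A first difference is used as a discrete stand-in for the time derivative.

```python
def moving_average(history, k):
    """Mean of the last k objective values (fewer if history is shorter)."""
    window = history[-k:]
    return sum(window) / len(window)


def convergence_rate(averages):
    """First differences of the averaged objective, one per iteration."""
    return [b - a for a, b in zip(averages, averages[1:])]


def should_stop(history, k, eps):
    """Stop once k values exist and their average is within eps of zero."""
    return len(history) >= k and moving_average(history, k) < eps
```

Negative values from `convergence_rate` indicate that the averaged objective is still decreasing; values near zero suggest Region C has been reached.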

Step 7. When you have recognizable words or phrases in b, this can also be a sign that the MLE process is coming to a convergence point. At this point, you may want to stop the MLE algorithm and guess the remainder of the text. From the guessed text and the known ciphertext, you can confirm the rotor machine's configuration. A glance at your solution to Homework Problem 1.1 (Vigenere cipher) may help you here.

Be aware that the MLE process usually does not produce a perfect decryption, due to quantization error, computational errors, and erroneous initial assumptions. However, with practice (starting with a one-rotor machine over a very small alphabet), you will be able to obtain reasonably efficient guesses at rotor machine configurations.