Notes.5 (Pfleeger Ch. 2) Basic Cryptography Why here? In computer security, crypto is but one of several ways of achieving isolation, and is not one of the more important ways. In network security, crypto is the main attraction - everything is done via message-passing (ultimately), so the only secure way to achieve confidentiality and the authentication needed for access control is through crypto. Cryptography Steganography Uses of steganography Watermarks Covert channels Crypto types Symmetric Asymmetric Block Stream Cipher Code All based on invertible functions (or else cannot decipher). Basics Substitution cipher Caeser cipher Rotational cipher Affine cipher Attack: size of keyspace Issue: confusion - effect of small change in plaintext on ciphertext should not be predictable Issue: diffusion - small change in plaintext should affect large part of ciphertext monoalphabetic cipher Attack: frequency-based attack Sorted frequency histogram of ciphertext should match well the the expected sorted frequency histogram of plaintext polyaphabetic cipher flattens frequency histogram Viginere tableau Attack: index of coincidence The IOC is a measure of variance in the frequency histogram, if r(x) is the relative frequency of x in the data, and the data are taken the 26 symbols from the Roman alphabet, then if encryption makes the distribution perfectly smooth, each character x would have f(x) = 1/26 ~ 0.0384 The nonuniformity of distribution for a single symbol x is just the difference between its observed relative frequency and 1/26, or d(x) = r(x) - 1/26. Since these have positive and negative values (for peaks and valleys, resp.) summing these over all symbols would amount to zero - not very useful. Squaring these makes them all positive, and the greater the variation, the larger the resulting number, so Var = Sum [x in A] of (r(x) - 1/26)^2 = Sum [x in A] of (r(x))^2 - 1/26 Taken as probabilities, (r(x))^2 is just the probability of the event x occuring twice, or the probability that two letters chosen from the text at random will both be the letter x. Since the text is finite, if f(x) is the frequency of x in the text of length n, the r(x) = f(x)/n, which is the probability of selecting x at random from the n characters in the text. If x is chosen at random as the first character, then in the remaining text, its relative frequency (and probability that x will again be chosen as the second character, since the first one selected is no longer available) is (f(x)-1)/(n-1), so the probability of picking two characters out of the text and having both of them be x is f(x)(f(x)-1)/n(n-1) The Index of Coincidence is just the sum of these probabilities over all symbols, IC = Sum [x in A] of f(x)(f(x)-1)/n(n-1). Another way of looking at it is that the variation measure above has a useless constant term, 1/26, that is always subtracted from the interesting part. IC just eliminates that constant term and specifically accounts for the difference that a finite amount of text makes in the computation. IC varies from .068 for 26-symbol English prose enciphered with a monoalphabetic substitution, to .038 for encipherment with a large number of alphabets. Its usefulness declines as the number of alphabets used for encipherment increases, as the curve becomes pretty flat after about 4 alphabets. However, it can be used to gauge whether a small number of alphabets were used, and to test a subtext to see if it may have been enciphered with a monoalphabetic substitution. Attack: guessing number of ciphers (in round robin) This approach can use the FH or the IC approach to test a series of hypotheses regarding the number of alphabets used (or the periodicity) to encrypt the plaintext. Attack: Kasiski attack - uses repeated polygrams (multiple symbol sequences) in the ciphertext to guide guessing of the period of a periodic encipherment. The assumption is that two repeated texts will be the result of two identical plaintext polygrams that were enciphered using the same sequence of keys, so distances between these pairs should be a multiple of the period of the cipher. One-time pad Vernam cipher Transposition ciphers Product cipher (concatenated cipher) Can increase period Can increase confusion and diffusion Shannon Characteristics 1. Secrecy required should determine level of effort 2. Keys and algorithm should be low in complexity - easy to generate good keys and apply them 3. Simple implementation 4. Errors should not propagate 5. Size of ciphertext should be no larger than plaintext Redundancy Absolute rate of a language is A = [lg(N)], where N = size of alphabet, [] = ceiling If the number of meaningful n-letter messages is 2^(Rn), then the rate of the language is R <= A, and the redundancy of the language is D = A-R, i.e., D is the number of extra bits per symbol used. Information theoretic security let h(C) be set of possible plaintexts for ciphertext C. an encryption is effectively secure if Prob(h(C) = P) < epsilon for some arbitrarily small epsilon Dual message entrapment - epsilon is never > 1/2 Ideally, knowing that the ciphertext is some particular C1 should not narrow down the possible plaintexts, Prob(h(C1)=P | C1) = Prob(h(C1) = P) = Prob(P) Unicity Distance Unicity distance is a measure of the amount of ciphertext needed to break a cipher. Let H(P|C) = Sum [all P] Prob(P|C)lg(1/Prob(P|C)) Unicity distance is the length n of the smallest message for which H(P|C) is close to 0. With 2^H(P) keys, a cryptosystem has unicity distance N = H(P)/D Probability of spurious decryption is p = (1-q), where q = 2^(n(R-A)) = 2^(-Dn)