Notes.7 - Hashes A hash H is a function that takes inputs from a large set, A, and maps them to fixed length elements in a finite set, B. A one-way function F is a function that is hard to invert. That is, if F: A -> B, then given some b in B, it is hard to find an a in A for which F(a) = b. By hard, we mean that no method much faster than trying elements of A using brute force is known to invert F effectively. This is collision-resistance. If H is a one-way hash, then it may be used in many ways: for authentication as a MIC (message integrity check) as a MAC (message authentication check) as a PRNG (for key stream generation) for password security. The Birthday Problem If there are 23 or more people in a room, the odds are better than .5 that two of them will have the same birthday. Assume that the 365 days of the year are all equally likely as birthdays, and that birthdays are random among people. With N people in the room, there are P=N(N-1)/2 distinct pairs of people. For each pair, there is a probability of p=1/365 that the two have the same birthday. The expected number of matches is just the number of pairs P times p, E(matches) = Pp. For the expected number of matches to exceed .5, P > .5/p = 365/2 = 183. Thus P = N(N-1)/2 > 365/2 => N(N-1) > 365, or N > sqrt(365) > 19. So, actually, with 20 people in a room it is around an even bet that two will have the same birthday. Who cares? Well, the Birthday Problem tells us that to find two messages with the same n-bit hash value, only 2^(n/2) candidates will have to be considered, on the average. If another message m' with the same n-bit hash value H(m) as some given message m is required, then on the average, 2^(n-1) candidates will have to be tested. Since hashes must protect against intentional misuse (a user producing two messages with the same hash that mean very different things, then either signing one and claiming that the other was actually sent later, or getting someone else to sign the one and then sending the other with the signature of the first), they must be twice as long as we would otherwise need for security. Why use a hash? Hash values are small and fast to compute. One-way hashes are collision-resistant, and may be used with public key systems for signatures much faster than signing the entire message or document using a slow cryptosystem. They also export a little better than pure cryptosystems do, yet can be used for encryption. Authentication Rather than using a challenge-response authentication method based on encrypting the challenge nonces, hashing may be used instead. A->B: r_A Alice sends Bob a nonce challenge B->A: H(K_AB|r_A) Bob replies with the hash of a common secret (K_AB) concatenated with the nonce (R_A) Alice can then verify the hash received, and no party that does not have access to K_AB is likely to be able to reproduce that value. Bob can authenticate Alice similarly. NOTA BENE: r_A must be used ONLY once by Alice, and there are a host of other subtleties that must be considered in authentication. MIC Just using H(m) by itself does not suffice, since H is known to all. Instead, H(K|m) may be used, so that only those who know K can either generate or check the integrity of m. Unfortunately, several popular MD algorithms use as an intermediate value when computing the hash the same quantity that is used as the hash output at the end. This means that, given a message m and its hash, H(m), some addition A may be made to m and a valid hash computed for it by initializing the MD algorithm to have H(m) as its carried information. This is not a problem with MD2, and is easily solved in any of several ways. Two ways are as follows. 1. Use H(m|K) instead. This way, the intermediate value needed for H(m|A|K) is not known (it is messed up by the inclusion of K in the computations after m), and even if it were, the secret K is needed to complete the computation of the hash. 2. Use only half of the computed H(m) as the MIC, so that there is only a very small chance that an appropriate MIC could be generated by an attacker. If only one other party knows the secret K, then the MIC serves as a MAC as well, authenticating that the message was sent by the only other party to have the secret. Encryption Since the output of the hash function is unpredictable, these may be uses as PRNGs in a manner similar to OFB or CFB modes of DES and used in a Vernam Cipher. Hashing Passwords Rather than storing passwords in plaintext (bad idea) or even encrypted (where they could be decrypted if the key were compromised), Unix and other systems store cryptographic hashes of passwords instead. The password is converted into the key for an encryption algorithm (a modified version of DES in the case of Unix), which is then used to encrypt a constant (0 in the case of Unix). Unix also uses a 12-bit salt, which modifies the DES data expansion algorithm and makes the same password with different salts have different hash values. This makes offline password guessing (cracking) more difficult. Using encryption as a hash function The message may be broken into key-sized blocks, and these blocks used as keys in the encryption fuction to encrypt repeatedly some constant (once for each key). If the output is too small for a usable hash, then use more than one constant, or reverse the key order, etc. MD2 Arbitrary byte length input is padded to a multiple of 16 bytes. (last p bytes, 1 <=p<=16, all have value p in binary) A 16-byte "checksum" is computed over the whole message and appended to its end. The padded, checksummed message is then processed in 16-byte chunks to produce a 16-byte digest.