Cryptology - I: Project #3

Instructors: R.E. Newman-Wolfe and M.S. Schmalz

Undergrads and Grads:

Implement an RSA cryptosystem with (a) key generation, (b) encryption, and (c) decryption. The cryptosystem must be able to handle 512-bit numbers, which means that your chief task will be to implement and test a large-number arithmetic library.
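
As a rough orientation, the three required pieces can be sketched with Python's built-in integers and deliberately tiny primes. This is only an illustration under those assumptions: the function names (make_keys, encrypt, decrypt) and the default exponent 17 are hypothetical choices, not part of the assignment, and the project itself calls for your own large-number library and randomly generated primes.

```python
from math import gcd

def make_keys(p, q, e=17):
    """Build an RSA key pair from two distinct primes p and q (assumed prime)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse; needs Python 3.8 or later
    return (n, e), (n, d)

def encrypt(m, public_key):
    """Encrypt the integer m, which must lie in [0, n)."""
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("plaintext must be in [0, n)")
    return pow(m, e, n)

def decrypt(c, private_key):
    """Decrypt the integer c with the private exponent."""
    n, d = private_key
    return pow(c, d, n)

# Toy parameters only; real keys use primes of roughly 256 bits each.
public, private = make_keys(61, 53)
cipher = encrypt(65, public)
print(cipher, decrypt(cipher, private))
```

The modular exponentiation hidden inside pow(m, e, n) is exactly what the large-number library has to provide for 512-bit operands.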

Note: Several students have asked how to partition the input (plaintext). My best suggestion is to start by partitioning the input into 8-bit bytes and concatenating them. Some students were concerned that the numeric value of a plaintext block might exceed the modulus. If you think that's a problem, use only 63 bytes and pad the most significant eight bits with zeroes. This guarantees that the plaintext is smaller than a full 512-bit modulus. Don't forget to remove the high-order eight bits after decryption, or you will have an unwanted zero byte (0₂₅₆) in the output.
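
The 63-byte scheme in the note can be sketched as follows. This is a minimal illustration, assuming big-endian byte order and zero-filling of a short final chunk on the right; both choices, and the names bytes_to_block and block_to_bytes, are assumptions rather than requirements.

```python
BLOCK_BYTES = 63   # payload bytes per block; the 64th (top) byte stays zero
TOTAL_BYTES = 64   # 512 bits

def bytes_to_block(chunk):
    """Concatenate up to 63 bytes into one integer below 2**504."""
    if len(chunk) > BLOCK_BYTES:
        raise ValueError("chunk longer than 63 bytes")
    # Assumed convention: a short final chunk is zero-filled on the right.
    padded = chunk.ljust(BLOCK_BYTES, b"\x00")
    return int.from_bytes(padded, "big")

def block_to_bytes(value):
    """Recover the 63 payload bytes and drop the zero high-order byte."""
    raw = value.to_bytes(TOTAL_BYTES, "big")
    if raw[0] != 0:
        raise ValueError("high-order padding byte is not zero")
    return raw[1:]

block = bytes_to_block(b"ATTACK AT DAWN")
print(block.bit_length(), block_to_bytes(block)[:14])
```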

After you have tested and verified the correctness of your algorithm, you should recode the input using only a five-bit encoding of {A,B,...,Z}. Thus, the input to your RSA algorithm will consist of 102 concatenated 5-bit character representations (510 bits), rather than 64 eight-bit representations. As before, you can use two zero-valued bits as padding. The advantage of using the five-bit encoding is that three bits of padding are eliminated per character representation. Such padding, which tends to occur in the same (or nearly the same) bit positions across the 26 English characters in the ASCII alphabet, can provide important information to an adversary who might examine your ciphertext.
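
A possible packing for the five-bit encoding is sketched below. It assumes A maps to 0 through Z to 25, that the first character lands in the most significant position, and that a short block is filled out with the letter A; all three conventions and the function names are illustrative assumptions.

```python
CHARS_PER_BLOCK = 102   # 102 * 5 = 510 bits, leaving two zero padding bits
BITS_PER_CHAR = 5

def encode_block(text):
    """Pack up to 102 letters A-Z into one integer below 2**510."""
    if len(text) > CHARS_PER_BLOCK:
        raise ValueError("too many characters for one block")
    value = 0
    # Assumed filler for short blocks: 'A', whose code is 0.
    for ch in text.ljust(CHARS_PER_BLOCK, "A"):
        code = ord(ch) - ord("A")
        if not 0 <= code < 26:
            raise ValueError("only the letters A-Z are allowed")
        value = (value << BITS_PER_CHAR) | code
    return value

def decode_block(value):
    """Unpack 102 five-bit codes, most significant character first."""
    chars = []
    for _ in range(CHARS_PER_BLOCK):
        chars.append(chr((value & 0x1F) + ord("A")))
        value >>= BITS_PER_CHAR
    return "".join(reversed(chars))

packed = encode_block("HELLO")
print(packed.bit_length(), decode_block(packed)[:5])
```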

Scoring: 6 base points for undergrads, 4 base points for grads.

Undergrads:

Choose options worth four points total from the list below, to bring your point total to 10.

Grads:

Choose options worth six points total from the list below, to bring your point total to 10.

General:

You shall submit electronically (to Dr. Newman-Wolfe only):

Documentation in the form of one man page for each program, with a format similar to Unix man pages (see /usr/man/man1/*). Include a section on theory of operation. Be sure to specify input and output.

Makefile (see make(1)) - this should be cumulative for the entire term (i.e., every makefile you submit should include commands to make everything you have submitted).

Complete source code, including header files. Code should have sufficient comments internally to aid understanding, and be free of manifest constants (use the #define preprocessor command as needed). Programs should handle erroneous input and provide help to the user.

Read the Programming Hints at the end of this document to help make your life a little easier.

List of Options

Now that you have thought about the basic RSA cryptosystem, you will want to use some of the following options to implement your cryptosystem (a pseudo-random number generator, or PRNG, and a primality testing algorithm are required for RSA).

Option 1. Simple PRNG:

Implement a PRNG that uses a well-known algorithm (which might not produce ideal bit statistics). A good example is a linear congruential PRNG. Characterize the bit distribution in terms of a histogram of decimal values. The decimal values are obtained by sliding an n-bit window along the bitstream your PRNG produces. Each decimal value corresponds to the decimal equivalent of the bit vector window at a given position in the output stream.

For example, an output stream 001001011... would have a 2-gram sequence of 00,01,10,00,01,10,01,11..., which would have the decimal equivalent of 0,1,2,0,1,2,1,3,.... The histogram for the preceding decimal sequence can be specified in terms of its graph as {(0,2),(1,3),(2,2),(3,1)}. The same output stream would have a 3-gram equivalent of 001,010,100,001,010,101,011,..., which would have a decimal equivalent representation of 1,2,4,1,2,5,3,... An n-gram sequence could thus be characterized in terms of a histogram that would have domain values in {0,1,...,2ⁿ-1}. If the histogram is uniform (i.e., flat) for 1 < n < 8, then your PRNG performs well.
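
The sliding-window histogram can be sketched as below, together with a small linear congruential generator to feed it. This is a minimal sketch: the constants a = 1664525 and c = 1013904223 with modulus 2³² are the widely published Numerical Recipes parameters, chosen here only as an example, and taking the top bit of each state as the output bit is an assumption, as are the function names.

```python
from collections import Counter

def lcg_bits(seed, count, a=1664525, c=1013904223, m=2**32):
    """Return `count` bits as a string, one (the top bit) per LCG step."""
    state = seed
    out = []
    for _ in range(count):
        state = (a * state + c) % m
        out.append(str(state >> 31))   # top bit of the 32-bit state
    return "".join(out)

def ngram_histogram(bits, n):
    """Count the values of every n-bit window, sliding one bit at a time."""
    counts = Counter()
    for i in range(len(bits) - n + 1):
        counts[int(bits[i:i + n], 2)] += 1
    return [counts[v] for v in range(2 ** n)]   # index = decimal value

# The worked example from the text: counts for the values 0, 1, 2, 3.
print(ngram_histogram("001001011", 2))  # [2, 3, 2, 1]

# Histogram of 3-grams over 1000 generated bits.
print(ngram_histogram(lcg_bits(1, 1000), 3))
```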

In practical RSA applications, you would need to generate numbers as large as the modulus. The use of very large (up to 1024-bit) numbers implies that you must lump your PRNG test results together in histogram bins (a technique called binning), since no currently known computer can store 2⁵¹² or 2¹⁰²⁴ domain counts.
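
One way to bin is sketched here, assuming that equal-width bins selected by the leading bits of each number are acceptable. The function name bin_counts and the default of 16 bins are hypothetical choices.

```python
def bin_counts(values, bit_length, bin_bits=4):
    """Histogram bit_length-bit integers into 2**bin_bits equal-width bins."""
    counts = [0] * (1 << bin_bits)
    shift = bit_length - bin_bits
    for v in values:
        if v < 0 or v >> bit_length:
            raise ValueError("value does not fit in bit_length bits")
        counts[v >> shift] += 1   # the top bin_bits bits select the bin
    return counts

# Three 512-bit values fall into the first, middle, and last bins.
print(bin_counts([0, 2**511, 2**512 - 1], 512))
```

With 16 bins, only 16 counters are stored no matter how large the numbers are; a flat histogram over the bins is then the analogue of the flat n-gram histogram.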

Scoring: 2 points.

Option 2. Blum-Blum-Shub (BBS) PRNG:

Implement a BBS PRNG and test it in the same manner as we discussed for the simple PRNG in Option 1, above.
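
For orientation, the BBS recurrence can be sketched as follows: choose primes p and q that are both congruent to 3 modulo 4, set n = pq, square the state modulo n at each step, and output the least significant bit. The tiny primes 11 and 23 and the function name bbs_bits are illustrative assumptions only; a usable generator needs large secret primes.

```python
from math import gcd

def bbs_bits(seed, count, p=11, q=23):
    """Return `count` Blum-Blum-Shub output bits as a string."""
    if p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must both be congruent to 3 mod 4")
    n = p * q
    if gcd(seed, n) != 1:
        raise ValueError("seed must be coprime to n")
    x = (seed * seed) % n          # x0 = seed^2 mod n
    bits = []
    for _ in range(count):
        x = (x * x) % n            # x_{i+1} = x_i^2 mod n
        bits.append(str(x & 1))    # emit the least significant bit
    return "".join(bits)

print(bbs_bits(3, 4))  # states 81, 236, 36, 31 give the bits 1001
```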

Scoring: 2 points.

Option 3. Primality Test:

Implement a primality test (Miller-Rabin, Solovay-Strassen, or Lehmann) as described in class and in the text (Stinson). Your primality test must be able to handle up to 512-bit numbers, such that it is a useful part of your RSA cryptosystem.

Since the above-listed primality tests are statistical algorithms, you must include in the output of the test the probability of error inherent in the result.

Hint: Recall from class presentation and discussion that if you use M tests to determine the primality of a candidate random number, and the probability of error for each test is 1/n, then (assuming independent tests) the aggregate error probability is n⁻ᴹ.
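
A Miller-Rabin test that reports the error bound from the hint might look like the sketch below. It assumes the standard worst-case figure for Miller-Rabin, namely that a composite number passes a single round with probability at most 1/4 (so n = 4 in the hint's notation); the function name, the default of 20 rounds, and the small-prime pre-check are illustrative choices.

```python
import random

def miller_rabin(n, rounds=20, rng=random):
    """Return (probably_prime, error_bound) for the integer n."""
    if n < 2:
        return False, 0.0
    for small in (2, 3, 5, 7):
        if n == small:
            return True, 0.0
        if n % small == 0:
            return False, 0.0
    # Write n - 1 = 2**s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)        # random base in [2, n-2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue                       # this base does not expose n
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False, 0.0              # witness found: certainly composite
    return True, 4.0 ** -rounds            # aggregate bound n^-M with n = 4

print(miller_rabin(2**127 - 1))
print(miller_rabin(561))
```

A "composite" answer is always certain, so its reported error is zero; only a "probably prime" answer carries the n⁻ᴹ bound.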

Scoring: 2 points per primality test implemented.

Option 4. Prime Number Prediction:

Implement software that generates two graphs of occurrence of prime numbers from 2 to 10²⁰ (a 67-bit number) and compare the results with the theory discussed in the text (Stinson) and the literature. Hand in both the paper graphs of your output data and the software that generated the data, with appropriate documentation.

The first "graph of occurrence" should be a graph of the probability of a prime occurring within a given decade (order of magnitude, e.g., 1 to 9, 10 to 99, 100 to 999, etc.) as a function of the order of magnitude. For example, there are four primes (2,3,5,7) between 1 and 9, which is the zero-th decade. The first decade ranges from 10 to 99. For this graph, the histogram provides a useful plot format for your empirical (computer-generated) data, with an additional solid or dashed line representing the predicted (theoretical) result.
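
For the first few decades, the data behind the first graph can be sketched with a simple sieve. The predicted curve here uses the x/ln x estimate of the number of primes below x, which is one common form of the theory and is chosen as an assumption; the function names are hypothetical. A sieve does not scale to 10²⁰, so the higher decades need a different approach.

```python
from math import log

def sieve(limit):
    """Sieve of Eratosthenes: flags[i] is True when i is prime (i < limit)."""
    flags = [True] * limit
    flags[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if flags[i]:
            for j in range(i * i, limit, i):
                flags[j] = False
    return flags

def decade_fractions(max_decade):
    """Rows of (decade, prime count, empirical fraction, predicted fraction)."""
    flags = sieve(10 ** (max_decade + 1))
    rows = []
    for k in range(max_decade + 1):
        lo, hi = 10 ** k, 10 ** (k + 1)
        count = sum(flags[lo:hi])
        # Assumed theory: pi(x) ~ x / ln x, with the estimate at x = 1 taken as 0.
        est = hi / log(hi) - (lo / log(lo) if lo > 1 else 0.0)
        rows.append((k, count, count / (hi - lo), est / (hi - lo)))
    return rows

for row in decade_fractions(4):
    print(row)
```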

The second "graph of occurrence" should be a plot of the distance between successive primes. For example, the distance between 19 and 23 (which occur in the first decade) is computed as 4 = 23 - 19. It is best to express your results in terms of a mean distance with a vertical error bar (which looks like an "I") centered on the mean, with the bar termini at one standard deviation from the mean. The ordinate of this graph will represent distance between primes (you will probably want to use a log scale), and the abscissa will represent order of magnitude (as in the first graph, described above).
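
The mean and error bar for one decade can be sketched as below. This is a minimal illustration that assumes gaps are measured only between primes inside the same decade and that the population standard deviation is intended; trial division is used purely for brevity, and the function names are hypothetical.

```python
from statistics import mean, pstdev

def is_prime(n):
    """Trial division; adequate only for the small decades shown here."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def gap_stats(lo, hi):
    """Mean and standard deviation of gaps between successive primes in [lo, hi)."""
    primes = [n for n in range(lo, hi) if is_prime(n)]
    gaps = [b - a for a, b in zip(primes, primes[1:])]
    return mean(gaps), pstdev(gaps)

# First decade (10 to 99): the error bar runs from mean - sd to mean + sd.
m, sd = gap_stats(10, 100)
print(m, m - sd, m + sd)
```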

Hint: To reduce the effort of testing candidate primes, test only those candidates whose decimal representations end in the digits 1, 3, 7, or 9. You can use an iterative algorithm to generate these, since the distance between successive candidates follows the repeating pattern (2,4,2,2). Also, use a fast primality test, such as Miller-Rabin. If more than one person is performing this implementation, you may want to divide the work equally among the various persons, once your software is written. This will greatly decrease the time required to test the candidates and generate results.
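
The candidate generator can be sketched as follows. Starting from a number ending in 1, the steps to the next numbers ending in 3, 7, 9, and then 1 again are 2, 4, 2, and 2; the function name candidates and the default starting point are illustrative assumptions.

```python
from itertools import cycle, islice

def candidates(start=11):
    """Yield the integers ending in 1, 3, 7, or 9, beginning at `start`."""
    if start % 10 != 1:
        raise ValueError("start must end in the digit 1")
    n = start
    for step in cycle((2, 4, 2, 2)):   # 1 -> 3 -> 7 -> 9 -> next 1
        yield n
        n += step

print(list(islice(candidates(), 9)))  # [11, 13, 17, 19, 21, 23, 27, 29, 31]
```

Each value produced this way is then handed to the primality test, which skips 60 percent of all integers at no cost.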

Hint: Since the probability of finding large primes ending in the decimal digits 1, 3, 7, or 9 is approximately 1/(0.4 * 177) = 0.01412, or roughly one in 71 numbers, you may want to employ a strategy such as quadratic search to subdivide the search space more efficiently. Also, you may want to use the results from previous decades to estimate the probability of distances between primes in the current decade (i.e., the decade in which your software is currently working).

Scoring: 4 points.

References

References should be taken from the list provided at the end of the Project-4 description.

This concludes the description for Project #3.