CEN 6505 II. Physical Layer I - Nature of Signals and Transmission F. Encoding signals vi. Error-detection and error-correction 1. Preliminaries a. Hamming distance H(a,b) = # of bit positions in which a differs from b, i.e., the number of 1's in a XOR b. Hamming distance of a code is the minimum over all pairs of distinct code words of the Hamming distance between them, i.e., H(code) = min {H(a,b) | a<>b and a, b in code} b. Forward error correction - enough redundancy is transmitted in the code that errors can be corrected by the receiver without retransmission (used when retransmission is impractical, as in distant space probes, simplex lines, broadcast media where many stations may have to listen to unnecessary retransmission) Backward error correction - errors are detected and retransmission requested. (most common, generally more efficient, but has the flaw that a single error, repeated in every retrans- mission, will mean that the message is never delivered) c. d-bit error detection requires a code with Hamming distance d+1, d-bit error correction requires a code with Hamming distance 2d+1. d. Residual error rate - the probability that an error will go undetected Type I = no errors Type II = errors, none detected Type III = errors, one or more detected Pb = uniform probability that a bit is in error (random) P1 = probability that a frame is received error-free P2 = probability that a frame is received with one or more undetected error(s) (residual error rate) P3 = probability that a frame is received with one or more detected error(s) Assuming that errors are independent and have constant probability Pb, P1 = (1-Pb)^F, where F = bits per frame P2+P3 = 1-P1 2. Methods of Error Detection a. Parity One redundant bit per word, the bit's value is Sum i=0 to m-1 of a_i (mod 2) for even parity (Used for asynchronous transmission) or Sum i=0 to m-1 of a_i + 1 (mod 2) for odd parity (Used for synchronous transmission) Parity will only detect an odd number of bits in error, so (letting C(a,b) be the choose function), P2 = Sum(k=1 to W) C(W,k)(1-Pb)^[B(W-k)] [Sum(j=2,4,...to B) C(B,j)Pb^j(1-Pb)^(B-j)]^k = sum of probabilities that k>0 words are bad in a way that won't be detected, for each way of choosing those k words from W words per frame, and that the remaining words are not bad. The probability that the W-k remaining words are error-free is (1-Pb)^[B(W-k)], while the probability that the k words all have undetected errors is the probability that one word has an undetected error to the kth power. The probability that one word has an undetected error is the sum of the probabilities that it has a non-zero, even number, j, of errors. This is just the number of ways to choose j bad bits in B times the probability that they are all bad and the rest are all good. b. LRC/VRC Longitudinal Redundancy Check/Vertical Redundancy Check scheme uses not only one parity bit per word (or row of the frame, considered now as a matrix), but also a "parity check character", comprising the entire last row of the matrix, with each bit in the row checking the parity of the corresponding column. The row parity bits form the last column and are called the VRC, while the column parity bits form the last row and are called the LRC or the parity check character. LRC/VRC will fail to detect errors that occur in an "even rectangular form". More formally, if A is a subset of the rows and B is a subset of the columns, and the number of elements in each is even, then if exactly the bits in AxB are in error, the error will not be detected. Letting B be the number of bits per word (ie, the number of columns) and W be the number of words per frame (ie, the number of rows, including the LRC), then P2 = Sum(i=2,4,... to B) C(B,i) Sum(j=2,4,... to W) C(W,j) (Pb^ij)(1-Pb)^(BW-ij) = the sum over all different ways to choose a non-zero, even rectangle of the probability that exactly the bits of the rectangle are bad and the rest are ok. HD(LRC) = ??? (4) Can correct ?? errors (1) Efficiency? For detection, it is like a concatenated code with detection/erasure correction - the LRC parity indicates an erasure, the VRC indicates the word. 9-Track Tape Correction (parity/CRC) Similar scheme used for 9-track tapes, with records on each track protected with a CRC (detection), and the 9th track used for erasure correction (XOR all the undamaged bits to find the missing one). c. CRC - Cyclic Redundancy Check Given a m-bit message M, and a predetermined (r+1)-bit number P, generate an r-bit frame check sequence (FCS) that, appended to M, will produce an (m+r)-bit number T divisible by P (modulo 2). Note that addition and subtraction are done modulo 2, so that they are both eqivalent to bitwise XOR. T = 2^r M + FCS, where FCS is computed in this way: divide 2^r M by P, so that 2^r M = QP + R, and let FCS = R. Then T = 2^r M + FCS = QP + R + R = QP and is divisible by P. Note that the remainder has only r bits, and that it must be padded left with leading zeros if it has less than r bits. This can also be viewed as polynomials with binary coefficients, with operations again modulo 2, so T(X) = X^r M(X) + R(X). Note: P is often expressed in this way. When the received transmission T', is divided by P, it is assumed to be error-free if and only if it has no remainder. Errors that won't be caught by CRC: Let E be the (m+r)-bit error vector, with a 1 in each location in which there is an error in the received transmission, T'. T' = T + E. Then T' will pass the CRC test only if P|T', or T' = AP = T + E = QP + E, so AP - QP = (A-Q)P = E. Since P divides the left side, it must also divide the right side, E. Hence, only error vectors divisible by P will be undetected. If the MSB of P is 1, then there are exactly 2^m-1 of these, but how are they distributed? If first & last bits of P are 1, then all single-bit errors are caught. If P(X) has a factor with at least 3 terms, then all 2 bit errors will be caught. If (X+1) is a factor, then all odd number of errors are caught Burst errors shorter than r bits will be caught, as well as most large burst errors. 3. Methods of Error Correction a. Reverse Error Correction : Send the message, detect if there was an error, then request that the sender retransmit the message. This not only requires error-detection, but also duplex communication. It is not suitable for situations in which many stations would have to wait through a retransmission (such as broadcast media) or when effectively simplex lines are used (as in space probes). It also has the grave disadvantage that a consistent error (say a line stuck at one) will never be corrected: each time the received transmission will have the same error, so there is no way to infer which bit is bad (unless diagnostic procedures are used, which usually requires intervention from a higher layer). The same message is sent over and over until some other error causes the received transmission to pass the error detection test. The great advantage of REC is that it is simple and efficient. b. Forward Error Correction Here, the receiver corrects the error without requiring any further information from the sender. This requires a minimum amount of redundancy in the transmission. Not only must an error be detected, but its location must be determined. For one-bit error correction on an n-bit frame, we must be able to identify one out of the n bit positions if there is an error, or state that there is no error. Thus, we require ceiling of log(n+1) bits of redundancy in the frame. Since d-bit error correction requires a Hamming distance of 2d+1 in the code, or that each code word have a neighborhood of radius d around it such that no two neighborhoods overlap, we can place a limit on the number N of code words an n-bit frame may have for 1-bit EC: (n+1)N <= 2^n, since each code word must have a neighborhood of size n+1 and they can't overlap. N <= 2^n / (n+1) N can be an integer only if n+1 is a power of 2, so let n+1 = 2^k n = 2^k - 1. Only for frames with 2^k-1 bits can we hope for perfection, that the neighborhoods form a partition of the space. i. TMR One way to do forward error correction is to repeat each bit of the message several times, and let the receiver take a vote to determine what the correct value was for each bit. Trimodular Redundancy (TMR) is a method in use for reliable computation, and it can be used for error correction as well. It will perform one-bit EC by using three bits to encode each bit of the frame. Clearly it misses the lower bound of the previous section for frames of more than 3 bits. ii. LRC Revisited If there is only one error in a block, then the LRC method will have one parity bit bad in the LRC and the parity bit for one word will be bad. These will identify the location of the bad bit, which may then be corrected by inversion. This method has been used on 9-track tapes for correction of more than one bit of error. There, typically errors occur in only one track, so if the LRC (checking the nine track) has one bad parity bit, we assume that that track is bad, and use the parity track (the VRC) to locate the bad bits. Since an even number of errors in a track will be undetected by a parity bit, this approach was later strengthened by increasing the LRC from one parity bit per track to one CRC check per track for each block. This allows any number of errors in one track to be corrected under most circumstances. iii. Hamming Codes Let n=2^k-1 be the total number of bits sent in a word. We will use k check bits, each of which will check the parity of ceiling of 1/2 of the bits in the word. If we associate an n-bit vector with each check bit so that there is a 1 in the vector iff the check bit checks the parity of the corresponding bit in the word and a 0 otherwise, then the vectors must form a basis of the n-bit vector space in order for then to span the space (ie, check all possibilities). A convenient way in which to do this is to number the bits of the word 1,2,...,2^k-1, and let the bits whose positions are powers of 2 be the check bits. Check bit 2^i will be the parity bit for all bits in the word whose postion has the ith bit on. For example, bit 12 = 8 + 4 will be checked by parity bits in positions 8 and 4. (Note that each parity bit must check itself.) If there is an error in the transmission, then it will be detected and the error's position is just the sum of the positions of the bad check bits: the syndrome gives the location directly . EX: Message M = 10110110001 (11 bits) place into postions that are not a power of 2: bit 1 0 0 00 posn 5 8 4 21 contents 1011011_000_1__ ^ ^ ^^ - parity bit locations ________ - checked by bit 8 ____ ____ - checked by bit 4 __ __ __ __ - checked by bit 2 _ _ _ _ _ _ _ _ - checked by bit 1 Using even parity, compute value of parity bits (bit 8 = 1000 in binary, checks all bits in positions 8-15, which have bit 3 of their location equal to 1) bit 8 has value 1+0+1+1+0+1+1 % 2 = 1 bit 4 has value 1+0+1+1+0+0+0 % 2 = 1 bit 2 has value 1+0+0+1+0+0+1 % 2 = 1 bit 1 has value 1+1+0+1+0+0+1 % 2 = 0 So the final transmission is T = 101101110001110 ^ ^ ^^ - parity bit locations Now suppose that an error occurs in bit 6, so that the received transmission T', is v - bit in error T'= 101101110101110 The receiver computes the parity bits based on the received transmission T', bit 8 should have value 1+0+1+1+0+1+1 % 2 = 1 bit 4 should have value 1+0+1+1+0+1+0 % 2 = 0 X bit 2 should have value 1+0+0+1+1+0+1 % 2 = 0 X bit 1 should have value 1+1+0+1+0+0+1 % 2 = 0 From this, we see check bits 2 and 4 are wrong, while check bits 1 and 8 are right. Since the only bit checked by both bits 2 and 4 but not by bits 1 or 8 is bit 6 (=2+4, = 0010+0100 in in binary = 0110 binary = 6 decimal), the receiver concludes that bit 6 is in error, and corrects it back to a 0. The data is then extracted by deleting the check bits. Notice that perfect Hamming codes exactly match the lower bounds given above. They become more efficient as the word size increases: let R be the ratio of message bits m, to word bits n = 2^k-1. lim (n->inf) R = lim (n->inf) (2^k-1-k)/(2^k-1) = 1. However, the probability of more than one error also increases as word size increases, so there is some tradeoff between efficiency and efficacy. 4. Burst errors Most of the techniques described do not handle burst errors well. LRC/VRC does reasonably well, since a burst no longer than word length will be detected by the LRC, and CRC does very well with most bursts (as noted in that section). In order to improve the performance of parity or Hamming codes in the presence of burst errors, we can buffer a frame of words, each with its own redundant bits, and then send out the bits of the frame column by column rather than by rows. If a burst is shorter than the number of rows in the frame, then each row will have one or fewer errors, and they will all be detected (or corrected). Generally, a different approach is needed to correct burst errors. For this, BCH codes (especially Reed-Solomon codes) and Fire codes are especially uesful. Refer to Lin & Costello or Rao & Fujiwara for more details.