Introduction to Error Detection and Correction

Ensuring reliable communications depends on information being transmitted and received without error. Even a single error in a binary transmission can destroy the value of the message being sent. There are many ways with which error detection or correction may be accomplished - some are better than others.

Send the Message Twice

An obvious way to deal with the potential for errors in a message is to send the message twice. While this apppears useful, it is also quite expensive in terms of data throughput. Furthermore, even if the two copies differ upon reception, there is still no way to tell which is the correct version. This method accomplishes error detection, but no correction.

      transmitted message                          received message
   SEND $100 NOW SEND $100 NOW ------------> SEND $200 NOW SEND $100 NOW

Sending the message three times may allow a majority vote to be taken and thereby allow some degree of error correction. However, the cost of having the ability to correct errors here is quite high, as every message (even the good ones) are all sent three times. Likewise, it would be dissapointing to learn that a new 3 TB hard disk could only be used to hold 1 TB because all the data was to be stored in triplicate for error control.

The value of an error control scheme is measured in:

The ability to detect errors;
The ability to correct errors;
The cost in terms of additional bits on a message.

There are better methods than simply retransmitting the data two or more times to achieve control over errors. Indeed, the goal of error control coding research is to optimize these three factors to achieve the best error control at the least cost. In every case, an error control algorithm must be chosen as a function of the communications channel in use and the probability of having certain errors.

In many applications, it is sufficient to achieve reliable error detection, as the data can simply be repeated upon request when errors occur. This is know as an Automatic Repeat Request (ARQ) method.

In other cases, no retransmissions are possible (broadcasting information, or reading data from a damaged CD-ROM, for example) and the necessary information to accomplish error correction in the receiver must be included. This is called Forward Error Correction (FEC).

Ultimately, even the proposed send-the-data-three-times approach can fail if multiple bit errors overwhelm the error control scheme. In that case, two copies may have the same error and the wrong correction is made or perhaps all three copies are completely different and no correction is possible (although the error is certainly detected). In any case, if $100 is to come from a banking machine, the actual balance normally would be double-checked (manually) at the end of the day and further verification would resolve any outstanding errors; similarly, further checks higher in the OSI model could ensure that a digital communications system is extremely reliable under all conditions.

EVEN PARITY

The simplest of error detection methods is the well-known addition of a parity bit to a codeword. Typically, a 7-bit value is transmitted with one additional bit chosen to ensure that the total number of '1' bits in the word is an even number. This is called even parity.

In the shorthand of error control codes, this would be called an (8,7) code, since 8-bit codewords are used to send 7-bit data words. For example, if the 7-bit message "0101100" is transmitted using even parity, the 8-bit codeword "01011001" has an even number of ones. If a single bit is changed during transmission, the receiver can detect this error by checking the parity of the received codeword, as shown below.

      transmitted message                          received message
       0 1 0 1 1 0 0 1         ------------>       0 1 0 1 0 0 0 1

Because there is not an even number of 1's upon reception, an error has occured. There is no way to tell which bit caused the problem. This simple parity method allows error detection, but not correction.

Value of simple parity checking:

Detection of single-bit errors;
No correction possible;
Cost is one additional bit per message word.

Note that errors in two bits (or any even number of bits) would not be detected. The cost is one additional bit added to every message.

Parity is Exclusive-OR (XOR)

A parity generator can be constructed in hardware using nothing more than exclusive-OR (XOR) gates.

A	B	A XOR B	MOD 2 ADD
0	0	0	0 + 0 = 0
0	1	1	0 + 1 = 1
1	0	1	1 + 0 = 1
1	1	0	1 + 1 = 0

The XOR operation accomplishes modulus 2 addition, and an even parity bit (P) can be generated directly as the XOR of any number of message bits (d) as shown below. parity

The Distance Between Codewords

The parity method works because a single-bit error in any position in a codeword will change the parity from even to odd. While this observation may seem obvious, a closer study of parity reveals details which will be useful in the study of more powerful coding methods.

Consider all possible 8-bit codewords (there are 256) and all possible 7-bit messages (there are 128). When a given 7-bit message is sent using an 8-bit codeword, it is uniquely mapped onto one of the 256 available 8-bit codewords. Consequently, only 128 of the 8-bit words represent valid messages (in this case, those 128 words having even parity), and these are the only codewords which are ever transmitted. The other 128 8-bit codewords, if received, represent errored messages. The use of even parity is really only one way to map the 128 valid messages onto 256 available codewords. (Use of odd parity is another way.) The magic of parity, and why it works, is that every valid codeword is exactly one bit error away from an invalid codeword. No single bit error will change a codeword with even parity into another valid codeword with even parity.

The Hamming distance is defined as the number of bit changes required to change from one codeword to another. For example, the codewords P=10101011 and Q=11011011 are separated by a distance of 3 because it is necesary to change three bits to change P into Q.

In the case of parity, there are 128 valid codewords and 128 invalid codewords which are carefully arranged to ensure that there is at least a distance of 2 between any two valid codewords.

A: 00000011	B: 00000001	C: 00000101
D: 00000010	E: 00000000	F: 00000100
G: 00001010	H: 00001000	I: 00001100

Valid codewords {A,C,E,G,I} are separated
from each other by at least a distance of 2.

Imagine a chessboard with 256 squares. The valid squares are light, while invalid squares are dark. It takes two horizontal or vertical jumps to move from one light square to another. Any single jump in any direction from any light square must necessarily land on a dark square. Note that any other arrangement of the light and dark would result in two light squares being adjacent.

Now look at the binary value stored at each square. It takes two bit changes to move (horizontally or vertically) from one light codeword to another. Any single bit change in any direction from any light codeword must necessarily land on a dark codeword. Any single bit error in a valid codeword must lead to an invalid codeword. Any other arrangement of these codewords would result in two valid codes being adjacent.

If a set of codewords is defined as above such that the minimum distance between any two valid codewords is at least 2, then single bit error detection is possible. This is what simple parity protection accomplishes.

If this minimum distance can be increased, (with fewer valid codewords on a bigger chessboard) then error correction becomes possible. This will require adding additional bits to increase the number of available codewords and allow greater distance between valid codewords.

The challenge of creating codeword sets with predictable distance properties is illustrated by the Online Codeword Generator which uses brute force to make sets of codewords with a set length and minimum Hamming distance.