Basic principles of ECC (Error Correction Code)

As semiconductor memory density increases, the risk of bit errors increases. To address this issue, ECC is attracting attention as a technology that dramatically improves the reliability of memory and communications equipment. This article provides an easy-to-understand explanation of the basic principles of ECC, with illustrations that are useful in development.

Why is ECC important now?

In modern systems that handle massive amounts of data, even a single bit error can cause serious problems such as system failure or data corruption. Higher density and miniaturization of semiconductor memory have improved information processing capabilities, but they have also raised the risk of bit errors. ECC (Error Correction Code) is an essential technology for protecting important data from these invisible errors. By automatically detecting and correcting errors, ECC improves system reliability in fields that demand it, such as AI and cloud services. ECC has now evolved from a "convenient" option into a "standard technology essential for design."

How ECC works

ECC features and benefits

To help you understand the benefits of implementing ECC, we have summarized five key characteristics. Understanding these will help you make design decisions about error tolerance and redundancy.

1. Automatic bit error detection and correction
ECC automatically detects and corrects bit errors that occur during data transmission and storage, significantly reducing system downtime and the risk of failure.

2. Parity bits provide both redundancy and efficiency
ECC improves error tolerance by adding parity bits, but the extra bits cost memory capacity, communication bandwidth, and latency, so the design must weigh these trade-offs.

3. High-precision error detection and correction using the SECDED method
The most common ECC method is SECDED (Single Error Correction, Double Error Detection), which corrects 1-bit errors and detects 2-bit errors. If stronger correction capability is required, methods such as BCH codes and Reed-Solomon (RS) codes are also available.

4. Stable operation through real-time correction
ECC corrects errors on the fly during memory access and communication, enabling stable operation even in mission-critical systems such as automotive control and financial systems.

5. Examples of ECC use in various fields
ECC contributes to reducing bit error rates and ensuring data integrity in a wide range of fields, including DRAM, SSDs, eMMC, UFS, communication infrastructure, and optical disks.
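
To make the redundancy trade-off in item 2 concrete, the following Python sketch estimates how many check bits a Hamming-based SECDED code needs for a given data width. It uses the standard Hamming condition 2^r >= m + r + 1 for m data bits and r check bits, plus one overall parity bit for double-error detection. The function names are illustrative assumptions, not part of any product or library.

```python
def hamming_check_bits(data_bits: int) -> int:
    """Smallest r with 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_check_bits(data_bits: int) -> int:
    """Hamming check bits plus one overall parity bit (double-error detection)."""
    return hamming_check_bits(data_bits) + 1


# Print the overhead for a few common data widths.
for m in (4, 8, 16, 32, 64):
    r = secded_check_bits(m)
    print(f"{m:3d} data bits -> {r} check bits ({r / m:.1%} overhead)")
```

Under these assumptions, a 64-bit word needs 8 check bits (12.5% overhead), while a 4-bit word needs 4 (100% overhead), which is why wider words make ECC relatively cheaper in capacity.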

[Diagram] How 1-bit error correction works using Hamming (7,4) code

Here, we explain how ECC works using the Hamming (7,4) code as an example.
*The Hamming (7,4) code is a typical ECC method that adds 3 parity bits to 4 data bits and can automatically correct 1-bit errors.

Error correction using Hamming (7,4) code

1. Creating a parity bit (send: write)

For the 4-bit data 1011, the Hamming (7,4) code adds 3 parity bits to form a 7-bit codeword.
The bit sequence is as follows:

Position:  1   2   3   4   5   6   7
Bit:       P1  P2  D1  P3  D2  D3  D4

Position: Order of data and parity bits (1st to 7th)
D1, D2, D3, D4: Data bits (for 1011, D1=1, D2=0, D3=1, D4=1)
P1, P2, P3: Parity bits

How to calculate the parity bit:

P1 (1st): Parity bit related to the 1st, 3rd, 5th, and 7th bits
P2 (2nd): Parity bit related to the 2nd, 3rd, 6th, and 7th bits
P3 (4th): Parity bit related to the 4th, 5th, 6th, and 7th bits
S1, S2, S3: Check results (syndrome bits) obtained when the receiver re-checks the groups of P1, P2, and P3

Table Description:

The parity bits (P1, P2, P3) are placed so that parity checks can both detect and locate a bit error. Each parity bit covers the positions whose binary position number has a 1 in the corresponding digit. For example, position 3 (binary 011) is covered by P1 and P2, but not by P3 (the yellow part of the table is 1).
Each parity bit is then set so that the sum of the bits in its group is an even number (even parity).
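
The coverage rule can be checked with a few lines of Python. This is a minimal sketch that assumes the parity bits sit at positions 1, 2, and 4 as described above; the names PARITY_POSITIONS and covered_positions are illustrative only.

```python
# Parity bits sit at the power-of-two positions of the 7-bit codeword.
PARITY_POSITIONS = {"P1": 1, "P2": 2, "P3": 4}


def covered_positions(parity_position: int, length: int = 7) -> list:
    """Positions (1-based) whose binary number shares a 1 bit with the parity position."""
    return [pos for pos in range(1, length + 1) if pos & parity_position]


for name, position in PARITY_POSITIONS.items():
    print(name, covered_positions(position))
# P1 [1, 3, 5, 7]
# P2 [2, 3, 6, 7]
# P3 [4, 5, 6, 7]
```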

Calculation of each parity bit:

With even parity, each parity bit is calculated as follows:

  • P1: 1st, 3rd, 5th, 7th → P1,1,0,1 (P1 not yet determined) → 1+0+1=2 (even number), so P1=0.
  • P2: 2nd, 3rd, 6th, 7th → P2,1,1,1 (P2 not yet determined) → 1+1+1=3 (odd number), so to make the sum even, P2=1.
  • P3: 4th, 5th, 6th, 7th → P3,0,1,1 (P3 not yet determined) → 0+1+1=2 (even number), so P3=0.

Therefore, the data sent will be 0110011.
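
The write-side calculation can be sketched in Python as follows. This is a minimal illustration that assumes the bit order P1, P2, D1, P3, D2, D3, D4 and even parity, as in this article; the function name hamming74_encode is hypothetical.

```python
def hamming74_encode(data: str) -> str:
    """Encode 4 data bits (e.g. "1011") into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = (int(bit) for bit in data)
    p1 = (d1 + d2 + d4) % 2  # makes positions 1, 3, 5, 7 sum to an even number
    p2 = (d1 + d3 + d4) % 2  # makes positions 2, 3, 6, 7 sum to an even number
    p3 = (d2 + d3 + d4) % 2  # makes positions 4, 5, 6, 7 sum to an even number
    codeword = [p1, p2, d1, p3, d2, d3, d4]
    return "".join(str(bit) for bit in codeword)


print(hamming74_encode("1011"))  # 0110011
```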


2. Correction when 1 bit is inverted (receive: read)

Suppose D1 (the third bit) of the data 0110011 sent in step 1 is inverted, so that 0100011 is received. The ECC function detects this 1-bit error as follows.

The receiver performs a parity check to identify the location of the error.

  • S1 (P1 check): 1st, 3rd, 5th, 7th bits → 0,0,0,1 → 0+0+0+1=1 (odd number), so S1=1.
  • S2 (P2 check): 2nd, 3rd, 6th, 7th bits → 1,0,1,1 → 1+0+1+1=3 (odd number), so S2=1.
  • S3 (P3 check): 4th, 5th, 6th, 7th bits → 0,0,1,1 → 0+0+1+1=2 (even number), so S3=0.

From this result, S3 S2 S1 = 011 (binary) = 3 (decimal), which shows that the error is in the third bit.
Inverting the third bit restores the original data (0110011).
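
The read-side steps can be sketched in Python in the same way. This is a minimal illustration under the same assumptions (even parity, positions numbered from 1); the function name hamming74_correct is hypothetical.

```python
def hamming74_correct(codeword: str):
    """Return (corrected codeword, error position); position 0 means no error found."""
    bits = [int(bit) for bit in codeword]

    def check(positions):
        # 0 if the group sums to an even number, 1 if it sums to an odd number
        return sum(bits[pos - 1] for pos in positions) % 2

    s1 = check((1, 3, 5, 7))
    s2 = check((2, 3, 6, 7))
    s3 = check((4, 5, 6, 7))
    error_position = s3 * 4 + s2 * 2 + s1  # read S3 S2 S1 as a binary number
    if error_position:
        bits[error_position - 1] ^= 1  # invert the erroneous bit
    return "".join(str(bit) for bit in bits), error_position


print(hamming74_correct("0100011"))  # ('0110011', 3)
print(hamming74_correct("0110011"))  # ('0110011', 0)
```

Note that this code assumes at most one flipped bit; if two bits are inverted, the syndrome points at the wrong position.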

If there is no bit error, all three checks come out even (S3 S2 S1 = 000), so no correction is performed and the data is output as is.
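
The Hamming (7,4) code alone cannot tell a 2-bit error from a 1-bit error. The SECDED method mentioned earlier is commonly built by appending one overall parity bit to a Hamming codeword. The following Python sketch illustrates that idea for the 7-bit codeword used here; the 8-bit layout and the function names are assumptions for illustration, not a description of any specific device.

```python
def add_overall_parity(codeword7: str) -> str:
    """Append an 8th bit so that the whole word has an even number of 1s."""
    return codeword7 + str(codeword7.count("1") % 2)


def secded_check(codeword8: str):
    """Classify a received 8-bit word as ("ok", 0), ("single", position) or ("double", 0)."""
    bits = [int(bit) for bit in codeword8]
    syndrome = 0
    for position, bit in enumerate(bits[:7], start=1):
        if bit:
            syndrome ^= position  # XOR of the positions holding 1 equals S3 S2 S1
    overall = sum(bits) % 2  # 1 if an odd number of bits flipped
    if syndrome == 0 and overall == 0:
        return ("ok", 0)
    if overall == 1:
        # One bit flipped: the syndrome gives its position (0 means the 8th bit itself).
        return ("single", syndrome if syndrome else 8)
    return ("double", 0)  # syndrome is nonzero but overall parity is even


word = add_overall_parity("0110011")  # "01100110"
print(secded_check(word))             # ('ok', 0)
print(secded_check("01000110"))       # ('single', 3)
print(secded_check("01000010"))       # ('double', 0)
```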

Benefits of implementation and examples

By introducing ECC, bit errors that occur in memory and communications can be automatically detected and corrected, significantly improving system stability and data reliability. This reduces the risk of unexpected failures and data corruption, ensuring service continuity and reliability.

Servers:
Ensuring the safety of transaction data and customer information in the core systems of financial institutions and companies

Storage:
Maintaining reliability of large volumes of data in cloud services and data centers

Communication equipment:
Achieving stable communication and data transfer in network devices and base stations

Medical devices:
Contributing to the accurate recording and management of patient information and diagnostic data

Automotive field:
Improving the reliability of driving data and safety functions in in-vehicle systems

AI/IoT devices:
Preventing errors in huge amounts of sensor data and inference results

Summary

ECC is an important technology that significantly improves system reliability and availability by automatically detecting and correcting bit errors that occur during data transfer and storage. ECC is becoming increasingly important in fields that handle massive amounts of data, such as semiconductor memory and communications, as well as AI, cloud computing, and automotive systems. ECC will continue to be a core technology that supports high reliability in order to achieve safe and secure data management. Please feel free to contact us if you have any questions.
