What Is the Hamming Code Used in Memory? Fundamentals and Applications of Error Correction Technology

With the evolution of AI and IoT technologies, the importance of "highly reliable memory" is rapidly increasing in fields that handle huge amounts of data. In the fields of AI development and embedded systems, memory error prevention is an important theme that determines product quality.

In this context, Hamming code is attracting attention as an easy-to-implement, highly efficient error correction technology that is used in many memory products such as DRAM and SSDs. In fields such as autonomous driving, industrial equipment, and medical equipment in particular, a memory error can lead directly to system failure or reduced safety, so highly reliable memory is essential.

In this column, we explain the points that are useful in development in an easy-to-understand way, from the basic principles of Hamming codes to the latest application examples.

Overview of Hamming Code

Hamming code, invented by Dr. Richard Hamming in 1950, is a fundamental method in error correction technology. It is still used in many fields today and has made a significant contribution to the development of error correction technology. By adding parity bits (additional information for error detection) to the original data, Hamming code can automatically detect and correct single-bit errors. This has greatly contributed to improving the reliability of AI systems and embedded devices, and to the stability and safety of systems.
The following sections explain in detail how error correction works using Hamming codes.

This article focuses on the Hamming (7,4) code and the Extended Hamming Code (SECDED), which are often implemented and used by engineers in the field. There are other Hamming codes, such as the (15,11) code and the (31,26) code, depending on the data length and application, but the (7,4) and extended types are the mainstream in the embedded and memory fields.

What is Hamming (7,4) code?

Hamming (7,4) code is an error-correcting code that adds 3 parity bits to 4 bits of data, for a total of 7 bits, making it possible to detect and correct 1-bit errors. "7,4" means that of the 7 bits, 4 bits are data and 3 bits are parity bits for error detection and correction. It has a low computational load and is widely used in many systems, including embedded devices and memory products.
The Hamming (7,4) code is constructed as follows:

Hamming (7,4) code: Output codeword construction

Input data bits (D1 to D4): The original 4-bit data.
Parity bits (P1 to P3): Calculated using an XOR operation on specific combinations of data bits, and used for error detection and correction.
Output codeword (7 bits): The 7 bits of transmission data generated by combining data bits and parity bits. In the example in this article, the bits are arranged in the order P1, P2, D1, P3, D2, D3, D4.
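The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name hamming74_encode is hypothetical, and the bit order P1, P2, D1, P3, D2, D3, D4 is assumed because it matches the example codeword used in this article.

```python
def hamming74_encode(data):
    """Encode 4 data bits [D1, D2, D3, D4] into a 7-bit codeword.

    Assumed bit order (positions 1..7): P1, P2, D1, P3, D2, D3, D4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


codeword = hamming74_encode([1, 0, 1, 1])
print(codeword)  # [0, 1, 1, 0, 0, 1, 1]
```

Each parity bit is chosen so that the group of positions it covers contains an even number of 1s.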

Error correction process

We will explain the error correction process for Hamming (7,4) code.

As an example, consider the 7-bit codeword [0, 1, 1, 0, 0, 1, 1]. Suppose a single-bit error inverts bit D2 during data transmission. The following steps show how the data is automatically corrected.

Hamming (7,4) code: Error correction process

1. The original codeword is the 7-bit codeword [0, 1, 1, 0, 0, 1, 1], formed from the data bits [1, 0, 1, 1] with three parity bits added.

2. The fifth bit, D2, is accidentally inverted during data transmission, so [0, 1, 1, 0, 1, 1, 1] is received.

3. At the receiving end, the syndrome (error pattern) [1, 0, 1] is calculated using the check matrix H. Read as a binary number, 101 equals 5, which identifies the fifth bit as the location of the error.

4. The detected error bit D2 is inverted again, automatically restoring the original codeword [0, 1, 1, 0, 0, 1, 1].
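The four steps above can be reproduced with a short Python sketch. The function names hamming74_correct and extract_data are hypothetical, and the code assumes the bit order P1, P2, D1, P3, D2, D3, D4, which is consistent with the codeword in this example.

```python
def hamming74_correct(received):
    """Return (corrected_codeword, error_position).

    error_position is 1-based; 0 means no error was detected.
    """
    r = list(received)
    # Each check covers the positions whose binary index has that bit set.
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]  # positions 1, 3, 5, 7
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]  # positions 2, 3, 6, 7
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]  # positions 4, 5, 6, 7
    position = s3 * 4 + s2 * 2 + s1  # syndrome read as a binary number
    if position:
        r[position - 1] ^= 1  # invert the erroneous bit
    return r, position


def extract_data(codeword):
    """Pick D1..D4 out of positions 3, 5, 6, 7."""
    return [codeword[2], codeword[4], codeword[5], codeword[6]]


received = [0, 1, 1, 0, 1, 1, 1]  # fifth bit (D2) inverted
fixed, position = hamming74_correct(received)
print(position)             # 5
print(fixed)                # [0, 1, 1, 0, 0, 1, 1]
print(extract_data(fixed))  # [1, 0, 1, 1]
```

Note that this code assumes at most one bit is in error. With two inverted bits the syndrome points at a wrong position, which is the limitation that the extended code addresses.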

Role of check matrix H

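The check matrix H (parity-check matrix) of a Hamming code is a "blueprint" that systematizes the parity bit design and verification rules, and is the basis for error correction. As a concrete example, a check matrix H for the (7,4) Hamming code is shown below, with the rows ordered so that the syndrome reads as a binary number from the most significant bit, consistent with the example above.

H =
[ 0 0 0 1 1 1 1 ]
[ 0 1 1 0 0 1 1 ]
[ 1 0 1 0 1 0 1 ]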

Each column of H shows which parity checks the corresponding bit takes part in, and the column is the binary representation of its column number (for example, column 5 is 101).
The check matrix H is multiplied by the received vector r (data + parity), and the syndrome s is calculated using the mod 2 operation (*). This value is the key to error detection. This can be expressed as the following formula:

s = H · r^T (mod 2)

The syndrome s is the coordinate that indicates the error location. If s is all zeros, no error is detected. Otherwise s matches exactly one column of H, which uniquely identifies the position of a single-bit error.
*Mod2 operation: This operation obtains the remainder when dividing by 2, and the result is either 0 or 1.

Intuitively, H is a "map of error detection," and syndromes are "coordinates on the map," with the corresponding columns indicating the error locations.
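The "map and coordinates" picture can be checked numerically with a small Python sketch. The row order of H below is an assumption, chosen so that the syndrome for the example in this article comes out as [1, 0, 1]. The function names syndrome and error_column are hypothetical.

```python
# Check matrix for the (7,4) code; column j is the binary form of j,
# with the most significant bit in the top row (assumed ordering).
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(r):
    """Compute s = H * r^T (mod 2) for a 7-bit received vector r."""
    return [sum(h * b for h, b in zip(row, r)) % 2 for row in H]


def error_column(s):
    """Return the 1-based column of H equal to s, or 0 if s is all zeros."""
    if not any(s):
        return 0
    columns = [list(col) for col in zip(*H)]
    return columns.index(s) + 1


received = [0, 1, 1, 0, 1, 1, 1]
s = syndrome(received)
print(s)                # [1, 0, 1]
print(error_column(s))  # 5
```

Looking up the syndrome among the columns of H is the "coordinates on the map" step: the matching column number is the position of the inverted bit.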

What is an Extended Hamming Code?

Extended Hamming code can correct single-bit errors and also detect double-bit errors (SECDED), and is used in a wide range of applications, including servers, automotive, industrial equipment, and IoT. In the memory field, Hamming-based ECC is primarily used in two schemes, "on-die ECC" and "inline ECC," each with different uses and features. On-die ECC performs error correction inside the DRAM chip, protecting against bit errors in the memory cells. Inline ECC, on the other hand, has the memory controller generate and check the ECC code and stores the data and the ECC code in the same memory space, which improves reliability across the entire data path, including transmission between the controller and the DRAM.

Here we will explain the features of SECDED, on-die ECC, and inline ECC.

SECDED

SECDED (Single Error Correction, Double Error Detection) is an extended error-correcting code that adds one overall parity bit to the Hamming code, enabling the correction of single-bit errors and the detection of double-bit errors. It is used as standard in many memory products, including DRAM and flash memory. A single-bit error makes the overall parity check fail, and the syndrome from the Hamming parity checks locates the bit to correct. A double-bit error leaves the overall parity unchanged while producing a nonzero syndrome, so it is reported as an uncorrectable error. This design adds double-error detection while keeping the single-error correction capability.
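The decision logic of SECDED can be illustrated with the smallest extended code, an (8,4) code built from the Hamming (7,4) code plus one overall parity bit. The following Python sketch is an illustration under assumptions: the function names are hypothetical, the bit order is P1, P2, D1, P3, D2, D3, D4 followed by the overall parity bit, and the status strings are chosen for readability.

```python
def secded84_encode(data):
    """Encode [D1, D2, D3, D4] as Hamming (7,4) plus an overall parity bit."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p3, d2, d3, d4]
    overall = sum(word) % 2  # makes the total number of 1s even
    return word + [overall]


def secded84_decode(received):
    """Return (status, data); data is None when the error is uncorrectable."""
    r = list(received)
    s1 = r[0] ^ r[2] ^ r[4] ^ r[6]
    s2 = r[1] ^ r[2] ^ r[5] ^ r[6]
    s3 = r[3] ^ r[4] ^ r[5] ^ r[6]
    syn = s3 * 4 + s2 * 2 + s1
    parity_fail = sum(r) % 2  # 1 if an odd number of bits were inverted

    if syn == 0 and parity_fail == 0:
        status = "no error"
    elif parity_fail == 1:
        # Odd parity: treat as a single error. A zero syndrome means the
        # overall parity bit itself (index 7) was inverted.
        r[syn - 1 if syn else 7] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome but even parity: two bits were inverted.
        return "double error detected", None
    return status, [r[2], r[4], r[5], r[6]]


word = secded84_encode([1, 0, 1, 1])
one_flip = word.copy()
one_flip[4] ^= 1
two_flips = one_flip.copy()
two_flips[0] ^= 1

print(secded84_decode(word))       # ('no error', [1, 0, 1, 1])
print(secded84_decode(one_flip))   # ('corrected', [1, 0, 1, 1])
print(secded84_decode(two_flips))  # ('double error detected', None)
```

The same two-signal decision (syndrome plus overall parity) applies to larger codes; only the number of parity checks grows.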

Advantages

Single-bit errors can be corrected and double-bit errors can be detected, preventing system failures.
The number of additional bits is small, minimizing the increase in memory capacity.
It is easy to implement in hardware and can be readily incorporated into DRAM controllers and SSD controllers.

Disadvantages

Capacity overhead (for example, adding 8-bit ECC to 64-bit data increases capacity by 12.5%)
Computational cost and delay (encoding and decoding are required for error correction and detection)
Double-bit errors cannot be corrected (two-bit errors can be detected but not corrected)

On-Die ECC

High-capacity, high-throughput memories need to correct errors efficiently on larger data units, so small-scale codes like the Hamming (7,4) code are unsuitable. To address this, on-die ECC adds 8 parity bits to each 128 bits of data, giving 136-bit codewords. This enables efficient error correction even for large amounts of data, and is primarily designed to correct errors that occur within the DRAM chip.
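The parity-bit counts quoted in this article follow from a simple condition: a single-error-correcting Hamming code with k data bits needs the smallest r such that 2^r >= k + r + 1, and SECDED adds one more bit for overall parity. The Python sketch below checks this; the function names are hypothetical.

```python
def sec_parity_bits(data_bits):
    """Smallest r with 2**r >= data_bits + r + 1 (single error correction)."""
    r = 1
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


def secded_parity_bits(data_bits):
    """SEC parity bits plus one overall parity bit."""
    return sec_parity_bits(data_bits) + 1


print(sec_parity_bits(4))       # 3 -> Hamming (7,4)
print(sec_parity_bits(128))     # 8 -> 136-bit on-die ECC codeword
print(secded_parity_bits(64))   # 8 -> (72,64) SECDED
print(secded_parity_bits(64) / 64 * 100)  # 12.5 (percent overhead)
```

Because r grows only logarithmically with k, protecting 128 bits at once costs proportionally less (8/128 = 6.25%) than protecting 4 bits at a time (3/4 = 75%).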

Advantages

Because error correction is performed within the DRAM chip, minute defects that arise during manufacturing can be corrected, which reduces the defect rate and improves yield.
Since error correction processing is completed within the chip, it has almost no impact on the latency (delay) of the entire system.
Since no design changes are required on the system side, error correction functions can be implemented at low additional costs.

Disadvantages

A single-bit error can be corrected, but errors of two or more bits cannot be reliably detected.
The error correction operation is not visible from the system side, which makes it difficult to analyze the cause of a failure when it occurs, and limits the improvement of the reliability, availability, and maintainability (RAS) of the entire system.

Inline ECC

With inline ECC, the DRAM controller generates and verifies the ECC code. The data and the ECC code are stored in the same DRAM, and because the code is checked at the controller, the entire data transmission path is protected. This makes it possible to detect and correct not only bit errors within memory cells, but also errors that occur during data transmission between the controller and the DRAM. However, because the data and ECC code are stored in the same memory space, part of the memory capacity must be reserved for ECC.
Inline ECC widely uses the SECDED Hamming code, which contributes to improving the reliability of the entire system.

Advantages

The SECDED method makes it possible to correct single-bit errors and detect double-bit errors, significantly improving the reliability of the entire system.
Error management can be performed by the memory controller, making it easy to obtain error logs and link with RAS functions.
It is ideal for applications requiring high reliability, such as automotive and industrial equipment.

Disadvantages

The addition of ECC code requires approximately 12.5% of memory capacity as overhead.
Due to ECC processing, there is a slight delay when reading or writing data.
Implementation costs are somewhat higher because an ECC-enabled controller and DRAM are required.

Below is a comparison of SECDED, on-die ECC, and inline ECC.

Comparison of SECDED, on-die ECC, and inline ECC

Item	SECDED	On-die ECC	Inline ECC
Main uses	High-reliability server memory (DIMM)	Cell error correction inside the DRAM chip	Data protection, including storage and transmission paths
Error correction capability	1-bit correction, 2-bit detection	Mainly cell-level 1-bit correction	Flexible according to capacity (multiple bits can be corrected)
Detection capability	Double-bit error detection	Errors of two or more bits cannot be detected	Powerful detection possible depending on implementation
Overhead	Added ECC bits (e.g. 64 bits + 8 bits)	Only inside the chip, no external impact	Varies greatly depending on implementation
Performance impact	Syndrome calculation delay when reading	Virtually no impact on external I/F	Increased latency depending on implementation

Hamming Code Applications

Here, we will explain some typical applications of Hamming codes, dividing them into five fields: memory, storage, communications, automotive/industrial equipment, and FPGA/hardware IP. We will use specific examples to look at how error correction technology is incorporated in each field and how it contributes to improving system safety and performance.

1. Memory field (ECC memory)

With the advancement of AI and IoT, ECC memory is now standard in servers and workstations.
For example, the (72,64) SECDED code (64-bit data + 8-bit parity) is implemented in memory controllers for DDR4/DDR5 and LPDDR5, and protects the system from soft errors caused by cosmic rays and voltage fluctuations.
Typical examples of deployments include IBM System/360, Intel Xeon, and ARM-based SoCs.
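To make the (72,64) SECDED code concrete, the following Python sketch encodes a 64-bit value into 72 bits and decodes it again. It is a software illustration under assumptions: the function names are hypothetical, check bits are placed at power-of-two positions with the overall parity bit at index 0, and real memory controllers compute this in hardware and may use a different check matrix.

```python
# Positions 1..71: powers of two hold check bits, the other 64 hold data.
CHECK_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
DATA_POSITIONS = [p for p in range(1, 72) if p not in CHECK_POSITIONS]


def encode_72_64(value):
    """Encode a 64-bit integer as a list of 72 bits (index 0 = overall parity)."""
    word = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (value >> i) & 1
    syn = 0
    for pos in DATA_POSITIONS:
        if word[pos]:
            syn ^= pos  # XOR of the positions of all 1 bits
    for c in CHECK_POSITIONS:
        word[c] = 1 if syn & c else 0  # force the total XOR to zero
    word[0] = sum(word) % 2  # make the total number of 1s even
    return word


def decode_72_64(word):
    """Return (status, value); value is None when the error is uncorrectable."""
    w = list(word)
    syn = 0
    for pos in range(1, 72):
        if w[pos]:
            syn ^= pos
    parity_fail = sum(w) % 2
    if syn == 0 and parity_fail == 0:
        status = "no error"
    elif parity_fail == 1 and syn < 72:
        w[syn] ^= 1  # syn == 0 means the overall parity bit itself
        status = "corrected"
    elif parity_fail == 1:
        return "uncorrectable error", None
    else:
        return "double error detected", None
    value = 0
    for i, pos in enumerate(DATA_POSITIONS):
        value |= w[pos] << i
    return status, value


data = 0x0123456789ABCDEF
stored = encode_72_64(data)
stored[37] ^= 1  # simulate a soft error in one cell
status, recovered = decode_72_64(stored)
print(status, hex(recovered))  # corrected 0x123456789abcdef
```

The syndrome here is simply the XOR of the positions of all bits that are 1, which is equivalent to multiplying by a check matrix whose columns are the binary position numbers.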

2. Storage field

In SSDs and RAID systems, SECDED is used to improve the long-term reliability of data retention. Hamming-based codes are used in RAID 2 and enterprise SSDs, and achieve high error detection capabilities with low overhead.

3. Communication Systems

Even in environments with a lot of noise and interference, such as satellite communications, wireless LAN, and IoT communications, incorporating extended Hamming codes can improve communication reliability. Introducing extended Hamming codes into IoT devices can improve communication accuracy at a low computational cost.

4. Automotive and industrial equipment

In automotive SoCs and industrial control systems, where 1-bit correction and 2-bit detection are mandatory requirements, SECDED is used with LPDDR memory and in inline ECC configurations.

5. FPGA/Hardware IP

FPGA IP cores are also available, for example a 72-bit extended Hamming decoder (64-bit data + 8-bit ECC) that can be used in Vivado and other design tools. Such cores are optimized for high-reliability communication and data storage, and provide single-bit correction, double-bit detection, and error monitoring functions.

Summary

More than half a century after its creation, Hamming code continues to evolve as a fundamental technology in the fields of memory and communications. Thanks to its simple structure and highly efficient error correction capabilities, it is used in a wide range of applications, including DRAM, SSDs, network devices, and embedded systems. Going forward, it will likely continue to play an important role as the core of ECC functions in fields that demand high reliability, such as autonomous driving, industrial IoT, medical devices, and cloud servers.

We can propose various memory products to suit your needs and applications. Please feel free to contact us for product selection or technical consultations.