Cortex-M7 Beginner's Guide: From Comparison with Cortex-M4 to ECU Applications

As vehicles become increasingly electrified, the microcontrollers installed in electronic control units (ECUs) need ever higher performance. Among the many CPU cores that serve as the brains of microcontrollers, ARM's Cortex-M7 is attracting particular attention. This article explains the characteristics of the Cortex-M7, compares it with the Cortex-M4, and presents examples of its use in ECU development, in terms accessible to readers who are new to the technology.

What is the Cortex-M series?

ARM offers three families of Cortex processor cores for embedded devices: Cortex-A, Cortex-R, and Cortex-M.

The Cortex series is a group of processor cores based on the RISC (Reduced Instruction Set Computer) architecture, a design approach that simplifies the instruction set to achieve high-speed processing and low power consumption. Cortex-A targets high-performance applications, Cortex-R targets real-time control, and Cortex-M targets embedded control.
Because the series combines real-time capability with high efficiency, it is used in a wide range of fields, including IoT devices and automotive ECUs.

Applications of the Cortex series

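| Series   | Description                       | Main uses                                                           |
|----------|-----------------------------------|---------------------------------------------------------------------|
| Cortex-M | For embedded control              | IoT, sensor control, home appliances, wearables, etc.               |
| Cortex-R | For real-time control             | Automotive control, medical equipment, industrial control           |
| Cortex-A | For high-performance applications | Smart home appliances, industrial HMI, edge AI, Linux-based devices |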

Features of Cortex-M7

Block diagram of Cortex-M4 and Cortex-M7 (Source: ARM)

The ARM Cortex-M series comes in several varieties depending on the application. The M0/M0+ are low-power cores designed for IoT, the M3 is for general-purpose control, the M4 adds DSP instructions and an FPU and suits audio and motor control, and the M33 and M55 suit security and AI processing. Of these, the Cortex-M7 is attracting attention as a core that can handle the high computational loads and complex control requirements of next-generation automotive control.

The Cortex-M7 is a high-performance processor core for embedded microcontrollers. It combines a DPU (Data Processing Unit) with two ALUs (one of which supports SIMD instructions) and a MAC unit (multiplication and addition in one clock cycle), an FPU (floating-point arithmetic), an LSU (load/store unit for memory access), and a BTAC (branch target address cache used for branch prediction), enabling high-speed processing of DSP instructions. Furthermore, the instruction TCM and data TCM provide zero-wait access, accelerating control loops and interrupt responses that require real-time performance. Other features include prioritized interrupt control by the NVIC, parallel handling of multiple buses by the bus matrix, and peripheral connectivity via the AHB/APB interfaces. This configuration makes the Cortex-M7 suitable for a wide range of embedded applications, including signal processing, control, and HMI.

Cortex-M7 functional diagram (Source: ARM)

Comparing Cortex-M4 and M7

The Cortex-M7 supports the same ARMv7-M instruction set as the Cortex-M4, so code written for the M4 can be reused as is. The M7's architecture is significantly enhanced compared to the M4, giving it advantages in both computational performance and real-time performance. The main differences between the M4 and M7 are summarized below.

Key differences between Cortex-M4 and Cortex-M7

| Comparison item       | Cortex-M4                     | Cortex-M7                                     |
|-----------------------|-------------------------------|-----------------------------------------------|
| Clock frequency       | ~200 MHz (product dependent)  | Up to 1 GHz (product dependent)               |
| FPU                   | Single precision (optional)   | Single precision / double precision (optional) |
| TCM                   | None                          | Yes (zero-wait access)                        |
| Cache                 | None                          | Instruction cache (ICU) and data cache (DCU)  |
| Pipeline              | 3 stages                      | 6 stages, superscalar                         |
| Performance indicator | Approximately 1.25 DMIPS/MHz  | Approximately 2.14 DMIPS/MHz                  |
| MPU (*) regions       | Up to 8 regions (optional)    | 8 or 16 regions (optional)                    |

(*) MPU: Memory Protection Unit, a hardware function that sets read, write, and execute access permissions for each memory region to prevent unauthorized access.

First, the clock frequency of the M4 is generally around 200MHz, while the M7 can reach up to 1GHz depending on the product. This dramatically improves processing throughput. Furthermore, while the M4's FPU only supports single precision, the M7 supports both single and double precision, achieving high accuracy in control and signal processing that makes heavy use of floating-point arithmetic.
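
To illustrate why double precision can matter in calculation-heavy control code, the following minimal C sketch accumulates the same small step in single and in double precision. The function names are hypothetical and chosen only for this illustration; they do not come from any Cortex-M7 vendor library.

```c
/* Accumulate `step` n times using single-precision arithmetic. */
float accumulate_f32(float step, unsigned long n)
{
    float sum = 0.0f;
    for (unsigned long i = 0; i < n; ++i) {
        sum += step;   /* rounding error builds up with every addition */
    }
    return sum;
}

/* Same accumulation using double-precision arithmetic. */
double accumulate_f64(double step, unsigned long n)
{
    double sum = 0.0;
    for (unsigned long i = 0; i < n; ++i) {
        sum += step;
    }
    return sum;
}
```

Adding 0.001 one million times should give 1000. The single-precision sum drifts noticeably from that value, while the double-precision sum stays very close to it. On a core whose FPU handles only single precision, the double-precision version would typically fall back to slower software routines.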

DSP functions have also been enhanced: the M7 features SIMD instructions and single-cycle MAC instructions, allowing it to efficiently handle loads such as filter processing and audio calculations. In addition, the M7 is equipped with an instruction TCM and a data TCM, which minimize interrupt response and control loop delays through zero-wait access. Some products are also equipped with cache functions, and the instruction cache (ICU) and data cache (DCU) make memory access more efficient.
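
As a rough illustration of how SIMD and MAC operations combine in filter processing, the sketch below computes one FIR filter output from 16-bit samples, handling two taps per step. `dual_mac16` is a portable stand-in written for this example; it models the kind of dual 16-bit multiply-accumulate that a DSP instruction performs in hardware, and all function names here are assumptions rather than a real library API.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit values into one 32-bit word (lo in bits 0-15). */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Portable model of a dual 16-bit multiply-accumulate:
 * returns acc + a.lo * b.lo + a.hi * b.hi. */
static int32_t dual_mac16(uint32_t a, uint32_t b, int32_t acc)
{
    int16_t a_lo = (int16_t)(a & 0xFFFFu);
    int16_t a_hi = (int16_t)(a >> 16);
    int16_t b_lo = (int16_t)(b & 0xFFFFu);
    int16_t b_hi = (int16_t)(b >> 16);
    return acc + (int32_t)a_lo * b_lo + (int32_t)a_hi * b_hi;
}

/* One FIR output: the sum of coeff[k] * x[k] over all taps.
 * n_taps must be even because two taps are processed per step. */
int32_t fir_q15_sample(const int16_t *coeff, const int16_t *x, size_t n_taps)
{
    int32_t acc = 0;
    for (size_t k = 0; k < n_taps; k += 2) {
        acc = dual_mac16(pack16(coeff[k], coeff[k + 1]),
                         pack16(x[k], x[k + 1]),
                         acc);
    }
    return acc;
}
```

On real hardware, the packing and the dual multiply-accumulate would normally be left to the compiler or to an optimized library, so the loop body shrinks to a handful of instructions. The sketch does not handle overflow or saturation.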

In terms of pipeline configuration, the M4 has three stages, while the M7 has six stages and is equipped with superscalar functionality, allowing up to two instructions to be issued simultaneously. This parallel processing enables faster calculation and control processing. There is also a large difference in performance indicators, with the M4 achieving approximately 1.25 DMIPS/MHz and the M7 achieving approximately 2.14 DMIPS/MHz.
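
The DMIPS/MHz figures can be turned into an estimated throughput by multiplying them by the clock frequency. The helper below is a hypothetical one written for this article; it uses the approximate figures quoted above and ignores memory wait states and product-specific limits, so the results are rough upper bounds rather than measured values.

```c
/* Estimated Dhrystone throughput: (DMIPS per MHz) x (clock in MHz). */
double estimated_dmips(double dmips_per_mhz, double clock_mhz)
{
    return dmips_per_mhz * clock_mhz;
}
```

With these figures, a Cortex-M4 at 200 MHz comes out at about 250 DMIPS (1.25 × 200), while a Cortex-M7 at 1 GHz comes out at about 2140 DMIPS (2.14 × 1000).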

As such, the Cortex-M7 outperforms the Cortex-M4 in computational performance, real-time capability, and memory configuration, making it well suited to automotive control and other high-load embedded applications.

Explanation of the main functional blocks of the Cortex-M7

We will explain the main functional blocks of the Cortex-M7.

  • High clock frequency
    Some products equipped with the Cortex-M7 core can operate at up to 1 GHz. However, the actual operating clock frequency depends on the implementation specifications of each product. This is determined not by the theoretical performance of the core itself, but by each manufacturer's design policy and compatibility with peripheral circuits.
    Typical embedded MCUs operate in the range of tens to hundreds of MHz. The Cortex-R series for real-time control operates at around 200 to 1000 MHz, while the Cortex-A series for high-performance applications operates at up to several GHz. This places the fastest Cortex-M7 products above typical MCUs and below the Cortex-A series.
  • FPU
    Some Cortex-M7 models are equipped with an FPU (Floating Point Unit). The FPU is a dedicated unit that processes floating-point calculations at high speed in hardware, significantly improving calculation speed compared to software emulation. The FPU demonstrates high processing performance in motor control algorithms, digital signal processing (DSP), image and audio processing, and more. The Cortex-M7 FPU supports IEEE 754-compliant single-precision and double-precision floating-point calculations.
    *Whether or not an FPU is included depends on the implementation of each product, so please check the specifications before use.
  • Extended DSP
    The Cortex-M7 has instructions specialized for DSP (Digital Signal Processing).

    Key features include:
      • SIMD (Single Instruction Multiple Data) instructions: Instructions that perform operations on multiple pieces of data simultaneously. For example, addition or multiplication of multiple 16-bit integers can be performed with a single instruction.
      • MAC (Multiply-Accumulate) instruction: An instruction that performs a multiplication and an addition together. It is frequently used in filter processing and FFTs.
      • CMSIS-DSP library: A DSP library provided by ARM that includes functions such as FFT, filters, and statistical processing, and enables high-speed processing by utilizing the Cortex-M7 DSP instructions.

    These instructions enable highly efficient execution of real-time audio processing and filter calculations, making the Cortex-M7 well suited to embedded applications that require DSP processing.
  • TCM
    TCM (Tightly Coupled Memory) is on-chip memory connected directly to the processor core, and is divided into instruction TCM and data TCM. Instruction TCM is used to store program code, while data TCM is used to store variables and buffers, and both can be accessed with zero wait cycles. By utilizing TCM, the Cortex-M7 can minimize delays in processes that require real-time performance, such as interrupt responses and control loops. This allows for stable performance in embedded applications that require high-precision control and fast response.
  • ICU, DCU
    A cache is a high-speed buffer placed between memory and the CPU, and the Cortex-M7 is optionally equipped with an instruction cache (ICU) and a data cache (DCU). The ICU temporarily stores program instructions, speeding up the retrieval of repeatedly executed code. The DCU stores data read from or written to memory, streamlining frequent memory accesses.
    These cache functions enable the Cortex-M7 to reduce the number of memory accesses, lower processing latency, and improve overall processing performance.
  • Pipelining
    Pipelining is a technology that divides instruction processing into multiple stages and processes the stages in parallel to increase processing efficiency. The Cortex-M7 uses a six-stage pipeline, with each stage (prefetch, instruction decode, instruction issue, instruction execution, memory access, write-back, etc.) operating independently. This reduces the effective processing time per instruction and improves throughput.
  • Superscalar Function
    Superscalar is a technology that issues and executes multiple instructions simultaneously, and the Cortex-M7 can issue up to two instructions simultaneously. If the conditions are right, it can process two instructions in parallel, significantly improving instruction execution performance. This enables faster arithmetic and control processing compared to the Cortex-M4.
  • ARMv7-M instruction set
    ARMv7-M is a 32-bit CPU instruction set standard designed for embedded devices. The Cortex-M7 uses this instruction set and incorporates several features to achieve both real-time performance and low power consumption.
      • Thumb instruction format: Instructions are encoded in 16 or 32 bits, which reduces code size and improves memory efficiency and power consumption.
      • NVIC (Nested Vectored Interrupt Controller): Manages interrupt signals in order of priority and switches between processes at the hardware level, minimizing response delays.

    These features make ARMv7-M well suited to applications that cannot tolerate latency, such as real-time control, audio and image processing, and sensor data analysis.
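
The priority-based interrupt handling described for the NVIC can be pictured with a small software model. The sketch below is not NVIC register code: the types and function are hypothetical, and it models only two rules, that a lower priority value is more urgent and that ties go to the lower interrupt number. Priority grouping and system exceptions are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define NO_IRQ (-1)

typedef struct {
    uint8_t priority;  /* lower value = more urgent */
    uint8_t pending;   /* non-zero while the interrupt waits to be served */
} irq_state_t;

/* Return the index of the pending interrupt that would run next, or NO_IRQ
 * if nothing pending is urgent enough to preempt code that is currently
 * running at `running_priority`. */
int select_next_irq(const irq_state_t *irqs, size_t count,
                    uint8_t running_priority)
{
    int best = NO_IRQ;
    for (size_t i = 0; i < count; ++i) {
        if (!irqs[i].pending) {
            continue;
        }
        if (irqs[i].priority >= running_priority) {
            continue;  /* equal or lower urgency cannot preempt */
        }
        /* Strict '<' keeps the lowest index when priorities tie. */
        if (best == NO_IRQ || irqs[i].priority < irqs[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}
```

In hardware this selection is made by the interrupt controller itself, which is why a high-priority control interrupt can preempt a lower-priority handler without any software arbitration.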

Utilizing Cortex-M7 in ECU development

In recent years, automotive systems such as electric power steering (EPS), electric brakes, and electric throttles have become increasingly electrified. These systems require strong real-time capabilities and high computing performance to respond immediately to driver operations.

  • EPS: An electric motor assists steering force in response to steering inputs
  • Electric brake: Precise control of braking force through electronic control
  • Electric throttle: The throttle opening is controlled by a motor in response to accelerator operation

Many ECUs use the Arm Cortex-M7 to meet the demand for high-speed calculations such as torque control and filtering.
In terms of real-time performance in particular, the TCM, which allows zero-wait access, enables stable, high-speed calculations that do not depend on a cache. Code and data placed in TCM show no variation in execution time due to cache misses, unlike cache-dependent execution on the Cortex-A series, so the Cortex-M7 is widely used in automotive control as a processor suited to hard real-time control.
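
As a sketch of how a torque control step might be placed in TCM, the example below implements a simple PI controller and marks its code and state with placement macros. The section names `.itcm_text` and `.dtcm_data` are assumptions: actual names depend on each product's linker script and startup code, which must also initialize any data placed there. On non-ARM builds the macros expand to nothing, so the logic can be tried on a PC. The controller has no anti-windup and is illustrative only.

```c
/* Placement macros: the section names are hypothetical and must match
 * the linker script of the actual product. */
#if defined(__arm__) && defined(__GNUC__)
#define ITCM_CODE __attribute__((section(".itcm_text")))
#define DTCM_DATA __attribute__((section(".dtcm_data")))
#else
#define ITCM_CODE
#define DTCM_DATA
#endif

typedef struct {
    float kp;        /* proportional gain */
    float ki;        /* integral gain, applied once per control period */
    float integral;  /* accumulated integral term */
    float out_min;   /* lower output limit */
    float out_max;   /* upper output limit */
} pi_ctrl_t;

/* Controller state kept in data TCM so every access is zero-wait. */
DTCM_DATA static pi_ctrl_t torque_pi = { 0.5f, 0.25f, 0.0f, -10.0f, 10.0f };

/* Accessor for the shared controller instance. */
pi_ctrl_t *torque_controller(void)
{
    return &torque_pi;
}

/* One control step, placed in instruction TCM for deterministic timing. */
ITCM_CODE float pi_step(pi_ctrl_t *c, float target, float measured)
{
    float error = target - measured;
    c->integral += c->ki * error;
    float out = c->kp * error + c->integral;
    if (out > c->out_max) out = c->out_max;  /* clamp to actuator limits */
    if (out < c->out_min) out = c->out_min;
    return out;
}
```

A periodic timer interrupt would typically call `pi_step(torque_controller(), target, measured)` once per control period. Because both the code and its state live in TCM, the execution time of each step does not depend on cache contents.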

Image of electric power steering

Summary

The Cortex-M7 is a processor core that combines high precision and fast response for the automotive control field, where electrification is progressing. Its compatibility with the Cortex-M4 means that existing development environments, toolchains, and software assets can be used as is, which is a major advantage. Furthermore, zero-wait access through the TCM and the enhanced computing performance provided by the FPU and DSP instructions enable stable execution of control processing that requires real-time performance. This supports reliable response even in situations where emergency control is required, such as electric power steering and electric braking.

When selecting an in-vehicle microcontroller in the future, the Cortex-M7 is expected to become even more important as an option that combines high performance and high reliability.
