Cortex-M And Classical Series ARM Architecture Comparisons

By Guruprasad Vadhiraj Putty

1 ABSTRACT

ARM has introduced many processors, and each group of processors has a different core and a different set of features. A newcomer to ARM, or a designer evaluating it, can use this paper to understand the differences quickly and choose the processor best suited to the requirements. The paper gives a brief comparison of the architectures.

2 INTRODUCTION

There are many papers on ARM today, but most of them compare performance or describe the improvements made over the previous architecture. This paper brings out the architectural differences between the classical ARM processors and the Cortex-M3. The classical ARM series here refers to the processors from ARM9 to ARM11. The paper explains each module and its usefulness for industrial control systems, and in the process raises questions about some of the modules. Whether a module is relevant to a particular set of requirements is left to the reader. Power management is not covered in detail.

3 COMPARISON OF ARCHITECTURES

Architectural Block	ARM Classic Series	Cortex
Core	ARM9-ARM11	Cortex M3
Instruction Tightly coupled memory	Yes	No
Data Tightly Coupled memory	Yes	No
Cache	Yes	No
Co-Processor	Yes	No
System Controller	Yes	Yes
MMU	Yes	No
MPU	Yes	Yes
Debug	Yes	Yes
Write Buffer	Yes	Yes
System Timer	No	Yes
Nested Vectored interrupt Controller	No	Yes

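Note: In the classical series, no single core contains all of ITCM, DTCM, cache, MMU and MPU. A core has either an MMU or an MPU, never both, and the cache and TCM options vary from core to core. For example, the ARM920T has caches and an MMU but no TCM, the ARM946E-S has caches, TCMs and an MPU, and the ARM966E-S has TCMs but no cache.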

Each block is explained below with respect to industrial control systems.

3.1 CLASSICAL SERIES

3.1.1 Instruction Tightly Coupled Memory (ITCM)

This is useful where instruction fetches must be deterministic. In other words, the number of cycles needed to read an instruction is constant, and small.

There is a disadvantage with this approach. The code must be placed in ITCM before it can run, so boot code is needed to read it from external memory, or to download it through USB into SRAM, and then copy it into ITCM.
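The copy step described above can be sketched in C. The sketch assumes a hypothetical linker script that gives the ITCM code a load address (external memory or SRAM) and a run address (ITCM); on a real target the two pointers would come from linker-defined symbols, and the names `__itcm_load_start` and `__itcm_start` used in the comments are hypothetical. To stay runnable on a host, plain arrays stand in for the two memory regions.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies a code image from its load region (external memory or SRAM)
 * to its run region (ITCM). On a target, src would be a hypothetical
 * __itcm_load_start symbol and dst a hypothetical __itcm_start symbol.
 * The copy must finish before any function placed in ITCM is called. */
void copy_to_itcm(uint32_t *dst, const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        dst[i] = src[i];
    }
}

/* Host-side stand-ins for the two regions. The words are ARM-state
 * encodings of MOV r0, #1; ADD r0, r0, #1; BX lr; followed by padding. */
uint32_t fake_load_region[4] = {0xE3A00001u, 0xE2800001u, 0xE12FFF1Eu, 0u};
uint32_t fake_itcm[4];
```

On a target, the call `copy_to_itcm(fake_itcm, fake_load_region, 4)` would be replaced by the same call on the linker-provided addresses, made early in the startup code.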

ITCM is not useful when the code becomes too large to fit in it, or when an OS is used. Industrial control systems mostly prefer to run code from on-chip or external flash.

3.1.2 Data Tightly Coupled Memory (DTCM)

DTCM is highly useful for storing data because it gives faster access. Data segments can be assigned DTCM addresses in the linker script file. DTCM always accompanies ITCM; there is no processor in which only the DTCM block is available.
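Placing data in DTCM is usually a two-part job: the C source puts a variable in a named section, and the linker script maps that section to the DTCM address range. A minimal sketch of the C side, assuming GCC or Clang on an ELF target and a hypothetical section name `.dtcm_data`; the matching linker script entry is not shown and would be specific to the chip. As with `.data`, initialised variables in such a section would typically need the startup code to copy their initial values into DTCM.

```c
#include <stdint.h>

/* Puts a variable in a dedicated output section. A (hypothetical)
 * linker script entry would map ".dtcm_data" to the DTCM address range.
 * The attribute is only applied on GCC/Clang ELF targets so the sketch
 * still compiles elsewhere. */
#if defined(__GNUC__) && defined(__ELF__)
#define DTCM_DATA __attribute__((section(".dtcm_data")))
#else
#define DTCM_DATA
#endif

/* A sample buffer that benefits from fast, deterministic access. */
DTCM_DATA int32_t adc_samples[8] = {0};

/* Running sum over the DTCM-resident buffer. */
int32_t sum_samples(void)
{
    int32_t total = 0;
    for (int i = 0; i < 8; i++) {
        total += adc_samples[i];
    }
    return total;
}
```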

3.1.3 Cache

Cache plays an important role when the code or data resides in external memory. Although the number of cycles needed to read an instruction or data item varies with cache hits and misses, the cache greatly improves performance, especially when an operating system is used.

If only firmware is loaded, without an operating system, the cache is of less use.
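Why a cache helps even though its timing varies can be seen from the average cost per access: average = hit_rate × hit_cycles + (1 − hit_rate) × miss_cycles. The sketch below computes it; the cycle counts in the usage note are hypothetical placeholders, not figures for any particular chip.

```c
/* Average cycles per access for a cache in front of slow external memory.
 * hit_rate is in [0, 1]; hit_cycles is the cost of a cache hit and
 * miss_cycles the cost of fetching from external memory on a miss. */
double average_access_cycles(double hit_rate, double hit_cycles,
                             double miss_cycles)
{
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles;
}
```

With an assumed 95% hit rate, 1-cycle hits and 20-cycle misses, the average is about 1.95 cycles per access, against 20 cycles for every access to uncached external memory.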

3.1.4 Write Buffer

The cache, in combination with the write buffer, determines the cache type: write-back or write-through.
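On the MMU-based ARMv4/ARMv5 cores (for example the ARM920T and ARM926EJ-S), this choice is made per memory region by the C (cacheable) and B (bufferable) bits of the page table descriptor. The sketch decodes the four combinations; ARMv6 cores such as the ARM1176 add TEX bits that extend this scheme, which the sketch ignores. The enum and function names are illustrative.

```c
#include <stdbool.h>

/* Memory policies selected by the C and B bits on ARMv4/ARMv5 MMUs. */
typedef enum {
    MEM_UNCACHED_UNBUFFERED,   /* C=0, B=0: e.g. peripheral registers */
    MEM_UNCACHED_BUFFERED,     /* C=0, B=1: writes go via the write buffer */
    MEM_WRITE_THROUGH,         /* C=1, B=0: writes update cache and memory */
    MEM_WRITE_BACK             /* C=1, B=1: memory updated on line eviction */
} mem_policy;

mem_policy decode_cb(bool c, bool b)
{
    if (!c) {
        return b ? MEM_UNCACHED_BUFFERED : MEM_UNCACHED_UNBUFFERED;
    }
    return b ? MEM_WRITE_BACK : MEM_WRITE_THROUGH;
}
```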

A write buffer without a cache lets the processor write to external memory without waiting for each write to complete.

The cache and write buffer modules are useful when external memory is used. Industrial control systems make extensive use of on-chip flash for programming and running the code, and of EEPROM (accessed over I2C) for storing the database and industrial parameters. There are cases where external memory is used; for these, both the Cortex and the classical series include chips with an on-chip memory controller.

For storing data, the classical ARM series provides the better option, because DTCM can be used.

3.1.5 Co-processor:

The co-processor (CP15, the system control co-processor) is relevant when the MMU, cache, ITCM, DTCM or MPU needs to be used, because these blocks are configured through it. (The Cortex MPU is designed to be used without a co-processor.) Its usefulness depends on the type of application.

For example, if the application is temperature measurement or fault detection, the absence of a co-processor has little impact.
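On the classical cores, the MMU or MPU, the caches, the endianness and the vector location are controlled through CP15 register 1, the control register. A minimal sketch, assuming the ARMv4/ARMv5 bit layout used by ARM9-class cores (later cores add and redefine bits), decodes a value read from that register. The read itself is an MRC instruction on the target and is left out so the code runs on a host; the struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected bits of the CP15 c1 control register on ARMv4/ARMv5 cores.
 * On a target the value would be read with MRC p15, 0, Rd, c1, c0, 0. */
#define CP15_CTRL_M (1u << 0)   /* MMU or MPU enable */
#define CP15_CTRL_A (1u << 1)   /* alignment fault checking */
#define CP15_CTRL_C (1u << 2)   /* data (or unified) cache enable */
#define CP15_CTRL_B (1u << 7)   /* 1 = big-endian, 0 = little-endian */
#define CP15_CTRL_I (1u << 12)  /* instruction cache enable */
#define CP15_CTRL_V (1u << 13)  /* 1 = high vectors at 0xFFFF0000 */

typedef struct {
    bool mmu_or_mpu_on;
    bool alignment_check;
    bool dcache_on;
    bool icache_on;
    bool big_endian;
    bool high_vectors;
} cp15_ctrl;

cp15_ctrl decode_cp15_ctrl(uint32_t value)
{
    cp15_ctrl c;
    c.mmu_or_mpu_on   = (value & CP15_CTRL_M) != 0;
    c.alignment_check = (value & CP15_CTRL_A) != 0;
    c.dcache_on       = (value & CP15_CTRL_C) != 0;
    c.icache_on       = (value & CP15_CTRL_I) != 0;
    c.big_endian      = (value & CP15_CTRL_B) != 0;
    c.high_vectors    = (value & CP15_CTRL_V) != 0;
    return c;
}
```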

3.1.6 MMU and MPU:

When Linux, or another operating system that needs an MMU, has to be ported, the classical series is best suited. (Not all classical cores have an MMU, so select a chip that has one.)

The MMU maps virtual addresses, the addresses used by the OS, to physical addresses, which are what the memory controllers understand.
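The simplest mapping on the classical MMU is the 1 MB section: bits [31:20] of the virtual address index a 4096-entry first-level table, and a section descriptor (type bits [1:0] = 0b10) supplies bits [31:20] of the physical address. The sketch below models only section descriptors; the table and helper names are hypothetical, and access permissions, domains and cacheability are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_ENTRIES   4096u        /* one entry per 1 MB of virtual space */
#define SECTION_MASK 0xFFF00000u  /* bits [31:20]: section base address */

/* Hypothetical first-level table. On hardware it must be 16 KB aligned
 * and its address written to the CP15 translation table base register. */
static uint32_t l1_table[L1_ENTRIES];

/* Writes a 1 MB section descriptor (type bits [1:0] = 0b10). The other
 * descriptor fields (AP, domain, C, B) are left as zero in this model. */
void map_section(uint32_t va, uint32_t pa)
{
    l1_table[va >> 20] = (pa & SECTION_MASK) | 0x2u;
}

/* Translates va; returns false where the MMU would raise a translation
 * fault. Only section descriptors are handled, not page tables. */
bool translate(uint32_t va, uint32_t *pa)
{
    uint32_t desc = l1_table[va >> 20];
    if ((desc & 0x3u) != 0x2u) {
        return false;
    }
    *pa = (desc & SECTION_MASK) | (va & ~SECTION_MASK);
    return true;
}
```

For example, after `map_section(0xC0000000u, 0x20000000u)`, the virtual address 0xC0012345 translates to the physical address 0x20012345.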

uC/OS can be ported to the Cortex series, which provides an MPU rather than an MMU. Using the MPU, sections of memory can be marked as No Access, Read Only or Read/Write.
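On the Cortex-M3, each MPU region is described by a base address register (RBAR) and an attribute and size register (RASR). A minimal sketch, assuming the ARMv7-M field layout, builds a RASR value: the region size is 2^(SIZE + 1) bytes with a 32-byte minimum, AP holds the access permission and XN forbids instruction fetches. The base address written to RBAR must be aligned to the region size. Helper names are illustrative, and the register writes are omitted so the code runs on a host.

```c
#include <stdint.h>

/* Access permission (AP) encodings of the ARMv7-M MPU RASR register. */
enum {
    MPU_AP_NO_ACCESS = 0x0,  /* no access at any privilege level */
    MPU_AP_PRIV_RW   = 0x1,  /* privileged RW, unprivileged no access */
    MPU_AP_FULL      = 0x3,  /* read/write at both privilege levels */
    MPU_AP_READ_ONLY = 0x6   /* read-only at both privilege levels */
};

/* Returns the RASR SIZE field for a power-of-two region of size_bytes
 * (32 bytes to 4 GB), where region size = 2^(SIZE + 1). Returns -1 if
 * the size is not a power of two or is smaller than 32 bytes. */
int mpu_size_field(uint64_t size_bytes)
{
    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return -1;
    }
    int n = 0;
    while ((1ull << n) < size_bytes) {
        n++;
    }
    return n - 1;
}

/* Builds a RASR value: XN in bit 28, AP in bits [26:24], SIZE in bits
 * [5:1], ENABLE in bit 0. TEX/S/C/B and subregion bits are left at 0. */
uint32_t mpu_rasr(uint32_t ap, int size_field, int execute_never)
{
    return ((execute_never ? 1u : 0u) << 28) |
           ((ap & 0x7u) << 24) |
           (((uint32_t)size_field & 0x1Fu) << 1) |
           1u;
}
```

For example, a 1 KB read/write region gives SIZE = 9 and RASR = 0x03000013, while a 32-byte read-only, execute-never region gives 0x16000009.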

I was working in a semiconductor company, in the team designing a chip for a specific network protocol. The protocol was still being standardized, and our company was part of the forum. We were asked to analyze the performance of the chip. We wrote the firmware using software queues for message receive and transmit, did not use any operating system, and made all events interrupt driven. My question is: are we using an operating system just because we feel we need one? The debate is not about the usefulness of operating systems but about where they are best applied.
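The OS-free design described above typically rests on a queue shared between an ISR and the main loop. The original firmware is not described in detail, so the following is only a minimal sketch of one common approach, a single-producer, single-consumer ring buffer; all names and the queue size are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Single-producer, single-consumer message queue: the receive ISR
 * pushes, the main loop pops. QUEUE_SIZE must be a power of two so the
 * free-running indices stay correct when they wrap. Each index is
 * written by one side only, so no lock is needed on a single-core part;
 * volatile stops the compiler caching the indices. Some targets may
 * also need memory barriers, which this sketch omits. */
#define QUEUE_SIZE 16u

typedef struct {
    uint8_t data[QUEUE_SIZE];
    volatile uint32_t head;   /* written only by the producer (ISR) */
    volatile uint32_t tail;   /* written only by the consumer (main loop) */
} msg_queue;

bool queue_push(msg_queue *q, uint8_t byte)
{
    if (q->head - q->tail == QUEUE_SIZE) {
        return false;         /* full: the caller decides what to drop */
    }
    q->data[q->head % QUEUE_SIZE] = byte;
    q->head++;
    return true;
}

bool queue_pop(msg_queue *q, uint8_t *byte)
{
    if (q->head == q->tail) {
        return false;         /* empty */
    }
    *byte = q->data[q->tail % QUEUE_SIZE];
    q->tail++;
    return true;
}
```

The UART receive ISR would call `queue_push` with each received byte, and the main loop would drain the queue with `queue_pop` between its other tasks.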

I met a design engineer at a power station who, during our discussion, explained their design in broad terms. They were using a 32-bit processor running Linux. Being a firmware person, I asked why they needed an OS. The reply was that each module delivers messages, which the tasks use for control operations. Where the number of messages from each module is large and critical, an OS is best suited.

The same cannot be said for operations involving temperature measurement, a real-time clock, a UART and so on.

Finally, it is for the designer to understand the requirements and needs carefully.

3.1.7 Interrupts

Three parameters are always discussed when the topic of interrupts is raised:

Latency
Number of interrupts
Preemption

Latency:

Cortex scores over the classical series here. Its interrupt latency is lower because it can take an interrupt, fetching the ISR address and branching to it, in the middle of an LDM or STM instruction. In the classical series (other than the ARM1176, which can also abandon LDM and STM), the LDM or STM completes before the branch to the ISR, so the ISR latency depends on the number of registers specified in the instruction.

Other advantages of Cortex-M:

Automatic state saving on IRQ entry and restoring on IRQ exit
State saving and reading of the ISR address from the vector table are done simultaneously.
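The state that the Cortex-M3 saves automatically is a fixed eight-word frame. Because R0 to R3 and R12 are exactly the registers that the ARM procedure call standard lets a C function corrupt, the hardware frame is what allows an ISR to be an ordinary C function. The struct below describes that frame, lowest address first; the struct and helper names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Basic stack frame pushed by the Cortex-M3 hardware on exception entry,
 * lowest address first. After stacking, the active stack pointer points
 * at r0. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* address to resume at */
    uint32_t xpsr;
} exception_frame;

/* Compile-time checks that the struct matches the 8-word hardware frame. */
_Static_assert(sizeof(exception_frame) == 32, "frame is 8 words");
_Static_assert(offsetof(exception_frame, pc) == 24, "PC is the 7th word");

/* A fault handler given a pointer to the frame can report where the
 * interrupted code was executing. */
uint32_t interrupted_pc(const exception_frame *f)
{
    return f->pc;
}
```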

Number of interrupts:

The Cortex core supports up to 240 external interrupts (IRQ0 to IRQ239; the number actually implemented depends on the chip manufacturer), and a priority can be set for each of them. The Atmel AT91RM9200 SoC (based on the ARM920T) provides many interrupt sources, such as timer, RTC and UART, just as the NXP LPC1768, which uses the Cortex-M3, does (these two are only examples). For the firmware developer there is little difference between the two, so the conclusion that Cortex-M is advantageous because it has an NVIC is arguable.

Other features, such as the pending register, are the same as the pending register in the AT91RM9200 and convey the same information.
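On the Cortex-M3, enabling an interrupt and setting its priority are memory-mapped NVIC operations: each ISER register holds 32 enable bits, and each interrupt has an 8-bit priority field of which only the top bits are implemented. A minimal sketch, assuming 5 implemented priority bits for illustration (CMSIS names this `__NVIC_PRIO_BITS`; the real value is chip-specific), computes where an interrupt's bits live. The register writes themselves are omitted so the code runs on a host.

```c
#include <stdint.h>

/* Number of implemented priority bits; 5 is an assumption here. */
#define PRIO_BITS 5u

/* ISERn holds enable bits for interrupts 32n to 32n + 31. */
uint32_t iser_index(uint32_t irq) { return irq / 32u; }
uint32_t iser_bit(uint32_t irq)   { return 1u << (irq % 32u); }

/* Only the top PRIO_BITS bits of the 8-bit priority field exist, so the
 * logical priority is shifted left into them. Lower value = higher
 * priority. */
uint8_t ipr_byte(uint32_t priority)
{
    return (uint8_t)((priority << (8u - PRIO_BITS)) & 0xFFu);
}
```

For example, IRQ 33 is enabled by bit 1 of ISER1, and logical priority 3 is stored as 24 (0x18) in its priority byte.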

Preemption:

You have this luxury in the Cortex series. It is designed so that if a higher-priority interrupt is raised before the first instruction of the current ISR has executed, the processor fetches the higher-priority vector address instead and branches to it.

The Cortex-M3 is mainly targeted at industrial applications, where events are mostly interrupt driven and preemption is essential, and this block is well suited to them.
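Whether one interrupt preempts another on the Cortex-M3 depends on the group priority, which is selected by the PRIGROUP field of the AIRCR register. A minimal sketch of that rule for configurable-priority exceptions (Reset, NMI and HardFault have fixed priorities and are not modelled); the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* PRIGROUP (0 to 7) splits the 8-bit priority into a group (preemption)
 * priority, bits [7:PRIGROUP+1], and a subpriority, bits [PRIGROUP:0]. */
uint32_t group_priority(uint8_t priority, uint32_t prigroup)
{
    return (uint32_t)priority >> (prigroup + 1u);
}

/* A pending exception preempts the active one only if its group priority
 * is numerically lower. Equal group priorities never preempt; the
 * subpriority only decides which pending exception is taken first. */
bool preempts(uint8_t pending, uint8_t active, uint32_t prigroup)
{
    return group_priority(pending, prigroup) <
           group_priority(active, prigroup);
}
```

With PRIGROUP = 3, priorities 0x20 and 0x21 share group priority 2, so neither preempts the other, while 0x10 (group 1) preempts both.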

3.2 CORTEX-M SERIES

The Cortex-M3 architecture blocks are:

System Block
System Timer
Nested Vectored Interrupt Controller
Memory Protection Unit

3.2.1 System Block

This block handles all the key control and status features, such as software reset, power management, fault status information and system exceptions.

In some classic ARM cores there is a block called the System Controller, which communicates with the Bus Interface Unit to stall the processor while an AHB access is performed, for example a write to external memory.

3.2.2 System timer

The system timer (SysTick) is designed specifically for use by an operating system. It is a 24-bit down-counter. Consider two cases:

a. When an OS is not ported:

In this scenario the system timer has little use. If the firmware needs timers, most Cortex-M3 chips, such as the NXP LPC1768 and the Atmel SAM3S series, provide more than two general-purpose timers/counters (the LPC1768 has four 32-bit timers and the SAM3S has six 16-bit timer/counter channels).

b. When an OS is ported:

If the application does not use all the timers, the OS can use one of the remaining ones. The system timer is not essential.

My observation is that the system timer is a luxury, not a necessity.
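The 24-bit width of the system timer limits the tick periods it can produce. A minimal sketch, assuming the counter runs from the processor clock (the clock source is selectable on real parts), computes the reload value for a given tick rate. The counter counts from RELOAD down to 0, so one period is RELOAD + 1 clocks, and a reload of 0 would never raise the SysTick exception. The function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* the counter is 24 bits wide */

/* Computes the reload value for a tick rate of tick_hz when the counter
 * is clocked at clock_hz. Returns false if the period does not divide
 * evenly, is too short, or does not fit in 24 bits. */
bool systick_reload(uint32_t clock_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0u || clock_hz % tick_hz != 0u) {
        return false;
    }
    uint32_t ticks = clock_hz / tick_hz;
    if (ticks < 2u || ticks - 1u > SYSTICK_MAX_RELOAD) {
        return false;
    }
    *reload = ticks - 1u;
    return true;
}
```

At 100 MHz, a 1 kHz tick needs a reload of 99999, while a 5 Hz tick (20,000,000 clocks) does not fit in 24 bits. The longest period at that clock is 2^24 clocks, about 168 ms, which suits a periodic OS tick rather than long delays.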

3.2.3 Nested Vectored Interrupt controller

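The improvements in Cortex interrupt handling have already been explained. The further advantages of the NVIC are tail chaining (when one ISR finishes and another interrupt is pending, the processor moves straight to the next ISR without restoring and re-saving the stacked registers) and the handling of late-arriving interrupts.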

4 Instruction Set Architecture and reverse compatibility

Cortex supports the Thumb-2 instruction set, a blend of 16-bit and 32-bit instructions. Although Thumb-2 is advantageous, assembly and binary code written for the Cortex series cannot be ported directly to ARM9, ARM10 and those ARM11 cores without Thumb-2 support, because they cannot execute the 32-bit Thumb-2 instructions. In assembly source, a 32-bit encoding is requested with the .W suffix. For example:

ADD.W is a 32-bit instruction.

ADD (without a suffix) is assembled as a 16-bit instruction where possible.
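The 16/32-bit mix is visible in the encoding itself: in Thumb-2, if bits [15:11] of a halfword are 0b11101, 0b11110 or 0b11111, that halfword is the first half of a 32-bit instruction; otherwise the instruction is 16 bits long. A small sketch of that rule (the function name is illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if first_halfword starts a 32-bit Thumb-2 instruction. */
bool thumb_is_32bit(uint16_t first_halfword)
{
    uint16_t top5 = (uint16_t)(first_halfword >> 11);
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}
```

For example, 0x1840 (ADDS r0, r0, r1) is a complete 16-bit instruction, while 0xEB00 is the first halfword of a 32-bit ADD.W.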

Some changes are therefore required to port such code back to the older cores.

The same applies to code written for the classical series that is ported to the Cortex series, since Cortex-M cannot execute ARM-state instructions.

The advantage of Cortex is not restricted to Thumb-2; it also provides some bit manipulation instructions.
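Examples of these instructions in ARMv7-M are BFI, BFC, UBFX, SBFX and RBIT, which ARM9 cores lack. The C helpers below express bit-field extract and insert; on a Cortex-M3 a compiler can often emit a single UBFX or BFI for them, though whether it does depends on the compiler and optimization level. The function names are illustrative.

```c
#include <stdint.h>

/* Mask of width low bits (1 <= width <= 32). */
static uint32_t low_mask(unsigned width)
{
    return (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Extracts width bits starting at bit lsb (lsb + width <= 32).
 * Corresponds to UBFX. */
uint32_t bits_extract(uint32_t value, unsigned lsb, unsigned width)
{
    return (value >> lsb) & low_mask(width);
}

/* Replaces width bits at lsb with the low bits of field.
 * Corresponds to BFI. */
uint32_t bits_insert(uint32_t value, uint32_t field, unsigned lsb,
                     unsigned width)
{
    uint32_t mask = low_mask(width);
    return (value & ~(mask << lsb)) | ((field & mask) << lsb);
}
```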

The remaining question is whether inline assembly can be written. It can: Cortex-M executes only Thumb instructions, and compilers such as GCC accept Thumb-2 inline assembly in C. Existing inline assembly written for ARM state must, however, be rewritten in Thumb-2.
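A minimal sketch of Thumb-2 inline assembly in GCC syntax, using the RBIT bit-reversal instruction; CMSIS-Core wraps the same instruction as `__RBIT()`. The assembly path is compiled only when GCC or Clang targets Thumb-2 (they define `__thumb2__`), and a portable C loop is used otherwise so the sketch also runs on a host.

```c
#include <stdint.h>

/* Reverses the bit order of a 32-bit word. */
uint32_t reverse_bits(uint32_t x)
{
#if defined(__GNUC__) && defined(__thumb2__)
    uint32_t result;
    /* Thumb-2 inline assembly: RBIT result, x */
    __asm__ ("rbit %0, %1" : "=r"(result) : "r"(x));
    return result;
#else
    /* Portable fallback: shift bits out of x and into result. */
    uint32_t result = 0;
    for (int i = 0; i < 32; i++) {
        result = (result << 1) | (x & 1u);
        x >>= 1;
    }
    return result;
#endif
}
```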

Now let's focus on the various models of the two architectures:

Programmer's model
Exception model
Fault handling
Power management

5 Programmer's Model

The table below lists the differences between the two architectures.

	Cortex M3	Classical Series
Processor modes	Thread, Handler	User, Supervisor, System, Undefined, Abort, FIQ, IRQ
Privilege access	Thread mode starts privileged and can be changed to unprivileged by writing to the CONTROL register; Handler mode is always privileged	All modes other than User are privileged
Stack	Main stack, Process stack	A separate (banked) stack pointer for each mode; User and System share one

On reset, the Cortex-M series starts in Thread mode with privileged access, while the classical series starts in Supervisor mode with the same access rights. In the Cortex-M series, software can switch Thread mode to unprivileged; once changed, it cannot switch itself back to privileged, and only an exception can do that. In the classical series, software changes the processor mode to User instead, and after that only an exception can change the mode again.

On the Cortex, all exceptions execute in Handler mode, while in the classical series each exception enters its own mode: Abort, Undefined, FIQ, IRQ, or Supervisor for SWI.

To summarize, the Cortex has simplified the processor modes to make implementation easier.

As for the stack, there are two stacks, as explained above. Handler mode always uses the main stack, and Thread mode can use either the main stack or the process stack; the choice is configured in the CONTROL register.
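The stack and privilege rules can be expressed as a decode of the CONTROL register: bit 0 (nPRIV) makes Thread mode unprivileged, and bit 1 (SPSEL) makes Thread mode use the process stack, while Handler mode always uses the main stack and is always privileged. On a target the value would be read with an MRS instruction (CMSIS provides `__get_CONTROL()`); here it is passed in so the sketch runs on a host, and the helper names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define CONTROL_NPRIV (1u << 0)  /* 1 = Thread mode is unprivileged */
#define CONTROL_SPSEL (1u << 1)  /* 1 = Thread mode uses the process stack */

typedef enum { STACK_MAIN, STACK_PROCESS } stack_sel;

/* Handler mode always uses the main stack (MSP); Thread mode uses the
 * stack selected by CONTROL.SPSEL. */
stack_sel active_stack(bool handler_mode, uint32_t control)
{
    if (handler_mode) {
        return STACK_MAIN;
    }
    return (control & CONTROL_SPSEL) ? STACK_PROCESS : STACK_MAIN;
}

/* Handler mode is always privileged; Thread mode depends on nPRIV. */
bool is_privileged(bool handler_mode, uint32_t control)
{
    return handler_mode || (control & CONTROL_NPRIV) == 0u;
}
```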

It is simpler for a firmware engineer to work with Cortex-M than with the classical series.

6 Exception Model and Fault Handling

	Cortex M3	Classical series
Exceptions	Reset, Non Maskable Interrupt, Hard Fault, Memory Management, Bus Fault, Usage Fault, System Service Call, Debug Monitor, Pending request for system service (PendSV), SysTick, External interrupts (UART, timer, RTC, etc.)	Reset, Data Abort, Prefetch Abort, FIQ, IRQ, Undefined Instruction, SWI
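The two vector tables are organised differently. On the Cortex-M3, the table holds addresses: word 0 is the initial main stack pointer, word n is the handler for exception number n, and external interrupt IRQn is exception number 16 + n. On the classical cores, each vector entry holds an instruction, usually a branch or a load to the PC, at a fixed offset from the vector base. The enum and helper names below are illustrative.

```c
#include <stdint.h>

/* Cortex-M3 exception numbers. */
enum {
    CM3_RESET = 1, CM3_NMI = 2, CM3_HARDFAULT = 3, CM3_MEMMANAGE = 4,
    CM3_BUSFAULT = 5, CM3_USAGEFAULT = 6, CM3_SVCALL = 11,
    CM3_DEBUGMON = 12, CM3_PENDSV = 14, CM3_SYSTICK = 15, CM3_IRQ0 = 16
};

/* Byte offset of the handler address in the Cortex-M3 vector table. */
uint32_t cm3_vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}

uint32_t cm3_irq_vector_offset(uint32_t irq)
{
    return cm3_vector_offset(CM3_IRQ0 + irq);
}

/* Classical ARM vector offsets from the vector base (0x00000000 or
 * 0xFFFF0000). Each entry is an instruction; offset 0x14 is reserved. */
enum {
    ARM_VEC_RESET = 0x00, ARM_VEC_UNDEF = 0x04, ARM_VEC_SWI = 0x08,
    ARM_VEC_PREFETCH_ABORT = 0x0C, ARM_VEC_DATA_ABORT = 0x10,
    ARM_VEC_IRQ = 0x18, ARM_VEC_FIQ = 0x1C
};
```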

The priorities of the interrupts are not discussed in detail, as this paper concentrates on architecture comparisons.

Cortex-M describes exceptions in more detail, which makes fault analysis simpler than in the classical series.

Consider the prefetch abort in the classical series. This exception is raised when the processor tries to fetch an instruction from a memory region whose attributes have been set to No Access by the MPU (or MMU), or from an address that does not exist.

In Cortex-M, an exception caused by MPU attributes is called a Memory Management exception, while an exception caused by a bad address is classified as a Bus Fault.

So the cause of the exception can be clearly identified in Cortex M.

If the processor cannot push to or pop from the stack during exception entry or return, a fault is raised. If the failure is caused by an MPU region, it is reported as a Memory Management fault; otherwise it is reported as a Bus Fault. This is an additional feature found in the Cortex-M series.
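The cause of a Cortex-M3 fault is reported in the Configurable Fault Status Register (CFSR, at 0xE000ED28), which combines the MemManage, BusFault and UsageFault status registers; stacking and unstacking errors have their own bits in both the MemManage and BusFault parts. A minimal sketch that classifies a CFSR value read in a fault handler; only a subset of the bits is decoded, and the enum and function names are illustrative.

```c
#include <stdint.h>

/* Selected CFSR bits: MMFSR is bits 7:0, BFSR 15:8, UFSR 31:16. */
#define CFSR_IACCVIOL   (1u << 0)   /* MPU: instruction access violation */
#define CFSR_DACCVIOL   (1u << 1)   /* MPU: data access violation */
#define CFSR_MUNSTKERR  (1u << 3)   /* MPU violation while unstacking */
#define CFSR_MSTKERR    (1u << 4)   /* MPU violation while stacking */
#define CFSR_IBUSERR    (1u << 8)   /* bus error on instruction fetch */
#define CFSR_PRECISERR  (1u << 9)   /* precise bus error on data access */
#define CFSR_UNSTKERR   (1u << 11)  /* bus error while unstacking */
#define CFSR_STKERR     (1u << 12)  /* bus error while stacking */
#define CFSR_UNDEFINSTR (1u << 16)  /* undefined instruction */

typedef enum {
    FAULT_NONE,
    FAULT_MPU_STACKING, FAULT_MPU_UNSTACKING,
    FAULT_BUS_STACKING, FAULT_BUS_UNSTACKING,
    FAULT_MPU_ACCESS, FAULT_BUS_ACCESS, FAULT_UNDEFINED_INSTRUCTION
} fault_cause;

/* Reports the first matching cause, checking stack errors first. */
fault_cause classify_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_MSTKERR)   return FAULT_MPU_STACKING;
    if (cfsr & CFSR_MUNSTKERR) return FAULT_MPU_UNSTACKING;
    if (cfsr & CFSR_STKERR)    return FAULT_BUS_STACKING;
    if (cfsr & CFSR_UNSTKERR)  return FAULT_BUS_UNSTACKING;
    if (cfsr & (CFSR_IACCVIOL | CFSR_DACCVIOL)) return FAULT_MPU_ACCESS;
    if (cfsr & (CFSR_IBUSERR | CFSR_PRECISERR)) return FAULT_BUS_ACCESS;
    if (cfsr & CFSR_UNDEFINSTR) return FAULT_UNDEFINED_INSTRUCTION;
    return FAULT_NONE;
}
```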

All exceptions use the main stack, while in Thread mode there is an option to select the main stack or the process stack.

Handling exceptions looks simpler and easier to implement.

The only debatable point is that the priorities of the exceptions in Cortex (other than Reset, Non Maskable Interrupt and Hard Fault) are configurable. My observation is that all processor and processor-related exceptions (everything except external interrupts) should have fixed priorities, as in the classical series.

7 Fault Handling

Fault and exception handling is greatly simplified in the Cortex-M series. There is no need to subtract 4 or 8 (8 for a data abort) from the link register in the handler, or to save and restore registers with STMFD and LDMFD in the exception routine. All of this is handled internally, and returning only requires loading the PC with the LR.
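The contrast can be made concrete. On the Cortex-M3, LR holds a special EXC_RETURN value on exception entry, and loading it into the PC (for example with BX LR, which a compiler emits for a normal C function return) starts the hardware unstacking; bit 3 of the value selects a return to Thread mode and bit 2 selects the process stack. On the classical cores, the handler must correct LR itself before returning. The helper names below are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M3 EXC_RETURN values. */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u  /* back to Handler mode, MSP */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u  /* back to Thread mode, MSP */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu  /* back to Thread mode, PSP */

bool exc_return_to_thread(uint32_t exc_return) { return (exc_return & 0x8u) != 0u; }
bool exc_return_uses_psp(uint32_t exc_return)  { return (exc_return & 0x4u) != 0u; }

/* Classical ARM: amount the handler subtracts from LR to get the resume
 * address. IRQ and FIQ resume at the next instruction, aborts retry the
 * aborted instruction, SWI and Undefined return with LR unchanged. */
typedef enum {
    ARM_EXC_IRQ, ARM_EXC_FIQ, ARM_EXC_PREFETCH_ABORT,
    ARM_EXC_DATA_ABORT, ARM_EXC_SWI, ARM_EXC_UNDEF
} arm_exception;

uint32_t arm_return_adjust(arm_exception e)
{
    switch (e) {
    case ARM_EXC_IRQ:
    case ARM_EXC_FIQ:
    case ARM_EXC_PREFETCH_ABORT:
        return 4u;
    case ARM_EXC_DATA_ABORT:
        return 8u;
    default:
        return 0u;
    }
}
```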

8 Power Management

This requires reading the data sheets of the controllers that use Cortex-M and the classical series, and the details vary from chip to chip. The types of modes available in the two series are listed below.

The Cortex series has:

Sleep mode
Deep sleep mode

The NXP LPC1768 provides two further modes: Power-down and Deep power-down.

The ARM1176 has:

a. standby mode
b. shutdown mode

The selection depends on the requirements and the application.

9 Miscellaneous

	Classical Series	Cortex M3
Selection of endianness (big or little endian)	The B bit in CP15 register 1 (the control register) determines the endianness	The BIGEND configuration input is sampled during reset
Status registers	CPSR and SPSR	APSR, IPSR and EPSR, all three combined into a single register called xPSR
Special registers	None	PRIMASK, FAULTMASK, BASEPRI and CONTROL; the first three mask exceptions, and CONTROL selects the stack and the access level (privileged or unprivileged)
MPU	Enabling and region sizes are defined in co-processor (CP15) registers	A separate set of memory-mapped registers is available
Pipeline stages	5 to 8	3
Vector location	0x00000000 (low vectors) or 0xFFFF0000 (high vectors)	0x00000000, or anywhere from 0x00000080 to 0x3FFFFF80; the offset is specified in the Vector Table Offset Register
Stack	Full ascending, full descending, empty ascending or empty descending	Always full descending
Exception entry	Nothing is stacked automatically: the CPSR is copied to the SPSR and the return address to the banked LR, and the handler saves the registers it uses in software (for example with STMFD)	xPSR, PC, LR, R12, R3, R2, R1 and R0 are pushed to the stack automatically
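The combined xPSR can be split back into its three parts: the condition flags N, Z, C, V and Q (APSR, bits 31 to 27), the Thumb bit (EPSR, bit 24, always 1 on Cortex-M), and the current exception number (IPSR, bits 8 to 0, zero in Thread mode). For an external interrupt, subtracting 16 from the exception number gives the IRQ number. The struct and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool n, z, c, v, q;          /* APSR flags, bits 31..27 */
    bool thumb;                  /* EPSR T bit, bit 24 */
    uint32_t exception_number;   /* IPSR, bits 8..0 */
} xpsr_fields;

xpsr_fields decode_xpsr(uint32_t xpsr)
{
    xpsr_fields f;
    f.n = (xpsr >> 31) & 1u;
    f.z = (xpsr >> 30) & 1u;
    f.c = (xpsr >> 29) & 1u;
    f.v = (xpsr >> 28) & 1u;
    f.q = (xpsr >> 27) & 1u;
    f.thumb = (xpsr >> 24) & 1u;
    f.exception_number = xpsr & 0x1FFu;
    return f;
}
```

For example, 0x61000025 decodes to Z and C set, the Thumb bit set, and exception number 37, which is IRQ 21.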

10 Conclusion

The classical ARM series can be selected when:

Faster execution of instructions is required (ITCM, DTCM, ICache and DCache)
An operating system that uses the MMU is required
Interrupt latency has minimal effect on overall system performance

The Cortex-M3 can be selected when the selection criteria are:

Cost
Lower interrupt latency and tail chaining of interrupts
Fewer pins to bring out (depending on the number of devices in the final SoC)
Lower power consumption
Code size or memory footprint
Simple applications

11 REFERENCES

Cortex-M3 Technical Reference Manual
ARM Architecture Reference Manual
Atmel AT91RM9200 Data Sheet
Atmel SAM3S Data Sheet

12 CONTACT

Guruprasad
Mobile +91- 9739817849
Email: guruvadhiraj@gmail.com