Mobile Devices: RISC-Java blend powers cores

By Edward Nevill, Director, E-Options Ltd., Andrew Rose, Microsystems Designer, ARM Ltd., Cambridge, England, EE Times
January 6, 2002 (5:23 p.m. EST)
URL: http://www.eetimes.com/story/OEG20010723S0049

A Java system is complicated, expensive and difficult to modify for portable connected devices because it is software-centric. It's not easy to design hardware technology to execute in a Java technology world. For example, the Java byte code language is fundamentally a stack-based language, and mounting a stack-based language onto a register-based core is quite a challenge.
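To make the stack-based model concrete, here is a minimal sketch of an interpreter for a handful of integer byte codes. The opcode values are the standard Java ones; the function name and the simplifications (no 32-bit overflow wrapping, no operands, no frames) are ours and purely illustrative.

```python
# Standard Java opcode values for a few integer instructions.
ILOAD_0, ILOAD_1, ILOAD_2 = 0x1A, 0x1B, 0x1C
IADD, IMUL, IRETURN = 0x60, 0x68, 0xAC


def run_stack_machine(code, local_vars):
    """Interpret a tiny subset of Java byte code using an explicit operand stack."""
    stack = []
    for opcode in code:
        if ILOAD_0 <= opcode <= ILOAD_2:
            # iload_<n>: push local variable n onto the operand stack.
            stack.append(local_vars[opcode - ILOAD_0])
        elif opcode == IADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == IMUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02X}")
    raise ValueError("byte code ended without ireturn")


# a + b * c compiles to: iload_0, iload_1, iload_2, imul, iadd, ireturn
program = bytes([ILOAD_0, ILOAD_1, ILOAD_2, IMUL, IADD, IRETURN])
result = run_stack_machine(program, [2, 3, 4])  # 2 + 3 * 4 = 14
```

Note that no instruction names a register: every operand is implicit in the stack position. That is exactly what has to be reconciled with a register-based core.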

Nor is Java technology ideal for running application code on the device itself, because Java runs more slowly than C, C++ or machine code. As a result, the cost of portability, which is the main reason for using Java technology, is usually performance.

So finding a solution to work with our ARM cores that did not sacrifice performance was a long process, and we went through several designs before we came up with the final one: Jazelle, a way of integrating Java execution right into the ARM core.

In 1997, ARM was working to develop optimized software technology for executing Java. The software technology we investigated included optimized Java Virtual Machines and just-in-time (JIT) compilers. The optimized JVM gave 2.8 times the performance of the Sun standard JVM. That represented a remarkable gain at the time, but it could not satisfy performance requirements for next-generation wireless devices. JIT technology provided 11.8 times the performance of the standard JVM but required six to eight times the memory. In the cost-sensitive world of mobile Internet devices, this kind of memory demand was out of the question.

See related chart

Just-in-time compilation converts the Java byte codes into native machine code and runs it instead of the Java code. But software emulation provides only limited performance unless you use a processor with very high native performance, which means higher cost and high power consumption. JIT is also heavily wasteful of memory: typical compilers are 100 kbytes in size, and compiled code typically expands by a factor of six to eight. To make Java systems available on devices such as mobile phones that will be deployed in the tens of millions, it's essential to cut hardware costs to the bone. Extra memory is a luxury you cannot afford.
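A rough worked example shows how quickly those figures add up. The 50-kbyte application size is a hypothetical input, and the sketch assumes that all of the byte code is compiled and that only the compiler and the compiled code are counted.

```python
JIT_COMPILER_KBYTES = 100   # typical compiler size quoted in the text
EXPANSION_RANGE = (6, 8)    # compiled code size relative to the byte code


def jit_memory_kbytes(bytecode_kbytes):
    """Return the (low, high) memory estimate in kbytes for JIT-compiling an application."""
    low_factor, high_factor = EXPANSION_RANGE
    return (
        JIT_COMPILER_KBYTES + low_factor * bytecode_kbytes,
        JIT_COMPILER_KBYTES + high_factor * bytecode_kbytes,
    )


# Hypothetical 50-kbyte application: 100 + 300 = 400 to 100 + 400 = 500 kbytes.
low, high = jit_memory_kbytes(50)
```

Under these assumptions a 50-kbyte application ends up needing 400 to 500 kbytes, most of it compiled code.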

Another factor is that although native code runs faster than Java code, JIT is slow to start and thus results in pauses and disruption of user input, depending on whether the code that's being executed is native code or Java code. JIT also makes heavy demands on the CPU during the compilation phase, which translates into shorter battery life.

Clearly, then, a software approach was out of the question.

During 1998, ARM looked for third-party solutions for executing Java in hardware but couldn't find one that fit all the requirements. All those we looked at either suffered from the same problem as the Java execution model (JEM), that is, they would not work on cached processors or were too large to fit the power and die size requirements, or they led to unacceptable increases in memory consumption.

Toward the latter half of 1999, pressure for improved Java technology performance from ARM's customers was increasing. It became obvious that ARM would have to provide its own solution for executing Java technology in hardware. Based on what we had learned over these multiple iterations, we decided that the only way to go was to integrate Java execution into our ARM core. Thus, Jazelle technology was born.

Our objectives were clear. First, we knew that to reduce die size and improve performance, Jazelle technology would be implemented in the ARM pipeline as a finite state machine rather than a traditional microcoded engine. It would be implemented inside the cache, which has important benefits in terms of both power consumption and performance.

Second, the solution would have to dynamically remap Java stack locations to ARM registers, thus avoiding the need for a translation stage. Third, the resultant architecture would have to perform 32-bit fetches in order to fetch up to four Java byte codes at once. Finally, it had to be implemented in a way that allowed all Java instructions to be restartable. That is, an interrupt can be taken in the middle of an executing Java instruction in such a way that interrupt latency is not affected.
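The 32-bit fetch can be pictured as follows. This is only a sketch: the function name is ours, and little-endian byte order is an assumption made for the illustration.

```python
def split_fetched_word(word):
    """Split one 32-bit little-endian word into its four byte-sized codes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


# One fetch delivers iload_0, iload_1, iadd, ireturn (standard Java opcode values).
word = int.from_bytes(bytes([0x1A, 0x1B, 0x60, 0xAC]), "little")
codes = split_fetched_word(word)
```

The fetch yields "up to" four byte codes rather than always four, because a byte code that carries operands occupies more than one byte of the word.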

ARM processors support two instruction sets: the ARM instruction set, in which all instructions are 32 bits long, and the Thumb instruction set, which compresses the most commonly used ARM instructions into a 16-bit format. The Thumb instruction set typically offers 35 to 40 percent code compression compared with ARM code, which reduces performance slightly. Our instruction set supports procedure calls between ARM and Thumb code, so application programmers typically choose at compile time whether parts of the application should be compiled for performance or code density.

To create Jazelle technology, we added a third instruction set, Java byte code, to the processor, with instruction-set support for entering and exiting Java applications. This instruction set creates a new state in which the processor behaves like a Java machine: It fetches and decodes Java byte codes and maintains the Java operand stack. Once in Java state, the processor is in every way a Java processor, but it can switch easily between Java state and ARM Thumb instruction-set state. In essence, we have made a two-in-one processor (one is an ARM Thumb processor and the other is a Java processor) but with all the performance, memory use, battery life, and space and cost advantages of a single processor.

Entering and exiting Java applications is simple, and can easily be put under the control of any operating system. Interrupts are handled normally, and cause an immediate return from Java state to ARM state to run the interrupt handler. At the end of the interrupt routine, the normal return mechanism will bring the processor back to Java state. This ensures real-time interrupt performance.

The key to making this approach work lies in a single new ARM instruction, "BXJ Rm," for entering Java state. This instruction first performs a test on one of the condition codes. If the condition is met, it then stores the current program counter (PC), puts the processor into Java state, branches to the specified target address and begins executing Java byte codes.
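The steps just described can be modeled in a few lines. This is a behavioral sketch of the description above, not of the actual hardware: the `Core` class and the `saved_pc` field are hypothetical names, and where the real instruction keeps the old PC is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Core:
    pc: int = 0
    saved_pc: int = 0        # hypothetical: wherever the old PC is stored
    java_state: bool = False


def bxj(core, condition_met, target):
    """Model 'BXJ Rm' as described in the text; `target` is the value held in Rm."""
    if not condition_met:
        core.pc += 4         # condition failed: continue with the next ARM instruction
        return
    core.saved_pc = core.pc  # store the current program counter
    core.java_state = True   # enter Java state
    core.pc = target         # branch to the first byte code


core = Core(pc=0x8000)
bxj(core, condition_met=True, target=0x9000)
```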

Once in Java state, the ARM PC is extended to 32 bits to address Java byte code. Byte codes are fetched and decoded in two stages (compared with a single decode stage when in ARM Thumb instruction-set state). A new Current Processor Status Register (CPSR) bit records the processor state. This is an important feature, as the CPSR is automatically saved and restored when handling interrupts and exceptions, so Jazelle technology is compatible with the existing ARM interrupt/exception model used by operating systems.
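The value of keeping the state in a CPSR bit is easiest to see in a sketch of interrupt entry and return. The flag sits at bit 24 of the CPSR on ARM cores with the Jazelle extension; everything else here is deliberately simplified (mode bits and interrupt masks are ignored, and the function names are ours).

```python
J_BIT = 1 << 24  # Java-state flag in the CPSR


def take_interrupt(cpsr):
    """Return (handler_cpsr, saved_cpsr): save the CPSR, then drop to ARM state."""
    saved_cpsr = cpsr                # hardware copies the CPSR to a saved register
    return cpsr & ~J_BIT, saved_cpsr


def return_from_interrupt(saved_cpsr):
    """Restoring the saved CPSR also restores Java state if it was set."""
    return saved_cpsr


running_java = J_BIT                 # processor is executing byte codes
handler_cpsr, saved = take_interrupt(running_java)
resumed_cpsr = return_from_interrupt(saved)
```

Because the handler never has to know about the extra bit, existing interrupt and exception code works unchanged.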

In Java state, the processor assigns several ARM registers to functions specific to the Java machine (for example, R6 = stack pointer, R0-R3 = top elements of stack, R4 = local variable 0). This hardware reuse contributes to the small size of the additional logic (12,000 gates) required to implement the Java machine, and keeps all of the state required by the Jazelle extension in ARM registers. In addition, it ensures compatibility with existing operating systems, interrupt handlers and exception code.

Keeping the top four elements of the stack in ARM registers is an important contributor to the performance of the processor when executing Java technology. Application profiling has shown that the working stack depth for most applications is very small, so this technique reduces memory accesses to a minimum.
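A small model illustrates why a shallow working stack means so few memory accesses. The class below is an illustrative sketch, not the Jazelle mechanism itself: it keeps the top four entries in "registers", spills the oldest entry to memory when a fifth is pushed, and refills from memory only when the registers run empty.

```python
class RegisterCachedStack:
    """Operand stack whose top four entries live in registers (R0-R3 in the text)."""

    REGISTERS = 4

    def __init__(self):
        self.registers = []       # top of stack, at most four entries
        self.memory = []          # deeper entries spilled to memory
        self.memory_accesses = 0

    def push(self, value):
        if len(self.registers) == self.REGISTERS:
            # Registers are full: spill the deepest register entry to memory.
            self.memory.append(self.registers.pop(0))
            self.memory_accesses += 1
        self.registers.append(value)

    def pop(self):
        if not self.registers:
            if not self.memory:
                raise IndexError("pop from empty operand stack")
            # Registers are empty: refill one entry from memory.
            self.registers.append(self.memory.pop())
            self.memory_accesses += 1
        return self.registers.pop()


shallow = RegisterCachedStack()
for value in (1, 2, 3, 4):
    shallow.push(value)
shallow_values = [shallow.pop() for _ in range(4)]

deep = RegisterCachedStack()
for value in (1, 2, 3, 4, 5, 6):
    deep.push(value)
deep_values = [deep.pop() for _ in range(6)]
```

In this model a stack that never grows beyond four entries touches memory zero times, while the six-deep case costs two spills and two refills.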

The extension we've added divides Java byte codes into three classes: directly executed, emulated and undefined. The majority of the Java byte codes (138 on the ARM926EJ-S microprocessor core) are executed directly in hardware; the remainder are emulated by short sequences of highly optimized ARM instructions. This reduces the complexity (cost and power consumption) of the additional logic required to implement the extensions. Application profiling has shown that the emulated byte codes are encountered less than 5 percent of the time. With the floating-point option, the number of directly executed byte codes increases from 138 to 152.

Undefined byte codes are distinct from emulated byte codes. Encountering any undefined Java byte code will cause the processor to leave Java state and return to an exception handler written in ARM code, which is normally part of the operating system. This also provides a mechanism for supporting future extensions of the Java byte code set. A new byte code function can be implemented by a software patch.
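The three-way split, and the way an undefined byte code can later be given a meaning by a software patch, can be sketched as below. The membership of each set is purely illustrative (the real partition is not listed here), and the names `HARDWARE_OPCODES`, `EMULATED_OPCODES` and `patches` are hypothetical.

```python
from enum import Enum


class ByteCodeClass(Enum):
    DIRECT = "executed directly in hardware"
    EMULATED = "emulated by a short ARM instruction sequence"
    UNDEFINED = "trapped to an exception handler in ARM code"


# Illustrative sets only; the opcode values are standard Java ones.
HARDWARE_OPCODES = {0x1A, 0x60}   # iload_0, iadd
EMULATED_OPCODES = {0xBB}         # new (assumed here to be emulated)


def classify(opcode):
    if opcode in HARDWARE_OPCODES:
        return ByteCodeClass.DIRECT
    if opcode in EMULATED_OPCODES:
        return ByteCodeClass.EMULATED
    return ByteCodeClass.UNDEFINED


def execute(opcode, patches):
    """Dispatch one byte code; `patches` maps undefined opcodes to software handlers."""
    kind = classify(opcode)
    if kind is ByteCodeClass.UNDEFINED:
        handler = patches.get(opcode)
        if handler is None:
            raise ValueError(f"undefined byte code 0x{opcode:02X}")
        return handler()          # exception handler supplies the behavior
    return kind.value


# A software patch gives the otherwise undefined opcode 0xCB a function.
patches = {0xCB: lambda: "handled by software patch"}
outcome = execute(0xCB, patches)
```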
