VLIW chip complicates pSOS porting

A new breed of microprocessor based on very long instruction word (VLIW) technology is well suited for embedded systems, because VLIW architectural simplicity is achieved at the expense of complexity in the software. Unfortunately, it is the RTOS that gets the lion's share of this complexity.

Unlike traditional RISC-based embedded CPUs, a VLIW media processor achieves performance by exploiting instruction-level parallelism without checking for data dependencies across processor cycles. Those dependencies are handled in software, by the programmer or the compiler, through careful instruction scheduling. The VLIW processor also processes interrupts only when instructed to do so, unlike a traditional microprocessor that checks for pending interrupts after each processor cycle. A VLIW media processor can therefore exacerbate the problems associated with porting an operating system.

Virtually every OS has a processor-dependent part, generally written in assembly language, and a processor-independent part. But writing assembly language for a VLIW processor is not an easy task, given an order of magnitude more functional units, a deeper pipeline, large register files, multiple issue slots and a complex state machine. In the case of the Trimedia VLIW media processor, however, the RTOS is written entirely in C. Implementing low-level functionality in a higher-level language is a major challenge, one that requires that syntactic and semantic consistency be maintained for low-level routines.

Wind River Systems' pSOS was chosen as the reference OS for the Trimedia VLIW media processor. To ensure that the port is syntactically and semantically correct, Trimedia submitted the implementation to Integrated Systems Inc. (ISI), which certified that it is, in fact, a full C implementation, compliant with ISI implementations for other microprocessors.

A system designer must contend with such issues as context-switch time, interrupt latency, nested interrupts and task scheduling. In the context-switch area, the designer should get a good handle on the type and amount of information being saved as part of a context switch. If the context saved when switching a process or task is excessive, the switch will take an inordinate amount of time and real-time constraints may not be met.

As for interrupt latency, it is important to determine how long it takes to service an interrupt after it has been posted. Interrupt latency is a measure of either the typical or the worst-case delay in servicing an interrupt after it is posted. For Trimedia audio and video applications, a low typical interrupt latency is more important than a low worst-case latency. Higher interrupt latencies are usually caused by an application that either temporarily disables interrupt servicing while it executes critical regions, or temporarily disables the servicing of other interrupts so that an interrupt service routine can run through to completion. The designer may lose interrupts, and thus vital data, if an interrupt is not serviced before another interrupt from the same pin is posted. To ensure that all interrupts are properly serviced, the designer should make sure that interrupts are not nested; nested interrupts are generally not recommended.

Scheduling is the other major issue designers must take into account.
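Because a low typical latency matters more than a low worst-case figure for these audio and video applications, a designer may want to instrument the interrupt path and collect both numbers. The C sketch below shows one minimal way to do so; the cycle-counter routine and the posted-interrupt timestamp are hypothetical placeholders, not part of pSOS or the Trimedia toolchain.

/* Minimal sketch: characterize typical vs. worst-case interrupt latency.
 * read_cycle_counter() and timestamp_at_post are hypothetical hooks the
 * designer would supply for the platform at hand. */

static unsigned long worst_latency;   /* worst-case delay observed      */
static unsigned long total_latency;   /* running sum for the average    */
static unsigned long samples;         /* number of interrupts observed  */

/* Hypothetical: cycle count captured when the interrupt was posted. */
extern volatile unsigned long timestamp_at_post;

/* Hypothetical: reads a free-running hardware cycle counter. */
extern unsigned long read_cycle_counter(void);

/* Called at the top of the interrupt service routine. */
void record_interrupt_latency(void)
{
    unsigned long latency = read_cycle_counter() - timestamp_at_post;

    total_latency += latency;
    samples++;
    if (latency > worst_latency)
        worst_latency = latency;
}

/* Typical latency = average over all observed interrupts. */
unsigned long typical_latency(void)
{
    return samples ? total_latency / samples : 0;
}

Dividing the accumulated total by the sample count gives the typical (average) latency, while worst_latency records the bound to compare against the interrupt period of the fastest device in the system.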
Embedded consumer applications require preemptive, priority-driven task scheduling to guarantee their hard real-time constraints. As an application example, take the pSOS+m operating system (the name denotes the multiprocessing version of the RTOS) running on a Trimedia VLIW media processor. This VLIW processor targets high-performance multimedia applications ranging from videophones, video editing systems, digital TV, security systems and set-top boxes to reprogrammable, multipurpose plug-in cards for PCs. It implements such standards as MPEG-1 and MPEG-2, and its DSP-based CPU implements a variety of multimedia algorithms.

The VLIW processor is a fluid computer system controlled by the real-time kernel of the small-footprint pSOS+m operating system. Aside from the DSP CPU core, the VLIW processor includes a high-bandwidth internal bus and internal bus-mastering DMA peripherals.

The pSOS+m kernel views an application as a collection of tasks, I/O device drivers and interrupt service routines (ISRs). It provides a set of system calls that an embedded application can use for task management, semaphores, message queues, dynamic memory allocation, time management, I/O functions, event macros, asynchronous signals and fatal-error handling. Services provided by pSOS form the basis of the Trimedia Software Streaming Architecture.

The kernel gives the system designer concurrent and independent task execution by switching between tasks in response to system calls made to the kernel. The RTOS also offers synchronization and communication primitives in a seamless fashion to support multiprocessor systems. It comes with an extensive collection of key design features, including fully preemptive, priority-based task scheduling; streamlined interrupt handling; and dynamic, object-based multitasking.

The RTOS' priority-based, preemptive scheduling algorithm ensures that, at any point in time, the running task is the one with the highest priority among all ready-to-run tasks in the system. However, the design engineer can modify the scheduling behavior by selectively enabling and disabling preemption or time slicing for one or more tasks. Each task has a mode word with two bits that can be set to affect scheduling. One controls the task's ability to be preempted. If this bit is disabled, the task continues to run once it enters the running state, even if other tasks of higher priority enter the ready state. A task switch occurs only if the running task blocks or re-enables preemption. The second mode bit controls time slicing. If the running task's time-slice bit is enabled, the RTOS kernel automatically tracks how long the task has been running. When the task exceeds the predetermined time slice and other tasks with the same priority are ready to run, the kernel switches to run one of those tasks. Time slicing only affects scheduling among tasks of equal priority.

The design engineer assigns a priority to each task when it is created. There are 256 priority levels: 255 is the highest, 0 the lowest. Certain priority levels are reserved for use by special pSOS tasks. For example, Level 0 is reserved for the IDLE daemon task furnished by the RTOS kernel. Levels 240 to 255 are set aside for a variety of high-priority tasks, including the pSOS+ ROOT task, which runs at Level 240.
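To make the mode-word discussion concrete, the sketch below shows how a task might be created at a chosen priority and how its preemption bit could be toggled around a short critical section. The call names (t_create, t_start, t_mode) and flag names follow commonly documented pSOS+ conventions, but the exact signatures, flags and the assumed psos.h header should be checked against the pSOSystem manuals; this is a hedged sketch, not verified Trimedia code.

#include <psos.h>               /* assumed pSOS+ header name          */

#define WORKER_PRIORITY 100UL   /* user-range priority (1..239)       */

static void worker_entry(void)
{
    for (;;) {
        /* ... perform the task's periodic work ... */
    }
}

/* Sketch: create and start a task at a given priority. */
void spawn_worker(void)
{
    unsigned long tid;
    unsigned long args[4] = { 0, 0, 0, 0 };

    /* Four-character name, priority, supervisor/user stack sizes, flags. */
    t_create("WORK", WORKER_PRIORITY, 4096, 4096, T_LOCAL, &tid);

    /* Start preemptible, with time slicing among equal-priority tasks. */
    t_start(tid, T_PREEMPT | T_TSLICE, worker_entry, args);
}

/* Sketch: guard a short critical section by turning the calling task's
 * preemption bit off, then restoring its previous mode. */
void run_without_preemption(void (*critical_section)(void))
{
    unsigned long old_mode, ignore;

    t_mode(T_NOPREEMPT, T_NOPREEMPT, &old_mode);  /* disable preemption */
    critical_section();                           /* keep this short    */
    t_mode(T_NOPREEMPT, old_mode, &ignore);       /* restore prior mode */
}

Note that any region run with preemption disabled should stay short; as discussed above, long non-preemptible stretches are one of the main sources of missed real-time constraints.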
Indexed queue

It is important for the design engineer to know that when a task enters the ready state, the RTOS kernel puts it into an indexed ready queue behind tasks of higher or equal priority. All ready-queue operations, including insertions and removals, are achieved in fast, constant time. During dispatch, when the kernel is about to exit and return to application code, it normally runs the task with the highest priority in the ready queue. If this is the same task that was last running, the kernel simply returns to it. Otherwise, either the last running task has blocked, or one or more ready tasks have higher priority. If it blocked, the RTOS kernel switches to run the task currently at the top of the indexed ready queue. In the second case, preemption, the RTOS kernel also performs a task switch, unless the last running task has its preemption mode disabled; in that situation the dispatcher has no choice but to return to it.

An embedded pSOS-based application is partitioned into a set of tasks and interrupt service routines. Each task is a thread of independent actions that can execute concurrently with other tasks. However, these cooperating tasks need to exchange data, synchronize actions or share exclusive resources. The RTOS kernel has three sets of facilities for servicing task-to-task, as well as ISR-to-task, communication, synchronization and mutual exclusion: message queues, events and semaphores.

Message queues provide a highly flexible, general-purpose method to implement communication and synchronization. Like a task, a message queue is an abstract object, created dynamically using the q_create system call. The q_create call accepts as input a user-assigned name and several characteristics, including whether tasks waiting for messages will wait first-in, first-out or by task priority; whether the message queue's length is limited; and whether a set of message buffers will be reserved for its private use. A queue is not explicitly bound to any task. One or more tasks can send messages to a queue, and one or more tasks can request messages from it. A message queue therefore serves as a many-to-many communication switching station.

Consider a simple many-to-one communication case. A server task can use a message queue as its input request queue. Several client tasks independently send request messages to this queue. The server task waits at this queue for input requests, processes them and goes back for more: a simple single-queue, single-server implementation. A sketch of this pattern appears below.

The RTOS kernel also provides the designer with a set of synchronization-by-event facilities. Each task has 32 event flags it can wait on, bit-wise encoded in a 32-bit word. The upper 16 bits are reserved for system use, while the lower 16 event flags are user definable. Two system calls provide synchronization by events between tasks and between tasks and ISRs. The ev_send (send events to a task) call is used to send one or more events to another task. With ev_receive (get or wait for events), a task can wait, with or without a timeout, for one or more of its own events, or request them without waiting. An important feature of these events is that a task can wait for one event, for any of several events or for all of several events.
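Before looking at semaphores in detail, the single-queue, single-server pattern described above might be rendered roughly as follows. The q_create, q_send and q_receive calls and their flags follow typical pSOS+ documentation, while the queue name, request codes and message layout are illustrative assumptions; verify the exact signatures against the pSOSystem manuals.

#include <psos.h>   /* assumed pSOS+ header name */

/* Queue ID shared by the clients and the server; created once at startup. */
static unsigned long request_qid;

/* Sketch: create the request queue; FIFO waiters, unlimited length. */
void create_request_queue(void)
{
    q_create("REQQ", 0, Q_FIFO | Q_NOLIMIT, &request_qid);
}

/* Client side: post a four-long-word request message. */
void client_post_request(unsigned long request_code, unsigned long arg)
{
    unsigned long msg[4];

    msg[0] = request_code;
    msg[1] = arg;
    msg[2] = 0;
    msg[3] = 0;
    q_send(request_qid, msg);
}

/* Server task body: wait for requests, process them, go back for more. */
void server_task(void)
{
    unsigned long msg[4];

    for (;;) {
        /* Block (no timeout) until a client posts a request. */
        q_receive(request_qid, Q_WAIT, 0, msg);
        /* ... decode msg[0]/msg[1] and process the request ... */
    }
}

Because the queue is not bound to any task, additional client tasks can be added later without changing the server.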
The third grouping of RTOS facilities involves semaphore operations, which are useful as resource tokens for implementing mutual exclusion. Like a message queue, a semaphore is an abstract object, created dynamically using the sm_create system call. The sm_create call accepts as input a user-assigned name, an initial count and several characteristics, including whether tasks waiting for the semaphore will wait first-in, first-out or by task priority.

For communication between different instances of the pSOS+m kernel in a closely coupled multiprocessor system, the pSOS+m kernel relies on a communication module with a fixed application programming interface. This module is provided by the embedded-system developer and hence abstracts away the physical medium connecting the various processors. For pSOS+m on a Trimedia VLIW processor, for example, the reference implementation, which employs shared memory over the PCI bus, can be used.

The pSOS+m RTOS handles interrupts via interrupt service routines. Interrupts can be passed directly to ISRs to provide the fastest possible response time. System calls made from an ISR return to the ISR, thus eliminating time-consuming kernel scheduling. These ISRs are critical in a real-time system. On the one hand, an ISR handles interrupts and performs whatever minimum action is required, for example to reset a device or to read or write data. On the other hand, an ISR may drive one or more tasks and cause them to respond to and process the conditions related to the interrupt.
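The following sketch illustrates both ideas: a semaphore used as a resource token for mutual exclusion, and an ISR that does the minimum work before driving a handler task with an event. The call names (sm_create, sm_p, sm_v, ev_send, ev_receive) follow common pSOS+ documentation, but the exact signatures, the flag names, the event bit and the device-acknowledge helper are assumptions to be checked against the pSOSystem manuals, not verified Trimedia code.

#include <psos.h>   /* assumed pSOS+ header name */

static unsigned long frame_lock;       /* semaphore ID                    */
static unsigned long handler_tid;      /* task that services interrupts   */

#define EV_FRAME_READY 0x0001UL        /* hypothetical user event flag    */

void init_sync_objects(void)
{
    /* Initial count of 1: the first sm_p() succeeds, later callers block.
       SM_PRIOR queues waiting tasks by priority. */
    sm_create("FRML", 1, SM_PRIOR, &frame_lock);
}

/* Mutual exclusion around a shared resource. */
void update_shared_frame(void)
{
    sm_p(frame_lock, SM_WAIT, 0);      /* acquire; wait with no timeout   */
    /* ... touch the shared frame buffer here ... */
    sm_v(frame_lock);                  /* release                         */
}

/* ISR driving a task: minimal work, then hand off via an event. */
void capture_isr(void)
{
    /* Minimum action: acknowledge the device (hypothetical helper),
       then wake the handler task; heavy processing stays out of the ISR. */
    /* acknowledge_capture_device(); */
    ev_send(handler_tid, EV_FRAME_READY);
}

void frame_handler_task(void)
{
    unsigned long events;

    for (;;) {
        /* Block until the ISR signals that a frame is ready. */
        ev_receive(EV_FRAME_READY, EV_WAIT | EV_ALL, 0, &events);
        /* ... process the captured frame at task level ... */
    }
}

Keeping the ISR this small is what preserves the low typical interrupt latency that Trimedia audio and video applications depend on, while the event hand-off lets the heavier processing run under normal task scheduling.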