Achieving higher performance in a multicore-based packet processing engine design

By Michael Coward, CTO and co-founder, Continuous Computing
(01/01/08, 05:00:00 PM EST) -- Embedded.com

A new class of processor has begun to appear in storage, security, wireless base station, and networking applications. It is replacing proprietary Application Specific Integrated Circuits (ASICs), which are very expensive and come with long lead times. These include both ASICs developed by OEM system solution providers and those designed by industry giants such as LSI Logic and IBM.

This new class of multi-core processor is made up of eight, sixteen, even sixty-four individual processor cores with integrated memory controllers, various I/O interfaces, and separate acceleration engines.

Although this new class of processor has made great strides in overcoming the limitations of earlier generations, not all multi-core processors are created equal. Some vendors add threading capability to hide memory latency and include native 10Gbps interfaces, while others include security engines and even regular expression engines for specialized applications.

Rather than comparing every feature of a number of multi-core processors bit by bit, this paper focuses on one critical architectural element: the memory subsystem. The memory subsystem matters because it is a major factor in determining how well a processor scales and the upper limit of the performance it can achieve.

The memory architectures compared here are based on two leading multi-core processors in the market today:

1. Single channel, wide cache line (Single / Wide)
2. Dual channel, narrow cache line (Dual / Narrow)

The question to be addressed is: which architecture delivers the performance needed to keep up with the ever-growing voice, video, and data traffic the market demands today?
