5c Over AV Link: A Comparison of Architectures
Synopsys (I) Pvt. Ltd
Bangalore - India
Abstract
Enhancing the 1394 Audio Video (AV) link core to incorporate content protection (M6) presents various architectural and design challenges related to gate count and configurability. These issues are even more pronounced from an IP development perspective. In this paper, the basic structure of two M6 architectures, Per-Channel M6 and Shared M6, is described, and the two architectures' relative strengths and weaknesses are compared. In conclusion, the preferred IP architecture for specific applications is proposed.
Introduction
A 1394 AV link core with DTCP can be built with all DTCP encryption layers implemented in software. Data throughput demands, however, have forced designers to implement the DTCP encryption layer’s content protection in hardware.
A hardware implementation of content protection can be incorporated into an AV link core using different architectures. This section provides a brief introduction to IEEE 1394 technology and to the AV layer, along with a brief description of the 1394 Link layer with AV and DTCP layers.
IEEE 1394 Technology
1394 networking provides a simple, low-cost, high-speed, scalable data interface capable of high-speed asynchronous and isochronous (real-time) data transfers. Among the applications that benefit from 1394 technology are non-linear video, high-end audio products, and other multimedia applications. The 1394 protocol offers high data rates with low overhead. Its ability to mix asynchronous control and object data in real time on a single connection, and its ability to mix high- and low-speed devices on the same network, have made it a truly universal connection to any consumer product.
Consumer Audio/Video Digital Interface (AV)
The IEC 61883 AV specification specifies a digital interface and defines a transmission protocol for audio-video data and control commands, enabling the interconnection of digital audio and video equipment using 1394.
The 1394 AV Link core implements the 1394 Link layer and AV in hardware. The major function of 1394 link hardware is to pack or unpack data into the 1394 packet format as it transmits or receives it over the 1394 network, while the AV hardware attaches headers to each AV isochronous packet transmitted on the bus. These headers allow reconstruction of the source packet on the receive side. The received packet is also presented to the receiving application in an exact timing relationship with the other packets of the same channel using time stamp information decoded from the received AV packet. Other functions, such as connection and bus management, are handled at the application layer of the protocol.
Digital Transmission Content Protection-DTCP
The DTCP standard provides manufacturers with a simple, inexpensive encryption implementation while maintaining a high degree of content protection. This standard defines a cryptographic protocol for protecting audio-video media content from illegal copying, interception, and tampering as it traverses high-performance digital buses. The transparent DTCP framework allows consumers to enjoy high-quality, copy-protected digital pictures and sound without any performance or quality impact.
A DTCP system addresses four fundamental layers of copy protection:
- Authentication and key exchange
- Content encryption
- Copy control information
- System renewability
1394 AV Link Architecture Without DTCP
The basic AV Link architecture is discussed below. This is the existing architecture into which the content protection specified by DTCP is incorporated. The block diagram below shows this basic architecture without the DTCP feature.
Figure 1: Basic Architecture without DTCP
The basic non-DTCP architecture consists of the following blocks:
- 1394 Link Block
This block implements complete link layer functionality as specified in the 1394 standard. The link’s primary functions are to receive packets from the Transmit AV block and send them across the 1394 serial bus via the physical layer, and to receive packets from the bus via the physical layer and forward them to the Receive AV block.
- Packetizer
This module interfaces with the application through the transmit FIFOs. The application pumps the data at its own clock frequency and the Packetizer reads the data from the FIFO at the link frequency. This block packetizes data after reading it from the transmit FIFOs. The Packetizer inserts AV headers as defined in AV for every AV isochronous transmit packet.
- De-Packetizer
The De-Packetizer receives 1394 packets from the link core, reconstructs source packets from the incoming 1394 packets, and stores the complete source packets in the appropriate receive FIFOs. Data is then forwarded to the application at the presentation time indicated by the timestamp information decoded from the received AV packet.
- Transmit and Receive FIFOs
The number of transmit and receive FIFOs depends on the number of isochronous channels supported on transmit and receive paths. For AV channels, AV specifications provide a guideline to determine the depth of the buffer. Buffer depth depends on the frequency differences between the application and the link. These buffers help smooth the AV packets at the transmitting end and de-jitter them at the receiving end.
The next section addresses the architectural options for building content protection over a 1394 AV Link core.
1394 AV Link Architectures With DTCP
There are various methods of implementing a DTCP encryption or decryption engine for an AV Link core. One common method is the Per-Channel M6 architecture. In such an architecture, an AV Link core has a separate DTCP hardware engine per AV channel, which means the number of DTCP hardware engines depends on the number of AV channels the core supports. Another common implementation method is the Shared M6 architecture. In this architecture, the DTCP hardware engine is shared among multiple AV channels, exploiting the advantages inherent in the 1394 network protocol.
Content protection with DTCP, as mentioned before, consists of four fundamental layers of protection, namely authentication and key exchange, content encryption, copy control information, and system renewability. Typically, all layers are implemented in software, except for content encryption, which is implemented in hardware in the M6 block. An encryption block with a single DTCP hardware engine is called an M6 Tx, and a decryption block with a single DTCP hardware engine is called an M6 Rx. A dual M6 block has two DTCP hardware engines, each capable of encryption and decryption by sharing hardware resources.
The main challenge in enhancing a 1394 AV Link core with DTCP is the time required for content encryption and decryption. Specifically, each 64 bits of data requires 10 stages to complete encryption or decryption. Adding to this challenge, the next 64 bits can only be encrypted or decrypted once an intermediate value, generated while processing the previous 64 bits of data, is available. This intermediate value becomes available after five stages of processing the previous 64 bits. Each stage involves multiple 64-bit arithmetic operations and is usually completed in one clock cycle.
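The effect of this chaining dependency on throughput can be explored with a small scheduling sketch. It uses only the figures stated above (10 stages per 64-bit block, intermediate value ready after 5 stages) and assumes one stage per clock and no pipelining inside an engine. The function names and the round-robin assignment of blocks to engines are hypothetical, and the sketch models timing only, not the M6 cipher itself.

```python
STAGES_PER_BLOCK = 10   # stages to process one 64-bit block (from the text)
CHAIN_READY_STAGE = 5   # stage after which the intermediate value is available


def block_start_cycles(num_blocks, num_engines):
    """Return the clock cycle at which each 64-bit block can start.

    Assumptions (not from the source): one stage per clock, blocks are
    handed to engines round-robin, and an engine takes a new block only
    after it has finished the previous one.
    """
    starts = []
    engine_free = [0] * num_engines  # cycle at which each engine is idle again
    for i in range(num_blocks):
        engine = i % num_engines
        earliest = engine_free[engine]
        if i > 0:
            # Block i needs the intermediate value produced by block i-1.
            earliest = max(earliest, starts[i - 1] + CHAIN_READY_STAGE)
        starts.append(earliest)
        engine_free[engine] = earliest + STAGES_PER_BLOCK
    return starts


def total_cycles(num_blocks, num_engines):
    """Clock cycles until the last block of a chain has finished."""
    return block_start_cycles(num_blocks, num_engines)[-1] + STAGES_PER_BLOCK


if __name__ == "__main__":
    for engines in (1, 2, 3):
        print(engines, block_start_cycles(4, engines), total_cycles(4, engines))
```

Under these assumptions a single engine accepts a block every 10 clocks, two engines accept one every 5 clocks, and a third engine adds nothing because the 5-stage chaining delay becomes the limit.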
Per-Channel M6 Architecture
In this architecture, each AV channel has a dedicated M6 block. Separate DTCP engines are required for the transmit and receive paths, because data written into the Transmit buffers and data read from the Receive buffers are independent of 1394 network traffic and can therefore overlap in time. The Per-Channel architecture is shown below.
Figure 2: Per-Channel M6 Architecture
The Per-Channel M6 architecture adds M6 Tx and M6 Rx blocks to the basic 1394 AV Link architecture. The operational frequency of both types of M6 block is application-dependent, but is always higher than the application clock, because the M6 encryption and decryption engines require 10 stages to complete encryption and decryption. The M6 Tx block implements a hardware-based encryption engine to encrypt source packets, while the M6 Rx block implements source packet decryption.
Shared M6 Architecture
In the Shared M6 architecture, two DTCP engines are used, each of which can both encrypt and decrypt, and which share common hardware resources. This architecture exploits two main features of the 1394 network protocol:
- Half-Duplex data transaction mode: This means that at any given time, only the transmit or receive path is active—both paths cannot be active simultaneously. Because of this, the hardware resources required for encryption and decryption can be shared.
- Single channel active at any given time: Only one AV channel data packet is transmitted or received on the 1394 network at any given time. This feature enables multiple AV channels to share the DTCP engine.
The Shared M6 architecture is shown in Figure 3. Note the added Dual M6 block under the Shared M6 architecture. The Packetizer and De-Packetizer blocks are modified to control the data flow with the Dual M6 block. The Dual M6 block consists of two DTCP engines that perform cryptographic functions as defined in the DTCP standard. The Dual M6 block operates at a constant 49.152-MHz 1394 link clock frequency, and thus is independent of the application clock.
Figure 3: Shared M6 Architecture
In this architecture, the Dual M6 block must satisfy the 1394 AV link core’s data rate requirement. Data on the transmit path to, and on the receive path from, the 1394 Link block is 64 bits per 8 link clocks, 64 bits per 16 link clocks, or 64 bits per 32 link clocks at the S400, S200, or S100 bus frequencies, respectively.
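These data rates follow directly from the 49.152-MHz link clock, as the short calculation below shows. The `engine_keeps_up` helper is a hypothetical check, not part of the source: it takes an assumed number of link clocks between successive 64-bit blocks accepted by the cipher hardware and compares it with the interval at which the link delivers them. The values 10 and 5 used in the example are assumptions derived from the 10-stage and 5-stage figures for a single engine and for two staggered engines.

```python
LINK_CLOCK_HZ = 49_152_000  # link clock frequency from the text

# Link clocks available per 64 bits of data at each bus speed (from the text).
CLOCKS_PER_64_BITS = {"S400": 8, "S200": 16, "S100": 32}


def link_rate_bits_per_second(speed):
    """Data rate the cipher hardware must sustain at the given bus speed."""
    return 64 * LINK_CLOCK_HZ // CLOCKS_PER_64_BITS[speed]


def engine_keeps_up(speed, clocks_per_block):
    """True if hardware accepting one 64-bit block every `clocks_per_block`
    link clocks can match the link at `speed` (hypothetical helper)."""
    return clocks_per_block <= CLOCKS_PER_64_BITS[speed]


if __name__ == "__main__":
    for speed in ("S400", "S200", "S100"):
        print(speed, link_rate_bits_per_second(speed) / 1e6, "Mbit/s")
    # Assumed intervals: 10 clocks for one engine, 5 for two staggered engines.
    print(engine_keeps_up("S400", 10), engine_keeps_up("S400", 5))
```

If the assumed intervals hold, one engine running at the link clock would fall short at S400 (10 clocks needed, 8 available), while two staggered engines would fit, which is one plausible reading of why the block is dual.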
Comparison
In this section, both architectures are compared with respect to various parameters. Only M6-related blocks are focused on throughout the comparison, under the assumption that the rest of the blocks remain the same in both architectures.
In Figure 4 and Figure 5, the X-axes represent the number of AV channels supported (one AV channel supported means one transmit and one receive AV channel), and the Y-axes represent the gate count factor (2 refers to 2x gate count, where x is the gate count for a single encryption engine without decryption, operating at a lower frequency).
Number of AV Channels
In the Per-Channel M6 architecture, the number of M6 engines required on the isochronous receive and isochronous transmit paths depends on the number of AV channels supported by the core. Logic related to the M6 may require design changes and modifications as required by the AV application. In the Shared M6 architecture, the hardware required for 5C is independent of the number of AV channels supported on the isochronous transmit and receive paths. In both architectures, the number of M6-related software-programmable registers scales with the number of AV channels supported.
The Per-Channel M6 architecture requires design changes and modifications as the number of supported AV channels changes. The Shared M6 design, in contrast, is independent of the number of AV channels supported.
Application Data Rate
In the Per-Channel M6 architecture, the M6 block’s operational frequency is not fixed but instead depends on the application’s data rate requirements. The higher the application data rate, the higher the frequency of the M6 block. Thus, M6 design must be flexible enough to address numerous different applications with different data rate requirements. This flexibility makes the M6 design more critical, due to the complex stages of arithmetic logic required for the encryption and decryption processes.
In the Shared M6 architecture, the dual M6 operating frequency is fixed to the link frequency (~50 MHz) and hence is independent of the application’s data rate.
In the Per-Channel M6 architecture, the frequency of the M6 block depends on the application data rate. In the Shared M6 architecture, in contrast, the frequency of the M6 block is independent of the application data rate.
Gate Count
Gate count in the Per-Channel M6 architecture depends on the following factors:
- The number of AV channels supported on the isochronous transmit and receive paths
- The application’s data rate requirement, which determines the M6 block’s operational frequency
Because the M6 block is timing-critical, the gate count increases with an increase in the M6 block’s operating frequency.
In the Shared M6 architecture, the gate count is fixed, since the operating frequency is the same as the link clock’s, which makes the gate count independent of the application’s data rate. Because the Dual M6 block is shared between the AV channels, the gate count does not depend on the number of AV channels supported on the isochronous transmit and receive paths.
In the Per-Channel M6 architecture, gate count depends on the application's data rate and the number of supported AV channels. In the Shared M6 architecture, in contrast, gate count is independent of both the application's data rate and the number of supported AV channels.
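The two dependencies can be written as a rough model in units of x, the gate count of a single low-frequency encryption engine. This is a sketch under stated assumptions, not measured data: the per-channel model counts one Tx and one Rx engine per AV channel and multiplies by a hypothetical `frequency_penalty` (1.0 at low data rates, larger when timing closure at a higher clock costs extra gates), and `dual_m6_factor=3.0` is a placeholder chosen only so that the model reproduces a crossover between one and two channels at low data rates. Real values would have to come from synthesis results.

```python
def per_channel_gate_factor(num_av_channels, frequency_penalty=1.0):
    """Gate count factor for the Per-Channel M6 architecture.

    One AV channel means one transmit and one receive channel, so two
    engines per channel. `frequency_penalty` is a hypothetical multiplier
    (>= 1.0) for the extra gates needed at higher M6 clock rates.
    """
    if num_av_channels < 1:
        raise ValueError("at least one AV channel is required")
    if frequency_penalty < 1.0:
        raise ValueError("frequency_penalty must be >= 1.0")
    return 2 * num_av_channels * frequency_penalty


def shared_gate_factor(num_av_channels, dual_m6_factor=3.0):
    """Gate count factor for the Shared M6 architecture.

    Fixed regardless of channel count or application data rate.
    `dual_m6_factor` is a placeholder value, not a figure from the source.
    """
    if num_av_channels < 1:
        raise ValueError("at least one AV channel is required")
    return dual_m6_factor


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, per_channel_gate_factor(n), shared_gate_factor(n))
```

With the placeholder values, the per-channel factor grows as 2, 4, 6, 8 while the shared factor stays flat, which is the shape of the comparison rather than a quantitative result.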
Complexity
In the Per-Channel M6 architecture, a single M6 block deals only with encryption or decryption, and is dedicated to a single AV channel, which makes the M6 data flow controller comparatively simpler than the M6 data controller in the Shared M6 architecture. Moreover, at the M6 block the data is in source packet form, for which encryption and decryption are straightforward: processing each 64 bits of data requires an intermediate key generated during encryption or decryption of the previous 64 bits, and this dependency holds only within a source packet.
In the Shared M6 architecture, one Dual M6 block serves multiple channels, which results in more control logic than is required by the Per-Channel architecture. The Dual M6 block’s complexity is high due to the following factors:
- Linking the 1394 core, with its data rate requirements, directly to the 1394 serial bus means that the Dual M6 block must satisfy the 1394 core’s exact data requirements.
- Because a dual cipher is shared between transmit and receive paths, extra design care must be taken to avoid simultaneous Dual M6 block requests from both transmit and receive paths.
- At the Dual M6 block, data is no longer in source packet format. Therefore, the controller must be able to deal with isochronous 1394 packets and derive the source packet boundaries required for source packet encryption and decryption.
- The controller must be able to handle multiple AV packets from different AV channels.
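The last two requirements suggest that a shared controller keeps a small context per AV channel: its position within the current source packet and the chaining value carried to the next 64-bit block. The sketch below illustrates that bookkeeping only. All names are hypothetical, `toy_transform` is a trivial XOR placeholder and not the M6 cipher, and the payloads are assumed to arrive with 1394 and AV headers already stripped and in multiples of 8 bytes.

```python
BLOCK_BYTES = 8  # 64-bit cipher block


def toy_transform(block, chain):
    """Placeholder for one chained cipher step. NOT the M6 algorithm."""
    return block ^ chain ^ 0x5A5A5A5A5A5A5A5A


class ChannelContext:
    """Per-channel state that must survive between 1394 packets."""

    def __init__(self):
        self.blocks_done = 0  # blocks processed in the current source packet
        self.chain = 0        # chaining value for the next block


class SharedEngineController:
    """Hypothetical controller sharing one engine among several AV channels."""

    def __init__(self, blocks_per_source_packet):
        self.blocks_per_source_packet = blocks_per_source_packet
        self.contexts = {}

    def process(self, channel, payload):
        """Transform one packet payload for `channel` and return the result."""
        if len(payload) % BLOCK_BYTES:
            raise ValueError("payload must be a multiple of 8 bytes")
        ctx = self.contexts.setdefault(channel, ChannelContext())
        out = bytearray()
        for off in range(0, len(payload), BLOCK_BYTES):
            block = int.from_bytes(payload[off:off + BLOCK_BYTES], "big")
            result = toy_transform(block, ctx.chain)
            out += result.to_bytes(BLOCK_BYTES, "big")
            ctx.chain = result
            ctx.blocks_done += 1
            if ctx.blocks_done == self.blocks_per_source_packet:
                # Source packet boundary: the chain does not cross it.
                ctx.blocks_done = 0
                ctx.chain = 0
        return bytes(out)
```

Because each channel has its own context, packets from different channels can be interleaved on the bus without disturbing one another's chaining state. A Per-Channel design gets this for free from its dedicated engines.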
The M6 data controller in Per-Channel architecture is thus simpler than it is in the Shared architecture.
Power Dissipation
In both architectures, the DTCP engine causes major power dissipation due to the following factors:
- Total application bandwidth use from the 125-µs frame window
- Gate count due to the DTCP engine
- DTCP engine operational frequency
In the Per-Channel architecture, because the gate count depends on the number of AV channels supported on the isochronous transmit and receive paths, power dissipation increases with the number of supported AV channels. Because the application's data rate determines the M6 block's operational frequency, power dissipation also depends on the application's data rate. In the Shared M6 architecture, because the gate count is independent of the number of supported AV channels and the DTCP engine's operating frequency is independent of the application data rate, power dissipation depends only on the total application bandwidth used from the 125-µs frame window.
In the Per-Channel architecture, power dissipation depends on the total application bandwidth used from the 125-µs frame window, the number of AV channels, and the application data rate. In the Shared M6 architecture, in contrast, power dissipation depends only on the total application bandwidth used from the 125-µs frame window.
1394 Network Features
The Shared M6 architecture exploits two main 1394 network features, namely Half-Duplex transaction mode and single-channel activity. In Half-Duplex mode, a channel can either transmit or receive at any given time. This allows the designer to share the M6 engine between the Tx and Rx channels. Single-channel activity, of course, means that only one channel can be active at any given time. Thus, even if a system has multiple channels, the designer can be assured that one and only one channel is active at any given time. This again allows the designer to share the M6 engine across all channels in the system.
The Shared M6 architecture uses basic 1394 features, specifically Half-Duplex transaction mode and single-channel activity, while the Per-Channel M6 architecture does not.
Gate Count/Number of AV Channels Comparison at Lower Application Data Rates
In Figure 4, applications using the 1394 network have lower data rates, which means a lower M6 block operating frequency in the Per-Channel architecture. The Dual M6 block’s operating frequency is the same as the link clock’s.
As Figure 4 indicates, the Per-Channel M6 architecture provides a low gate count for applications with lower data rates and a single transmit and receive AV channel.
Figure 4: Gate Count Factor and Number of AV Channels Compared for Low Data Rate Applications
For applications requiring more than one AV channel, the Shared M6 architecture provides a lower gate count than a comparable Per-Channel M6 architecture.
Gate Count/Number of AV Channels Comparison at Higher Application Data Rates
In Figure 5, applications using the 1394 network have higher data rates, which means a higher M6 block operating frequency in the Per-Channel architecture. The Dual M6 block's operating frequency is the same as the link clock's.
Figure 5: Gate Count Factor and Number of AV Channels Compared for High Data Rate Applications
As Figure 5 indicates, the Shared M6 architecture provides a low gate count for higher application data rates.
In the Shared M6 architecture, the gate count is independent of the number of AV channels supported.
Conclusion
Table 1 provides a summary comparison of both architectures.
Table 1: Per-Channel and Shared M6 Architecture Comparison
Item | Per-Channel Architecture | Shared M6 Architecture |
Number of AV channels | Gate count dependent on number of AV channels. Design changes and modifications may be required | Gate count independent of number of AV channels. No design changes are required. |
Operating frequency | Dependent on application’s data rate | Fixed at link clock rate; independent of application’s data rate |
Gate count | Dependent on the number of AV channels and the application data rate. | Fixed gate count for Dual M6 block, independent of the number of AV channels and the application’s data rate |
Complexity | Simple M6 data controller | More complex M6 data controller |
Software programmable registers | The number of registers depends on the number of AV channels supported. | The number of registers depends on the number of AV channels supported |
Application dependency | Dependent | Independent |
Timing criticality | Dependent on the application’s data rate | Must be met at the link frequency |
Half-duplex data transaction | Does not use this property | Uses this property to share hardware resources between the encryption and decryption engines |
Single channel is active at any given time | Does not use this property | Uses this property to share the Dual M6 block between all AV channels |
Power dissipation | Dependent on application data rate, total application bandwidth utilization from the 125-µs frame window, and number of AV channels supported | Dependent on total application bandwidth utilization from the 125-µs frame window. |
When to choose? | When the end application is known and requires a lower data rate and a single channel. | When the end applications and data rate requirements are not known |
This paper has presented the AV link architecture without DTCP as a basis for describing how the Shared M6 and Per-Channel M6 architectures incorporate content protection using DTCP. Analysis of the comparison table indicates that the Shared M6 architecture is best suited to multiple types of end application, while the Per-Channel architecture is best suited to known end applications requiring a lower data rate and only a single channel. For multiple AV channels, the Shared M6 architecture offers a lower gate count than the Per-Channel M6 architecture.
To conclude, because most applications such as set-top boxes and storage devices require multiple AV channel support and are required to support a wide range of end applications, the Shared M6 architecture becomes the preferred architecture from an IP development perspective.