A High Efficiency Referencing Frame Buffer Compression IP for H.265 Video Codec
Beyyun Kuo, Jacques Baudia, Star Sung (TITC)
Abstract:
A 3.0x image compression method for H.265 reference frames, with fast access to the storage device, is achieved by applying a fixed bit rate to reduce the data of each "Block of pixels" in each image frame. Several quality thresholds are predetermined, depending on the available bandwidth of the storage device and the image resolution, to decide the compression ratio of each image frame. The starting address of each compressed "Block of pixels" is saved in a predetermined location of the storage device so that any compressed image frame can be accessed randomly and quickly. A smart engine detects the available Tx/Rx bandwidth and decides the compression ratio to avoid data congestion. High image quality is achieved by applying patented proprietary compression algorithms, including accurate DPCM prediction, a new VLC coding with an accurate predictive divider, and intelligent bit rate distribution control; an intelligent random truncation mechanism is also used to avoid artifacts caused by error propagation. On most test bench video sequences, the H.265 Video Codec shows a PSNR drop of less than 0.1 dB over 29 consecutive P-type frames and still less than 2.0 dB over 190 consecutive P-type frames, making it visually artifact-free.
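The smart engine described in the abstract, which picks a compression ratio from the measured Tx/Rx bandwidth, can be sketched as a threshold lookup. This is a minimal sketch under stated assumptions: the set of supported ratios and the function names are hypothetical, since the actual thresholds used by the engine are not given.

```python
# Hypothetical sketch of bandwidth-driven compression ratio selection.
# SUPPORTED_RATIOS is an assumed set of fixed ratios, not TITC's actual table.
SUPPORTED_RATIOS = (1.5, 2.0, 2.5, 3.0, 4.0)  # lowest ratio = best quality


def required_ratio(raw_bytes_per_s: float, available_bytes_per_s: float) -> float:
    """Minimum ratio that makes the compressed traffic fit the available bandwidth."""
    if available_bytes_per_s <= 0:
        raise ValueError("available bandwidth must be positive")
    return raw_bytes_per_s / available_bytes_per_s


def select_ratio(raw_bytes_per_s: float, available_bytes_per_s: float) -> float:
    """Smallest supported ratio that avoids congestion; the strongest one otherwise."""
    needed = required_ratio(raw_bytes_per_s, available_bytes_per_s)
    for ratio in SUPPORTED_RATIOS:
        if ratio >= needed:
            return ratio
    return SUPPORTED_RATIOS[-1]


# Example: one 1080p 4:2:0 8-bit reference stream written at 60 fps.
raw = 1920 * 1080 * 3 // 2 * 60  # 186,624,000 bytes/s
print(select_ratio(raw, 80_000_000))  # 2.5 (about 2.33x is needed)
```

Choosing the smallest sufficient ratio keeps quality as high as the link allows. If even the strongest ratio is not enough, a real engine would also have to throttle traffic, which this sketch does not model.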
1. INTRODUCTION
There are essentially three types of picture encoding in video compression standards. The I-frame, or "Intra-coded" picture, is coded using only the 8x8-pixel blocks within the frame itself. The P-frame, or "Predictive" frame, uses a previous I-type or P-type frame as a reference and codes the difference. The B-frame, or "Bi-directional" interpolated frame, uses a previous I-frame or P-frame as well as the next I-frame or P-frame as references to code the pixel information. In principle, in I-frame encoding every 8x8-pixel "Block" goes through the same compression procedure, similar to JPEG, the still image compression algorithm: the DCT, quantization and VLC (variable length coding). The P-frame and B-frame, by contrast, code the difference between a target frame and the reference frames, as shown in Fig. 1.
Fig. 1
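The P-frame step, coding the difference between a target block and its motion-compensated reference block, can be sketched for one 8x8 block as follows. This is a minimal illustration with hypothetical function names, using plain nested lists as frames; real encoders also handle sub-pixel motion and frame edges.

```python
def get_block(frame, x, y, n=8):
    """Return the n x n block whose top-left corner is (x, y)."""
    return [row[x:x + n] for row in frame[y:y + n]]


def p_residual(target, reference, x, y, mv, n=8):
    """Target block minus the reference block displaced by motion vector mv=(dx, dy)."""
    dx, dy = mv
    h, w = len(reference), len(reference[0])
    rx, ry = x + dx, y + dy
    # Python slices would silently misbehave here, so reject out-of-frame vectors.
    if rx < 0 or ry < 0 or rx + n > w or ry + n > h:
        raise ValueError("motion vector points outside the reference frame")
    cur = get_block(target, x, y, n)
    ref = get_block(reference, rx, ry, n)
    return [[c - r for c, r in zip(c_row, r_row)] for c_row, r_row in zip(cur, ref)]


# Example: the target is the reference shifted left by one pixel.
reference = [[(i + j) % 256 for i in range(16)] for j in range(16)]
target = [[(i + 1 + j) % 256 for i in range(16)] for j in range(16)]
print(p_residual(target, reference, 0, 0, (1, 0))[0])  # all zeros
```

With the correct motion vector the residual is zero, which is what makes P-frames cheap to code; with a zero vector every residual here is 1.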
The reconstructed reference frames are saved in frame buffers, as shown in Fig. 1, which can be either off-chip DRAM or on-chip SRAM. With DRAM, the bandwidth limitation can be overcome by applying a compression engine to reduce the amount of data transmitted to and received from the DRAM reference frame buffer (F.B.). If on-chip SRAM is selected, as depicted in Fig. 2, the frame buffer becomes a heavy and costly burden on die area, likely occupying more than 2/3 of the die, and a compression engine can easily reduce the die area and hence cut the wafer cost.
Fig. 2
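The die-area argument can be made concrete with frame buffer arithmetic. The sketch below assumes 8-bit 4:2:0 frames and two stored reference frames; these are illustrative assumptions rather than figures from the text, while the 3.0x ratio is the one quoted in the abstract. Function names are hypothetical.

```python
import math

# Chroma sampling -> (numerator, denominator) samples per pixel.
SAMPLES_PER_PIXEL = {"4:2:0": (3, 2), "4:2:2": (2, 1), "4:4:4": (3, 1)}


def frame_bytes(width, height, bit_depth=8, chroma="4:2:0"):
    """Uncompressed size of one frame in bytes."""
    num, den = SAMPLES_PER_PIXEL[chroma]
    bits = width * height * num * bit_depth // den
    return bits // 8


def buffer_bytes(width, height, ref_frames=2, ratio=1.0, **kw):
    """Frame buffer size for ref_frames reference frames at a given compression ratio."""
    return math.ceil(ref_frames * frame_bytes(width, height, **kw) / ratio)


for w, h in ((1920, 1080), (3840, 2160)):
    raw = buffer_bytes(w, h)
    packed = buffer_bytes(w, h, ratio=3.0)
    print(f"{w}x{h}: {raw / 2**20:.1f} MiB raw, {packed / 2**20:.1f} MiB at 3.0x")
```

For two 1080p reference frames this is about 5.9 MiB uncompressed versus about 2.0 MiB at 3.0x, which is the kind of saving that makes an on-chip SRAM frame buffer practical.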
Compression results in some image quality degradation unless a pure lossless algorithm is used, and a pure lossless algorithm does not guarantee the minimum data reduction ratio that is critical in sizing the on-chip frame buffer SRAM or the off-chip DRAM. TITC has successfully developed compression algorithms and the corresponding VLSI macro-IP with a fixed data rate, i.e. a fixed compression ratio, at near-lossless image quality; all tested still images were visually lossless. In the 8 years since mass production began at Micronas of Germany, then one of the world's top TV decoder chip suppliers, TITC image compression IPs have been implemented in more than 150M TVs and 400M smartphones without any complaint about image quality or power consumption.
1.1 INNOVATIVE COMPRESSION ALGORITHMS RESULTS IN HIGH IMAGE QUALITY
Several compression algorithms are key to this image quality, both visually and in PSNR:
- Accurate DPCM prediction, which makes shorter codes possible.
- Golomb-Rice-like VLC coding: an accurate predictive divider results in a shorter Quotient code. This VLC coding gains 2-3 dB better quality than standard Golomb-Rice coding.
- Intelligent bit rate distribution: using accurate prediction based on line-to-line correlation, bit rate is assigned more accurately to compress each Block of pixels.
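To illustrate the first two ideas, DPCM prediction followed by Golomb-Rice-style coding, here is a minimal sketch of a lossless line coder. TITC's accurate predictive divider is proprietary and not described here, so the sketch substitutes the well-known JPEG-LS rule that picks the Rice parameter k (divider 2**k) from running residual statistics. The function names, the initial prediction of 128 and the reset interval are assumptions.

```python
def zigzag(e: int) -> int:
    """Map signed residuals to non-negative integers: 0,-1,1,-2,2 -> 0,1,2,3,4."""
    return 2 * e if e >= 0 else -2 * e - 1


def rice_code(value: int, k: int) -> str:
    """Golomb-Rice code with divider 2**k: unary quotient, '0' stop bit, k-bit remainder."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def encode_line(pixels, reset=64):
    """Left-neighbour DPCM plus adaptive-k Rice coding of one pixel line, as a bit string."""
    bits = []
    pred = 128            # assumed prediction for the first pixel
    a_sum, count = 4, 1   # running sum of |residual| and sample count (JPEG-LS style)
    for p in pixels:
        e = p - pred
        # Smallest k with count * 2**k >= a_sum: the divider tracks the mean residual.
        k = 0
        while (count << k) < a_sum:
            k += 1
        bits.append(rice_code(zigzag(e), k))
        a_sum += abs(e)
        count += 1
        if count == reset:          # halve the statistics so k keeps adapting
            a_sum, count = a_sum >> 1, count >> 1
        pred = p                    # next prediction: the current pixel
    return "".join(bits)


flat = [128] * 64
ramp = list(range(100, 164))
print(len(encode_line(flat)), len(encode_line(ramp)), "bits vs", 64 * 8, "raw")
```

Better prediction means smaller residuals, and a divider that tracks the residual size keeps the unary quotient short; both effects shrink the code, which is the leverage the bullets above describe.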
Compared with the international lossless compression standard JPEG-Lossless (JPEG-LS), TITC-LS achieves a higher compression ratio, as shown in Fig. 3, with much less hardware (gate count), and it can be scaled up for higher throughput in both compression and decompression.
Fig. 3
Lena and Pepper are well known and commonly used in the research community, while "DeskTop" is a very common, real display screen. WordRG has a complex pattern, a green background with pink text in the upper part plus a pink background with green text, which is believed to be the worst case for chroma. DogCats has 2/3 of its area covered by animal fur, testing the compression ratio on high-frequency content. Averaged over the five tested images, JPEG-LS reaches a 2.35x compression ratio, while TITC-LS reaches 2.43x under lossless quality, as shown in Table 1.
Table 1
Tested image | Original | JPEG-LS reduced | JPEG-LS ratio | TITC-LS reduced | TITC-LS ratio
Lena | 193 KB | 102 KB | 1.89 | 107 KB | 1.80
DogCats | 771 KB | 480 KB | 1.61 | 384 KB | 2.01
WordRG | 608 KB | 139 KB | 4.37 | 144 KB | 4.21
Pepper | 193 KB | 121 KB | 1.87 | 107 KB | 1.80
Mobile | 193 KB | 122 KB | 2.03 | 82 KB | 2.35
Average (mean of ratio column) | | | 2.35 | | 2.43
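The averages in Table 1 are arithmetic means of the five per-image ratios, where each ratio is the original size divided by the reduced size. A small check using the values copied from the table (function names are illustrative):

```python
# Ratio columns copied from Table 1 (original size / reduced size).
JPEG_LS = {"Lena": 1.89, "DogCats": 1.61, "WordRG": 4.37, "Pepper": 1.87, "Mobile": 2.03}
TITC_LS = {"Lena": 1.80, "DogCats": 2.01, "WordRG": 4.21, "Pepper": 1.80, "Mobile": 2.35}


def ratio(original_kb: float, reduced_kb: float) -> float:
    """Compression ratio of one image."""
    return original_kb / reduced_kb


def mean_ratio(ratios: dict) -> float:
    """Arithmetic mean of per-image ratios, as used for the Average row."""
    return sum(ratios.values()) / len(ratios)


print(f"{ratio(193, 102):.2f}")  # Lena, JPEG-LS: 1.89
print(f"JPEG-LS {mean_ratio(JPEG_LS):.2f}, TITC-LS {mean_ratio(TITC_LS):.2f}")  # JPEG-LS 2.35, TITC-LS 2.43
```

A mean of ratios weights every image equally; dividing the total original size by the total reduced size would instead weight large images such as DogCats more heavily.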
1.2 TITC-LS DERIVATIVE COMPRESSION ALGORITHMS WITH FIXED COMPRESSION RATIO
Building on these efficient lossless compression principles, innovative algorithms for predicting the divider value and distributing the bit rate are applied so that the TITC fixed-ratio compression mechanism reaches high image quality. To implement the codec at a specific fixed bit rate, the TITC codec receives the input pixels and compresses them segment by segment, assigning a variable bit rate to each segment based on the prediction to optimize quality.
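Segment-by-segment bit rate distribution under a fixed total can be illustrated with a simple proportional allocator: a fixed block budget is split across segments in proportion to each segment's predicted coding cost, with a guaranteed minimum per segment, so the block total never exceeds the fixed budget. TITC's actual prediction and distribution rules are not disclosed; the proportional rule, the function name and the example costs below are assumptions.

```python
def distribute_bits(total_bits, predicted_costs, min_bits=0):
    """Split a fixed bit budget over segments in proportion to predicted cost.

    Costs are assumed to be non-negative integers. Integer arithmetic keeps
    the result exact: the allocations always sum to total_bits.
    """
    n = len(predicted_costs)
    if n == 0:
        raise ValueError("need at least one segment")
    spare = total_bits - min_bits * n
    if spare < 0:
        raise ValueError("budget too small for the per-segment minimum")
    costs = predicted_costs if sum(predicted_costs) > 0 else [1] * n
    weight = sum(costs)
    shares = [divmod(spare * c, weight) for c in costs]  # (floor share, remainder)
    alloc = [q for q, _ in shares]
    # Give the leftover bits to segments with the largest fractional remainders.
    leftover = spare - sum(alloc)
    order = sorted(range(n), key=lambda i: shares[i][1], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return [a + min_bits for a in alloc]


# Example: a 170-bit block budget over four segments of differing predicted cost.
print(distribute_bits(170, [10, 30, 20, 20], min_bits=8))  # [25, 60, 43, 42]
```

Busy segments receive more bits and smooth ones fewer, while the fixed total preserves the fixed compression ratio per block.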
2. IMAGE QUALITY COMPARISON
Fig. 4 shows the Motion Estimator within the video compression engine. It searches the predetermined search range buffer for the best-matched Macroblock (MB) and forwards it to calculate the difference, which is then compressed block by block through the DCT (Discrete Cosine Transform), quantization and VLC (Variable Length Coding). The pixels held temporarily in the search range buffer are fetched as compressed data from an external DRAM or an on-chip SRAM frame buffer (FB). To give each access to the external DRAM a fixed time, a fixed compression ratio for each "Block" of pixel data is preferred. Because a fixed compression ratio is required, lossless image quality cannot be guaranteed when compressing each Block. Since the reference frames serve as references for future frames until the next I-frame appears, compression errors can propagate and accumulate into all following frames. Therefore, high image quality in compressing the reference frame buffer is essential to avoid image artifacts.
Fig. 4
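Why a fixed ratio gives a fixed access time can be seen from the addressing: when every compressed Block occupies the same number of bytes, the start address of any Block is a simple multiply-add, so any Block can be fetched without parsing earlier ones. The sketch assumes an 8x8 luma Block, 8-bit samples, byte-granular storage and a row-major Block layout; these details and the function names are illustrative assumptions.

```python
def block_budget_bytes(n=8, bit_depth=8, ratio=3.0):
    """Bytes reserved per compressed n x n Block; rounding down guarantees >= ratio."""
    raw_bytes = n * n * bit_depth // 8
    return int(raw_bytes // ratio)


def block_address(base, bx, by, frame_width, n=8, bit_depth=8, ratio=3.0):
    """Start address of compressed Block (bx, by) in a row-major Block layout."""
    blocks_per_row = -(-frame_width // n)  # ceil division covers partial edge Blocks
    index = by * blocks_per_row + bx
    return base + index * block_budget_bytes(n, bit_depth, ratio)


print(block_budget_bytes())             # 21 bytes (168 bits) instead of 64
print(block_address(0, 10, 2, 1920))    # Block (10, 2) of a 1920-wide frame
```

Storing start addresses in a table, as the abstract describes, generalizes this direct computation to layouts where the per-Block size is not uniform.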
3. INNOVATIVE COMPRESSION ALGORITHMS RESULTING IN HIGH IMAGE QUALITY
Several compression algorithms within TITC reference memory buffer compression are key to this image quality, both visually and in PSNR:
- Accurate DPCM prediction, together with accurately predicting when to stop DPCM coding. The former makes shorter codes possible, while the latter caps the code length at the original code length; the latter gains about 2 dB.
- Golomb-Rice-like VLC coding: an accurate predictive divider results in a shorter Quotient code. This VLC coding gains 2-3 dB better quality than standard Golomb-Rice coding.
- Intelligent bit rate distribution: using accurate prediction based on line-to-line correlation, an intelligent means of assigning bit rate to compress each segment of pixels gives about a 3 dB increase in image quality.
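One simple way to cap the code length at the original length is an escape flag: if DPCM coding of a Block would be no shorter than sending the raw pixels, the raw pixels are sent instead. TITC's predictive stop decision is not described, so this sketch simply compares lengths after trial encoding; the fixed Rice parameter k=2 and the function names are assumptions.

```python
def zigzag(e):
    """Map signed residuals to non-negative integers."""
    return 2 * e if e >= 0 else -2 * e - 1


def rice_code(value, k):
    """Golomb-Rice code with divider 2**k."""
    q, r = value >> k, value & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def encode_block(pixels, k=2, bit_depth=8):
    """Flag bit, then DPCM/Rice codes or raw PCM, whichever is shorter.

    The flag caps the output at raw length + 1 bit, so a noisy Block can
    never expand beyond its original size.
    """
    first = format(pixels[0], f"0{bit_depth}b")  # first pixel sent as PCM
    codes = [rice_code(zigzag(p - prev), k) for prev, p in zip(pixels, pixels[1:])]
    dpcm = first + "".join(codes)
    raw = "".join(format(p, f"0{bit_depth}b") for p in pixels)
    return "0" + dpcm if len(dpcm) < len(raw) else "1" + raw


smooth = [100] * 64
noisy = [0, 255] * 32
print(len(encode_block(smooth)), len(encode_block(noisy)))  # 198 513
```

Without such a cap, a high-frequency Block could overrun its fixed budget and force heavy truncation, which is one source of the error propagation discussed in Section 2.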
Building on these efficient lossless compression principles, innovative algorithms for predicting the divider value and distributing the bit rate are applied so that the TITC fixed-ratio compression mechanism reaches high image quality. To implement the codec at a specific fixed bit rate, the TITC codec receives the input pixels and compresses them block by block, assigning a variable bit rate to each block based on the prediction to optimize the quality of a whole Macroblock of pixels.
4. SIMULATION RESULTS OF H.264/H.265 VIDEO SEQUENCES
In 2005 and 2007, two major TITC clients used their own in-house image compression IP to compress the same H.264 video sequence, the well-known "Foreman" test sequence, in which an I-frame is inserted after every 29 consecutive P-frames.
Fig. 5 Image quality simulation
The simulations showed sharp image quality degradation, with PSNR dropping 8-10 dB over every 30 accumulated frames, while the TITC IP applied to the same test bench dropped by less than 1.8 dB, as shown in Fig. 5. On the same older test sequences in H.265 video compression, after 7-8 years of continuous image quality improvement by the TITC team, the accumulated image quality drop has been reduced from 1.8 dB to 0.3 dB, making the H.265 image visually perfect.
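To put these PSNR figures in perspective: for 8-bit video, PSNR is 10*log10(255^2/MSE), so a drop of d dB means the mean squared error grew by a factor of 10^(d/10). A 2 dB drop is roughly 1.6x the error, while a 10 dB drop is 10x. A minimal sketch (function names are hypothetical):

```python
import math


def psnr(original, decoded, max_value=255):
    """PSNR in dB between two equally sized images given as lists of pixel rows."""
    if len(original) != len(decoded):
        raise ValueError("images differ in height")
    se, n = 0, 0
    for row_a, row_b in zip(original, decoded):
        if len(row_a) != len(row_b):
            raise ValueError("images differ in width")
        se += sum((a - b) ** 2 for a, b in zip(row_a, row_b))
        n += len(row_a)
    mse = se / n
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


def mse_growth(drop_db):
    """Factor by which MSE grows for a given PSNR drop in dB."""
    return 10 ** (drop_db / 10)


img = [[100] * 8 for _ in range(8)]
off_by_one = [[101] * 8 for _ in range(8)]
print(f"{psnr(img, off_by_one):.2f} dB")  # 48.13 dB
print(f"{mse_growth(2.0):.2f}x, {mse_growth(10.0):.0f}x")  # 1.58x, 10x
```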
Fig. 6 shows a leading DVD chip supplier's image quality test result for an H.264 Foreman video sequence. The red line, from TITC's 8x8 Block fixed 2.5x ratio IP, shows a minor drop of only 2 dB, while the green line shows a PSNR drop of about 10 dB. This leads us to believe that the 8x8 Block is an ideal shape for reference memory block compression in H.265 video compression.
Fig. 6
5. VLSI SPECIFICATIONS OF THE COMPRESSION CODEC
In VLSI implementation, to reach a throughput of 4 pixels per clock, i.e. a fixed time, the TITC 8x8 3.0x Block Reference Frame memory compression IP needs 50K gates of logic plus about 3K flip-flop bits. Multiple buffered IP instances, equivalent to 350K gates, reach 4x the throughput, so that compressing an 8x8 Block of pixels takes only 4 clocks, or 16 clocks for an MB. In the TSMC 65nm G process, this IP runs at 500 MHz. TITC has sold this Block compression IP since 2005 to Micronas (Germany) and Renesas (Japan), whose TV products entered mass production within 6-7 months. In 2013, Panasonic adopted TITC image compression IPs in its 4K TVs with superior image and video quality. Novatek and Orise (from 2012), Korea's Magnachips (2013) and Himax (2014) licensed TITC image compression IP for their display drivers for Full-HD smartphone panels. Multiple compression ratios are supported, and three control register bits select the target compression ratio to match the chosen memory density and bandwidth efficiency.
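The throughput figures follow from simple arithmetic: at 4 pixels per clock an 8x8 Block takes 16 clocks, and with 4x the throughput it takes 4 clocks, or 16 clocks for an MB. The sketch below reproduces these numbers; the function names are hypothetical, and a 16x16-pixel MB is assumed.

```python
def clocks_per_block(pixels, pixels_per_clock):
    """Clock cycles to process a Block at a given throughput (ceil division)."""
    return -(-pixels // pixels_per_clock)


def pixel_rate(pixels_per_clock, clock_hz):
    """Sustained pixels per second."""
    return pixels_per_clock * clock_hz


single = 4          # pixels per clock for one IP instance (from the text)
quad = 4 * single   # 4x throughput from multiple instances
print(clocks_per_block(8 * 8, single))    # 16 clocks per 8x8 Block
print(clocks_per_block(8 * 8, quad))      # 4 clocks per 8x8 Block
print(clocks_per_block(16 * 16, quad))    # 16 clocks per 16x16 MB
print(pixel_rate(quad, 500_000_000))      # 8000000000 pixels/s at 500 MHz
```

At 500 MHz, 16 pixels per clock gives 8 Gpixel/s, well above the roughly 0.5 G luma samples per second of 3840x2160 video at 60 fps.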
ACKNOWLEDGEMENT
The authors would like to express their appreciation to the R&D teams of TITC, Novatek, Orise, Himax and Magnachips, as well as Mr. Gilles Ries, a design manager at ST, for their excellent and hard work on system design and FPGA porting. Dr. Fritz Lebowsky has over the past years provided much expertise in image quality enhancement, which has been critical to the sharp image quality of this work.