Distributed Video Coding: Adaptive Video Splitter

Vijay Kumar Kodavalla (Wipro Technologies), Dr. P.G. Krishna Mohan (JNTU College of Engineering)

Abstract: Distributed Video Coding (DVC) is a next-generation video coding paradigm for emerging applications such as wireless video cameras, wireless low-power surveillance networks, disposable video cameras, medical applications, sensor networks, multi-view image acquisition, networked camcorders, etc., where low-complexity encoders are a must because memory, computational power and energy are scarce. One of the key elements in the encoder is the Adaptive Video Splitter (AVS), which adaptively splits the incoming video (with varying Group of Pictures (GOP) size) into Key frames and Wyner-Ziv (WZ) frames based on relative motion; these are encoded using the H.264 intra coder or the WZ coder respectively. A larger GOP size is preferred for lower data rates, but demands a frame buffer in the encoder. To keep encoder complexity low, the video splitter must not depend on complex motion estimation methods, but should instead use motion activity metrics based on histograms. The overall Rate-Distortion (RD) performance of DVC is better than or comparable with that of the H.264 intra coder for low- to high-motion video sequences [1], whereas for significantly higher motion activity video sequences the RD performance of DVC is lower than that of the H.264 intra coder [1]. In this paper, the design and implementation details of an adaptive video splitter (AVS), which can also improve RD performance for significantly higher motion video sequences, are presented. This paper is backed by the experience of developing the entire DVC codec C model, which is presented in the authors' other submitted paper [1].

1. Introduction

The block diagram of the DVC codec [1] we have implemented is shown in Figure 1, including the Adaptive Video Splitter. The AVS adaptively splits the incoming video into Key frames and Wyner-Ziv (WZ) frames based on motion; these are encoded using the H.264 intra coder or the WZ coder respectively. A GOP is a Group of Pictures, in which the first frame is a Key frame and the remaining frames are WZ frames. Although longer GOPs are preferred when there is more temporal redundancy (the amount of motion is low) and shorter GOPs when there is less temporal redundancy (the amount of motion is high), adaptive GOPs demand storing frames in a frame buffer at the encoder. A fixed GOP, in contrast, eliminates the need for a frame buffer at the encoder, which is advantageous for practical applications. The adaptive video splitter added to the encoder must not significantly increase its complexity, i.e. it must not perform any motion estimation or complex mode decision. In addition, we propose that the AVS also decide when and which frames are to be encoded using the H.264 intra coder only, bypassing the WZ path. This is necessary because the RD performance of DVC is lower than that of the H.264 intra coder for significantly high motion sequences. For such sequences, it is advantageous if the AVS detects the significantly high motion frames and bypasses the WZ path, so that DVC works as an H.264 intra coder for those frames, as described in the authors' other paper, Reference [2]. As opposed to the complex motion estimation process in conventional video encoders, the AVS uses a low-complexity technique to decide the GOP size by measuring the motion activity along the video sequence. This low-complexity AVS mechanism can be divided into three parts: motion activity metrics computation, GOP length decision and WZ path bypass decision, which are detailed in Section 2.
Section 2 presents the design details of the AVS. Section 3 presents the results of the implemented AVS, followed by conclusions in Section 4.

Figure 1. DVC Encoder and Decoder Architecture

2. AVS design details

2.1 Motion activity metrics

Simple yet powerful metrics [3][4] are used to evaluate the motion activity along the video sequence. Motion can typically be characterized as global motion or local motion. Examples of global motion are panning, zooming and scene changes, whereas an example of local motion is object motion against a static background. Characterizing global motion requires frame-level metrics, whereas characterizing local motion requires block-level metrics. The following four metrics are used, of which the first two characterize global motion and the last two characterize local motion:
2.1.1 Difference of Histograms (DH)

The distance measure between frames f_i and f_j is defined by the absolute sum of their histogram differences as follows [3]:

$DH(f_i, f_j) = \frac{1}{D_F} \sum_{k=1}^{L} \left| h_i(k) - h_j(k) \right|$

where L is the number of levels, i and j are the frame indexes, h_i and h_j are the histograms (luminance only) of frames f_i and f_j, and D_F is the frame size. The best performance is achieved for L = 32. The luminance histogram is a very efficient index of image content, so the difference of histograms between two frames is a good measure of the correlation of their content. The DH metric is insensitive to local motion activities (movement of an object against a fixed background) and sensitive to global motion activities such as panning, zooming and scene changes.

2.1.2 Histogram of Difference (HD)

The histogram of the difference between two images, i.e. the histogram of f_i - f_j, is denoted by h_{i-j}. The distance measure is defined by [3]:

$HD(f_i, f_j) = \frac{1}{D_F} \sum_{|k| > \alpha} h_{i-j}(k)$

where α is a threshold determining the closeness of the position to zero (consider L = 64, α = 32). The HD metric has different characteristics from DH: unlike DH, HD is much more sensitive to local motion.

2.1.3 Block Histogram Difference (BHD)

The block histogram difference is the same as DH, except that the histogram is taken at block level instead of frame level. In BHD, the histogram of each block is computed and the absolute differences of these block histograms between two frames are summed. The BHD is defined as [3]:

$BHD(f_i, f_j) = \frac{1}{D_F} \sum_{b=1}^{D_F/D_B} \sum_{k=1}^{L} \left| h_i(b,k) - h_j(b,k) \right|$

where D_B denotes the block size (consider D_B = 8x8, L (number of histogram levels) = 8, and D_F/D_B gives the total number of blocks in the current frame).

2.1.4 Block Variance Difference (BVD)

The BVD is the same as BHD, except that the variance of the block histogram is used instead of the histogram itself. The distance is defined by the sum of the absolute differences of the block histogram variances between two frames [3]:

$BVD(f_i, f_j) = \frac{D_B}{D_F} \sum_{b=1}^{D_F/D_B} \left| \sigma_i^2(b,k) - \sigma_j^2(b,k) \right|$

where σ²(b,k) is the variance of the block histogram having k levels (consider D_B = 8x8 and L = 8). Like BHD, BVD is made sensitive to local motion activities by computing the differences block by block.
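To make the frame-level metric definitions concrete, the following C sketch computes DH and HD for two grayscale (luminance) frames. It is an illustrative sketch, not the codec C model of [1]: the function names dh_metric and hd_metric, the representation of a frame as a flat array of 8-bit luminance samples, and the parameter defaults are assumptions based only on the definitions above.

/*
 * Illustrative sketch (not the codec C model of [1]) of the frame-level
 * motion activity metrics DH and HD defined above. Frames are assumed to be
 * flat arrays of 8-bit luminance samples, num_pixels = D_F.
 */
#include <stdlib.h>
#include <math.h>

/* Difference of Histograms (DH): build L-level luminance histograms of both
 * frames and return the sum of absolute bin differences, normalized by D_F. */
double dh_metric(const unsigned char *fi, const unsigned char *fj,
                 int num_pixels, int levels)          /* e.g. levels = 32 */
{
    int bin_width = 256 / levels;  /* map 8-bit samples onto 'levels' bins */
    double *hi = calloc(levels, sizeof(double));
    double *hj = calloc(levels, sizeof(double));
    double dh = 0.0;

    for (int p = 0; p < num_pixels; p++) {
        hi[fi[p] / bin_width] += 1.0;
        hj[fj[p] / bin_width] += 1.0;
    }
    for (int k = 0; k < levels; k++)
        dh += fabs(hi[k] - hj[k]);

    free(hi);
    free(hj);
    return dh / num_pixels;        /* normalization by frame size D_F */
}

/* Histogram of Difference (HD): the fraction of pixels whose difference
 * (fi - fj) lies farther than 'alpha' from zero; counting these pixels
 * directly is equivalent to summing the bins of h_{i-j} with |k| > alpha. */
double hd_metric(const unsigned char *fi, const unsigned char *fj,
                 int num_pixels, int alpha)           /* e.g. alpha = 32 */
{
    double count = 0.0;

    for (int p = 0; p < num_pixels; p++) {
        int diff = (int)fi[p] - (int)fj[p];           /* signed difference */
        if (abs(diff) > alpha)
            count += 1.0;
    }
    return count / num_pixels;     /* normalization by frame size D_F */
}

The block-level metrics BHD and BVD follow the same pattern, with the histogram (or its variance) computed per 8x8 block and the absolute differences accumulated over all D_F/D_B blocks.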
2.2 GOP length decision

The GOP length decision, for a maximum GOP size of M frames, is taken by the step-wise procedure of [3], driven by the motion activity metrics above. φ is a threshold value used in this procedure, and we propose to set it to four to six based on experimentation. The maximum GOP size M is constrained by the available memory and delay requirements, since M frames must be stored at the encoder to perform the GOP length decision.

2.3 Bypass WZ path

As explained, it is advantageous to bypass the WZ path and encode using the H.264 intra coder only for high motion activity frames in a sequence. This way the RD performance of DVC is not compromised by high motion frames, and it can be brought close to that of the H.264 intra coder. We propose to use the same motion activity metrics described in Section 2.1 for deciding whether to bypass the WZ path for high motion activity frames as well. The GOP length decision of Section 2.2 is extended as follows (a simple sketch of the resulting bypass check is given after this list):
1. At Step-3, additionally take the difference in the motion activity of each set of adjacent frames: $z_i = |x_i - x_{i-1}|$, where x_i denotes the motion activity computed between adjacent frames i and i-1.
2. After determining the GOP size in Step-7, perform an additional check on whether any of the z_i values calculated at Step-3 exceeds a pre-defined threshold, say φ1. If any z_i exceeds this threshold (which indicates significant motion activity), the frames in the calculated GOP shall be encoded using the H.264 intra coder only and the WZ coder shall be bypassed for those frames.
We propose to use φ1 (the threshold value) as 0.3 to 0.5 based on experimentation.
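As a concrete illustration of the bypass rule above, the following C sketch checks the differential motion activity of a determined GOP against φ1. It is a sketch under stated assumptions: how the four metrics of Section 2.1 are combined into a single per-frame activity x_i belongs to the GOP length procedure of [3] and is taken here as a given input array, and the function name and indexing convention are hypothetical, not from the codec C model of [1].

#include <math.h>
#include <stdbool.h>

/*
 * Illustrative sketch of the WZ-path bypass check of Section 2.3; names and
 * array layout are assumptions, not taken from the codec C model of [1].
 * 'activity' holds the motion activity x_i between frames i and i-1, derived
 * from the Section 2.1 metrics by the GOP length procedure of [3].
 * The GOP covering frames [gop_start, gop_start + gop_len) is flagged for
 * H.264 intra-only coding when any differential activity z_i inside it
 * exceeds phi1 (proposed range 0.3 to 0.5).
 */
bool bypass_wz_for_gop(const double *activity, int gop_start, int gop_len,
                       double phi1)                   /* e.g. phi1 = 0.4 */
{
    for (int i = gop_start + 1; i < gop_start + gop_len; i++) {
        double z_i = fabs(activity[i] - activity[i - 1]); /* differential motion activity */
        if (z_i > phi1)
            return true;   /* significant motion: bypass WZ, intra-code the whole GOP */
    }
    return false;          /* normal DVC coding: Key frame followed by WZ frames */
}

When this check returns true for a GOP determined at Step-7, all frames of that GOP are routed to the H.264 intra coder and the WZ coder is bypassed, as described above.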
3. AVS implementation results

To validate the implemented AVS, video sequences of various motion activity levels are considered: the QCIF Hall Monitor (low to medium motion), Coast Guard (medium to high motion), Foreman (high motion) and Soccer (significantly high motion) sequences at a 15 Hz frame rate. The GOP length calculated for the first 52 frames of each sequence is given in Table 1, where M is taken as four and the threshold φ chosen here is four.

Table 1. GOP sizes of various sequences

The motion activity between adjacent frames (x_i and x_{i-1}) is shown in Figures 2 to 5, and the differential motion activity (z_i) is shown in Figures 6 to 9.
From Figures 2 to 5, it can be seen that the temporal motion activity is lowest in the Hall Monitor sequence, moderate in Coast Guard, and high to significantly high in the Foreman and Soccer sequences. This can be judged from the magnitude of the fluctuations in the motion activity. Figures 6 to 9 show the differential motion activity between adjacent frames. If we take the threshold φ1 as, say, 0.4, no GOPs of the Hall Monitor sequence meet this threshold, and hence there are no WZ bypass scenarios for it. For the Coast Guard, Foreman and Soccer sequences, on the other hand, around 5 to 20 pairs of frames cross the threshold (for example, frame numbers 3, 4, 17, 49, 81, 97, 100, 113, 114 and 145 crossed the predetermined threshold). Hence the GOPs corresponding to those pairs of frames shall be H.264 intra-only coded, bypassing the WZ path. In this way we exploit the fact that the H.264 intra RD performance is better than that of DVC for significantly high motion frames: wherever there is significant motion in the frames of a sequence, we detect it and bypass the WZ path with H.264 intra-only coding, thereby increasing the overall RD performance.

4. Conclusion

The design and implementation details of the Adaptive Video Splitter (AVS) are presented. A new method to bypass the WZ path for significantly high motion frames in a video sequence is illustrated, which improves the overall RD performance.

5. References

[1] Vijay Kumar Kodavalla, "Distributed Video Coding: Codec Architecture and Implementation", SPPRA 2011, Innsbruck, Austria [submitted].
[2] Vijay Kumar Kodavalla, "Distributed Video Coding: Challenges in Implementation and Practical Usage", IP-SOC 2010, Grenoble, France.
[3] J. Ascenso, C. Brites and F. Pereira, "Content Adaptive Wyner-Ziv Video Coding Driven by Motion Activity", IEEE International Conference on Image Processing, Atlanta, USA, October 2006.
[4] Jungwoo Lee and Bradley W. Dickinson, "Temporally Adaptive Motion Interpolation Exploiting Temporal Masking in Visual Perception", IEEE Transactions on Image Processing, vol. 3, no. 5, September 1994.