Simple drift error-resilient H.264/AVC encoder for fast video transcoding using DCT coefficients

Abstract
This paper proposes an encoder that generates drift-error-resilient coded bitstreams. The bitstreams are generated using TotalCoeff, the total number of non-zero DCT coefficients, to estimate the drift error distortion. This estimated drift error distortion is then taken into account in the motion estimation process. Results show that the proposed method decreases drift error effectively compared to the original encoder and achieves PSNR comparable to that of a conventional drift-error-resilient encoder. The main advantage of the proposed method over a conventional drift-error-resilient encoder is that comparable PSNR performance, lower complexity, and lower memory usage can be achieved without implementing an open-loop transcoder in the encoder feedback loop.

Keywords
Drift error-resilient encoder; DCT coefficients; H.264/AVC
1. Introduction
Along with the rapid development of consumer electronics devices such as widescreen television sets, set-top boxes, and smartphones, demand for real-time transmission of interactive multimedia has increased. To offer highly efficient real-time transmission, research on network security, network coding, video coding, and related topics has been carried out [1], [2], [3]. In addition, various networks with different characteristics have been deployed. The characteristics of these networks, such as available bandwidth and allowable bitrate, change dynamically. To enable interconnectivity among these heterogeneous networks and client devices with different specifications, a variable bitrate streaming system is necessary. The key component of a system of this type is video transcoding [4]. Several earlier reports [5], [6] have provided overviews of commonly used transcoders.

Fig. 1 shows a streaming system that supports video clients in real time. Transcoding takes place at the gateway server or the streaming server. Communication between the gateway server and clients takes place in real time. Moreover, it is assumed that the transmission between video storage and the gateway server uses a high-speed channel and that transmission errors are negligible.

Fig. 1. Real-time variable bitrate streaming system.
According to network and client conditions such as the available bandwidth and allowable frame rate, the bitrate of the original video is reduced appropriately by video transcoding before the video is transmitted to the client.

The method is intended for real-time applications. Therefore, the computation time at the transcoder must be as short as possible. A simple requantization transcoder, called an open-loop transcoder, is useful because it has low complexity [4]. However, it cannot be applied as is to H.264/AVC because of the high occurrence of signal prediction errors [7]. These errors are known as drift error. Frequent drift error can markedly degrade the quality of the video sent to the client. For that reason, open-loop requantization transcoders are usually not recommended. To compensate for drift error, a feedback loop can be attached to the transcoder, but the computation time at the transcoder then becomes high, which makes it unsuitable for real-time applications.

A transcoder with a short computation time, such as the open-loop transcoder, could be used for real-time applications and still produce acceptable video quality if drift-error-resilient bitstreams were generated at the encoder side [8]. In this paper, an encoder that generates drift-error-resilient bitstreams is proposed. The bitstreams are generated using the total number of non-zero coefficients within a block, TotalCoeff, to estimate drift error distortion, which is the amount of drift error propagating to future frames.

The remainder of the paper is organized as follows. In Section 2, drift error propagation within inter-frames is explained briefly. In Section 3, related research on overcoming drift error is presented. In Section 4, an overview of the proposed method with a block diagram of the proposed encoder is given. Details of the method for generating drift-error-resilient bitstreams using TotalCoeff are explained in Section 5. The performance of the proposed and conventional encoders is compared in Section 6. Lastly, concluding remarks are given in Section 7.

2. Drift error propagation
Drift error can result from clipping, rounding operations, and requantization. Error caused by clipping and rounding is inconsequential compared to that caused by requantization. Therefore, only drift error caused by requantization is explained. In H.264/AVC, previously reconstructed frames are used as a reference to encode the current frame. At the transcoder side, requantization is applied to the encoded video to satisfy constraints imposed by heterogeneous networks and clients. If drift error compensation is not done, then the reference data of the encoded video will be corrupted. At the decoder side, these corrupted data will be used to reconstruct frames, thereby causing distortion in the reconstructed frames. When these distorted frames are used to reconstruct future frames, some of the distortion in the reference frames propagates to the future frames. This phenomenon is designated as drift error propagation. Fig. 2 portrays drift error propagation within a group of pictures (GOP) at the decoder side. Assuming that only the 1st inter-frame is requantized, some error caused by requantization will propagate to the 2nd inter-frame when a current macroblock (MB) in the 2nd frame uses MBs in the 1st frame as a reference for reconstruction. Because of this process, the 2nd frame is distorted. When the 3rd frame uses the distorted 2nd frame for reconstruction, some errors in the 2nd frame will propagate to the 3rd frame. These errors continue to propagate and accumulate throughout the inter-frames until an instantaneous decoding refresh (IDR) frame is inserted.
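The accumulation described above can be illustrated with a toy model. The sketch below is not an H.264/AVC codec: it assumes a one-dimensional predictive coder with a uniform quantizer, random frame content, and hypothetical step sizes q1 (encoder) and q2 (open-loop transcoder). It only shows how the mismatch between the encoder's reference and the decoder's reference grows from frame to frame when the first frame is left untouched and all later frames are requantized without compensation.

```python
import random


def simulate_drift(num_frames=12, num_pixels=2000, q1=4, q2=10, seed=0):
    """Return the mean absolute drift per frame for a toy predictive coder.

    q1 is the encoder step size, q2 the coarser transcoder step size
    (both hypothetical values chosen only for illustration).
    """
    rng = random.Random(seed)
    enc_ref = [0] * num_pixels  # encoder's reconstructed reference
    dec_ref = [0] * num_pixels  # decoder's reference after transcoding
    drift = []
    for t in range(num_frames):
        frame = [rng.randrange(256) for _ in range(num_pixels)]
        total = 0
        for p in range(num_pixels):
            # Encoder: quantize the prediction residual, closed loop.
            level = round((frame[p] - enc_ref[p]) / q1)
            enc_ref[p] += level * q1
            if t == 0:
                # First frame is not requantized, so no drift yet.
                dec_ref[p] = enc_ref[p]
            else:
                # Open-loop transcoder: requantize the level only.
                level2 = round(level * q1 / q2)
                # Decoder predicts from its own (drifting) reference.
                dec_ref[p] += level2 * q2
            total += abs(dec_ref[p] - enc_ref[p])
        drift.append(total / num_pixels)
    return drift


if __name__ == "__main__":
    for t, d in enumerate(simulate_drift()):
        print(f"frame {t:2d}: mean |drift| = {d:.2f}")
```

In this model the mismatch is zero at the first frame and tends to grow afterwards, because each requantized residual adds a new error on top of the error already present in the decoder's reference.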

Fig. 2. Drift error propagation within a GOP [8].
3. Related work
To compensate for drift error, various drift error compensation transcoders that attach a feedback loop have been proposed. Earlier studies [9], [10] have evaluated the performance of some existing drift error compensation transcoders in the spatial and transform domains. Notebaert et al. proposed two types of feedback-loop requantization transcoder in which compensation is done in either the transform or the pixel domain [7]. In an earlier report [10], Lefol et al. proposed a mixed requantization architecture (MRA) transcoder that combines the fast pixel-domain transcoder (FPDT) and cascaded pixel-domain transcoder (CPDT) architectures, where CPDT is used for intra-frames and FPDT for inter-frames. However, it is reported in [11] that MRA produced unreliable results attributable to the absence of spatial compensation for intra-coded MBs in P and B frames. Consequently, the authors of [11] proposed combinations of individual spatial and temporal compensation approaches and applied different techniques according to the MB type. Combinations of CPDT, spatial compensation, and temporal compensation architectures or open-loop are considered.

All of the described transcoders yield better PSNR than the open-loop transcoder. They also have lower processing times than the full decoder–reencoder transcoder. However, an earlier report [11] shows that, in terms of processing speed in frames per second (fps), the open-loop transcoder is the fastest among all the proposed and existing transcoders. Another report [4] describes that open-loop transcoders meet the requirement for real-time applications. The open-loop transcoder is at least 1.2 times faster than the other evaluated transcoders [11]. Because open-loop transcoders have a very simple architecture, they also have the advantages of lower power consumption, a smaller footprint on the chip area, and lower memory bandwidth.

Based on the advantages that open-loop can offer, it is a suitable transcoder for use with real-time applications. If drift-error-resilient bitstreams can be generated at the encoder side, then the open-loop transcoder can be implemented.

4. Low-complexity drift-error-resilient bitstreams using DCT coefficients
To generate drift-error-resilient bitstreams at the encoder, the amount of drift error propagating to future frames, that is, the distortion caused by drift error at the transcoder side, must be estimated.

4.1. Drift error resilient encoder using virtual open-loop transcoder [12]

Zhang et al. proposed a drift-error-resilient encoder that incorporates a virtual open-loop transcoder to estimate the distortion caused by drift error [12]. Fig. 3 shows the encoder architecture.

Fig. 3. Zhang’s drift-error-resilient encoder with a virtual transcoder [12].
To estimate the drift error distortion, a fixed requantization parameter, QPvt, is assumed at the virtual transcoder. To reduce drift error distortion at the decoder side, the encoder takes the estimated drift error distortion into account in the motion estimation (ME) process. The block or MB that introduces the least drift error distortion is selected as the reference for the current MB. Thus, in addition to the quantization distortion, which is the difference between the source and the encoded video, drift error distortion is examined in the ME process.

One disadvantage of this proposal is that, in practice, the requantization parameter that will be used in the transcoder is unknown at the encoder side. Therefore, this assumption might not be practical. In addition, because distortion attributable to drift error is estimated in the pixel domain, additional processes such as inverse quantization and inverse transform are necessary. These increase the computation time in the encoder. Finally, high memory usage is necessary to store the requantized frames.

4.2. Proposed encoder using DCT coefficients

In the proposed encoder, estimation is done by taking the total number of non-zero DCT coefficients, TotalCoeff, within a block after quantization in the encoder. If the TotalCoeff within a block is low, then the number of zeros within the block is high. In the open-loop transcoder, requantization is applied to the DCT coefficients. Therefore, if the value of a DCT coefficient at the encoder side is zero, then it remains zero after requantization and does not contribute any drift error distortion. Fig. 4 illustrates the proposed method: the dotted line in Frame 0 represents the search region, whereas A and B are two different blocks that are currently being considered as references. For example, if the TotalCoeff in A is lower than that in B, A will be selected as the reference for the current MB instead of B.
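The selection rule above can be sketched in a few lines. The helper names (qstep, requantize, total_coeff, pick_reference) and the two example blocks are hypothetical and are not taken from the paper or from x264. The step-size table holds the nominal H.264/AVC quantizer step sizes, which double for every increase of 6 in QP; a real transcoder would use integer scaling rather than the floating-point rounding assumed here.

```python
# Nominal H.264/AVC quantizer step sizes for QP 0..5.
QSTEP_BASE = [0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125]


def qstep(qp):
    """Nominal step size; doubles for every increase of 6 in QP."""
    return QSTEP_BASE[qp % 6] * (2 ** (qp // 6))


def requantize(levels, qp1, qp2):
    """Open-loop requantization of quantized levels from qp1 to qp2."""
    scale = qstep(qp1) / qstep(qp2)
    return [[int(round(v * scale)) for v in row] for row in levels]


def total_coeff(levels):
    """Number of non-zero quantized coefficients in a block."""
    return sum(1 for row in levels for v in row if v != 0)


def pick_reference(candidates):
    """Return the name of the candidate with the smallest TotalCoeff."""
    return min(candidates, key=lambda name: total_coeff(candidates[name]))


# Two illustrative 4x4 blocks of quantized levels (made-up values).
A = [[5, 0, 0, 0], [-2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
B = [[9, -4, 2, 0], [3, 1, 0, 0], [-1, 0, 1, 0], [0, 0, 0, 0]]

if __name__ == "__main__":
    print("TotalCoeff A:", total_coeff(A), "B:", total_coeff(B))
    print("selected reference:", pick_reference({"A": A, "B": B}))
    print("A requantized 24 -> 34:", requantize(A, 24, 34))
```

Every zero in A stays zero after requantization, so only the non-zero positions can introduce requantization error. That is the reason a block with a lower TotalCoeff is preferred as a reference.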

Fig. 4. Condition for selecting a reference block.
Fig. 5 presents the block diagram of the proposed encoder. Blocks drawn with bold lines in Fig. 5 represent the processes added for estimating drift error distortion. After the quantization process in the forward loop, the TotalCoeff of each block is determined and saved into a buffer so that drift error distortion can be estimated for future frames.

Fig. 5. Block diagram of proposed encoder.
At the drift error estimator, the drift error distortion propagating to the current MB is estimated using motion vector (MV) information and the TotalCoeff of the previously encoded frames. In addition to quantization distortion, the proposed method also considers drift error distortion. Consequently, in the ME process, the block that introduces the least combined drift error distortion and quantization distortion is chosen as the reference for the current MB.

Comparing the architecture of the conventional encoder in Fig. 3 with that of the proposed encoder in Fig. 5 shows that the proposed encoder has a simpler architecture. Additional processes such as inverse transform and inverse quantization, which are necessary to reconstruct the distorted frame in Zhang’s encoder, are eliminated. Consequently, the computation time of the proposed encoder is expected to be lower than that of Zhang’s encoder. The computation time evaluation is presented in Section 6.

In the next section, estimation of drift error distortion using DCT coefficients is explained.

5. Derivation of overlap region parameter, ω and estimation of drift error distortion
The amount of drift error propagating to the current MB depends on the overlap region, ω, of the current MB onto the reference blocks. Consequently, ω must be determined. In this paper, the mode of each MB is regarded as 16×16. An MB can be further divided into blocks. A block consists of 4 (pels)×4 (lines). Consequently, a 16×16 MB has 16 blocks.

Fig. 6 presents the concept of determining ω of a current MB onto a block in the reference frame. Bold lines in the tth frame represent the MB boundaries. The gray area is the overlap region.
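As an illustration of the block-based overlap idea, the sketch below computes, for a motion-displaced 16×16 MB, the fraction of each 4×4 reference block that it covers. This is a geometric reading of the overlap region and is not the paper's formula for ω. The function name overlap_weights, the integer-pel motion vector, and the absence of frame-boundary clipping are all assumptions made for the example.

```python
def overlap_weights(mb_x, mb_y, mv, mb_size=16, blk=4):
    """Map each overlapped reference block to its covered fraction.

    mb_x, mb_y: top-left pixel of the current MB.
    mv: (dx, dy) integer-pel motion vector (assumed, no sub-pel).
    Returns {(block_col, block_row): fraction in (0, 1]}.
    Frame-boundary clipping is ignored in this sketch.
    """
    x0, y0 = mb_x + mv[0], mb_y + mv[1]
    x1, y1 = x0 + mb_size, y0 + mb_size
    weights = {}
    for by in range(y0 // blk, (y1 - 1) // blk + 1):
        for bx in range(x0 // blk, (x1 - 1) // blk + 1):
            w = min(x1, (bx + 1) * blk) - max(x0, bx * blk)
            h = min(y1, (by + 1) * blk) - max(y0, by * blk)
            weights[(bx, by)] = (w * h) / (blk * blk)
    return weights


if __name__ == "__main__":
    aligned = overlap_weights(16, 16, (0, 0))
    shifted = overlap_weights(16, 16, (2, 1))
    print(len(aligned), "blocks overlapped when mv = (0, 0)")
    print(len(shifted), "blocks overlapped when mv = (2, 1)")
    print("corner block weight:", shifted[(4, 4)])
```

With a zero motion vector the MB covers exactly 16 blocks in full. With a displacement that is not a multiple of 4 it touches 25 blocks, most of them partially, while the weights still sum to 16 block areas.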

Fig. 6. Block-based overlap region.
Assume t=0 and that the kth MB in the (t+1)th frame is the current MB. Taking the block shaded with diagonal lines in Fig. 6 as an example, ω can be written as

[Eq. (1): expression not recoverable from the source]

where mv is the motion vector, i is the overlap block index, and (x, y) is the block coordinate. In this case, the block coordinate is (3, 2).
The values of ω for the blocks shaded with vertical lines are determined in the same way. The drift error distortion propagated to the kth MB in the (t+1)th frame because of overlap block i can be written as shown below.

[Eq. (2): expression not recoverable from the source]

Drift error accumulates from frame to frame. Therefore, in the next frame, the (t+2)th frame, the drift error distortion can be written as presented below.

[Eq. (3): expression not recoverable from the source]

If an IDR frame is inserted at time t=n, then the total drift error distortion accumulated from the (t+1)th frame until the IDR frame can be written as

[Eq. (4): expression not recoverable from the source]

Consequently, the total drift error of a frame consists not only of drift error from the immediately preceding frame but also of drift error from all earlier frames. This is designated as accumulated error.
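The notion of accumulated error can be illustrated with a minimal running sum. This sketch does not reproduce the paper's equations. It assumes that each inter-frame contributes a known amount of new drift (per_frame_error, made-up numbers), that the current MB fully overlaps its reference (ω=1), and that an IDR frame occurs every idr_period frames and inherits nothing.

```python
def accumulate_drift(per_frame_error, idr_period):
    """Accumulated drift at each frame, reset at every IDR frame.

    per_frame_error[t]: drift newly contributed at frame t
    (hypothetical values; full overlap is assumed).
    """
    accumulated = []
    running = 0.0
    for t, err in enumerate(per_frame_error):
        if t % idr_period == 0:
            # IDR frame: no inter prediction, nothing is inherited.
            running = 0.0
        else:
            # Inter-frame: inherits earlier drift and adds its own.
            running += err
        accumulated.append(running)
    return accumulated


if __name__ == "__main__":
    print(accumulate_drift([0, 3, 2, 4, 0, 1], idr_period=4))
```

The value at each inter-frame includes the contributions of all earlier inter-frames since the last IDR frame, which is the behavior the text calls accumulated error.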

Summing over all overlap blocks, the total drift error distortion from the previous (tth) frame to the kth MB in the immediately following (t+1)th frame can be written as presented below.

[Eq. (5): expression not recoverable from the source]

Summing over all MBs within a frame, the total drift error distortion from the previous (tth) frame to the immediately following (t+1)th frame is

[Eq. (6): expression not recoverable from the source]

The conventional distortion formula is defined as

\[ D(s, c(mv), t) = D_s(s, c(mv), t), \tag{7} \]

where D_s(s, c(mv), t) denotes the sum of absolute differences (SAD) between the original picture s and the coded picture c, and mv is the motion vector.
To reduce drift error propagation at the transcoder side, the estimated drift error distortion is considered at the encoder side by adding a drift distortion term to the conventional distortion formula. Consequently, the distortion term for a frame can be written as shown below.

\[ D(s, c(mv), t) = D_s(s, c(mv), t) + \eta \, D_d, \tag{8} \]

Therein, D_d represents the estimated drift error distortion (the symbol D_d is a stand-in, since the original notation is not recoverable from the source); η is the normalization parameter, which is defined as η>0. The normalization parameter is necessary because the TotalCoeff within a block is much smaller than the amount of real drift error within a block.
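A minimal sketch of how such a combined cost could drive the choice of a reference block is given below. It assumes that the drift term enters additively, scaled by η, and that a drift estimate is already available for each candidate. The function names, the 2×2 blocks, and the drift values are made up for illustration, and a real encoder's ME cost would normally also include a motion vector rate term that is omitted here.

```python
def sad(cur, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for ra, rb in zip(cur, ref) for a, b in zip(ra, rb))


def me_cost(cur, ref, drift_estimate, eta):
    """SAD plus the drift estimate scaled by eta (assumed additive)."""
    return sad(cur, ref) + eta * drift_estimate


def best_candidate(cur, candidates, eta):
    """candidates: list of (mv, ref_block, drift_estimate) tuples."""
    return min(candidates, key=lambda c: me_cost(cur, c[1], c[2], eta))[0]


# Illustrative data: A matches better, B carries less estimated drift.
CUR = [[10, 10], [10, 10]]
CANDIDATES = [
    ((1, 0), [[10, 10], [10, 12]], 9),  # SAD 2, drift estimate 9
    ((0, 1), [[10, 11], [11, 12]], 1),  # SAD 4, drift estimate 1
]

if __name__ == "__main__":
    print("eta = 0   ->", best_candidate(CUR, CANDIDATES, 0.0))
    print("eta = 0.5 ->", best_candidate(CUR, CANDIDATES, 0.5))
```

With η=0 the search behaves like the conventional encoder and picks the candidate with the lowest SAD. With η=0.5 the costs become 6.5 and 4.5, so the candidate with the smaller drift estimate is chosen even though its SAD is higher.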
6. Simulation results and evaluations
For performance evaluation, the proposed method is compared to Zhang’s encoder [12] and to the original x264 encoder, version 0.67.x. Both Zhang’s method and the proposed method are implemented on top of the same x264 version. For use as a transcoder, we modified a T264 encoder/decoder [13] by incorporating an open-loop transcoder architecture.

Throughout the simulation, three video sequences (Foreman, Flower, and Container) were used under the following conditions: 4:2:0 sampling at 352 (pel)×288 (line); a GOP of 35 frames in which all frames are P pictures except the 1st, which is an IDR frame; a quantization parameter, QP1, of 24; all MBs in P-frames forced to 16×16 mode; and the deblocking filter switched off. Because the drift error caused by the intra-frame is not our major interest, the I-frame is not requantized, whereas all P-frames are requantized. The weighting factor, ϕ, in Zhang’s method is set to 1. The η value in the proposed method is selected such that the encoded and transcoded file sizes are almost identical to those obtained with Zhang’s method.

6.1. PSNR performance with QPvt equal to QP2

In this experiment, the original video sequences are first encoded with the original, Zhang’s, and the proposed encoders. Then, the encoded (Enc) bitstreams are transcoded (Trans) using an open-loop requantization transcoder. The QP1 and QPvt values applied at the encoder and the QP2 values applied at the transcoder are shown in Table 1 for each encoder. The QPvt of the virtual transcoder in Zhang’s encoder is assumed to be the same as the QP2 at the transcoder. The original YUV video sequence is taken as the reference for the PSNR evaluation. Per-frame results for ΔQP (the difference between QP2 and QP1) of 4 are omitted because similar results were obtained.

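Table 1.
Parameters at the encoder and transcoder side.
| Encoder type | QP1 (encoder side) | QPvt (encoder side) | QP2 (transcoder side) |
|--------------|--------------------|---------------------|-----------------------|
| Original     | 24                 | None                | 28, 34, 40            |
| Zhang        | 24                 | 28, 34, 40          | 28, 34, 40            |
| Proposed     | 24                 | None                | 28, 34, 40            |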
Figs. 7 and 8 show the PSNR performance for a GOP with ΔQP of 10 and 16 for each video sequence. Fig. 9 shows the PSNR enhancement of Zhang’s and the proposed method over the original encoder. Only ΔQP equal to 10 is presented because similar results were obtained for the other ΔQP values. Using the PSNR of the video encoded by the original encoder and then transcoded as the reference, the enhancement here is the PSNR difference between the original encoder and the proposed or Zhang’s encoder. Table 2 presents the average PSNR performance for each ΔQP.

Fig. 7. PSNR performance with ΔQP=10.
Fig. 8. PSNR performance with ΔQP=16.
Fig. 9. PSNR enhancement with ΔQP=10.
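Table 2.
Average PSNR (dB).
| ΔQP | Video sequence | Original Enc | Original Trans | Zhang Enc | Zhang Trans | Proposed η | Proposed Enc | Proposed Trans |
|-----|----------------|--------------|----------------|-----------|-------------|------------|--------------|----------------|
| 4   | Foreman        | 39.13        | 31.39          | 39.09     | 31.75       | 8          | 39.07        | 31.77          |
| 4   | Flower         | 38.40        | 26.55          | 38.38     | 26.77       | 0.5        | 38.37        | 26.80          |
| 4   | Container      | 39.08        | 31.67          | 39.04     | 31.84       | 11         | 39.07        | 32.00          |
| 10  | Foreman        | 39.13        | 29.40          | 39.07     | 30.10       | 8          | 39.07        | 30.06          |
| 10  | Flower         | 38.40        | 24.33          | 38.35     | 24.58       | 6          | 38.37        | 24.54          |
| 10  | Container      | 39.08        | 31.13          | 39.04     | 31.25       | 11         | 39.07        | 31.41          |
| 16  | Foreman        | 39.13        | 26.83          | 39.05     | 28.67       | 11         | 39.05        | 28.35          |
| 16  | Flower         | 38.40        | 22.32          | 38.34     | 22.63       | 7          | 38.37        | 22.76          |
| 16  | Container      | 39.08        | 30.80          | 39.02     | 31.01       | 16         | 39.05        | 31.00          |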
From the PSNR performance in Figs. 7 and 8, it can be said that both the proposed method and Zhang’s method are effective in reducing drift error. The proposed method achieves results comparable to those of Zhang’s encoder. From Table 2, it can be observed that the average PSNR of both Zhang’s encoder and the proposed method does not differ much from that of the original encoder; the difference is mostly around 0.1–0.5 dB. Fig. 9, which shows the PSNR enhancement of the proposed and Zhang’s encoders over the original, indicates that at the beginning of the frame sequence the PSNR enhancement is low, or absent in the case of the Flower sequence for the proposed method. However, as the frame index increases, the PSNR enhancement increases. At the 34th frame, PSNR enhancements of 0.6 dB for Flower and 1.1 dB for Foreman are achieved. Although the PSNR enhancement in the first frames is low, around 0.1–0.3 dB, this is acceptable because the drift error in the first frames is low and the PSNR there usually remains high, around 35 dB and above, even when drift error occurs.

From Figs. 7(c) and 8(c) for the Container sequence, it can be observed that the PSNR performance of the three encoder types does not differ much. However, from the 11th frame onwards, there is a PSNR enhancement. At the 34th frame, a difference of 0.3 dB between Zhang’s and the original encoder and a difference of 0.55 dB between the proposed and the original encoder are obtained. The results obtained for Container differ from those for Flower and Foreman because of the characteristics of the video sequence itself. In Container, the motion is low. Consequently, the DCT values in most blocks are zero, and the amount of drift error is low as well.

6.2. PSNR performance with QPvt not equal to QP2

In the previous sub-section, the QP2 value in the transcoder was equal to the QPvt in Zhang’s encoder. However, in an actual implementation, the requantization parameter that will be used at the transcoder is unknown at the encoder side. In this simulation, QP1 remains 24 and QPvt in Zhang’s method is 28. At the transcoder side, the video encoded by Zhang’s encoder and by the proposed encoder is transcoded with QP2 values of 30, 34, and 38, which correspond to ΔQP of 6, 10, and 14. Table 3 presents the parameters assigned to the encoder and transcoder sides. Average encoded video PSNR values are not presented here because the encoded PSNR results are the same as in Table 2, presented in Section 6.1.

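Table 3.
Parameters at the encoder and transcoder side.
| Encoder type | QP1 (encoder side) | QPvt (encoder side) | QP2 (transcoder side) |
|--------------|--------------------|---------------------|-----------------------|
| Original     | 24                 | None                | 30, 34, 38            |
| Zhang        | 24                 | 28                  | 30, 34, 38            |
| Proposed     | 24                 | None                | 30, 34, 38            |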
Fig. 10 presents the PSNR performance for the three sequences with different ΔQP values. Fig. 11 shows the PSNR improvement of the proposed encoder over Zhang’s encoder. For Foreman and Flower, the PSNR of the two methods does not differ much initially. However, as the frame index increases, the PSNR difference between the two methods increases. In Foreman, from frame index 14 onwards, the PSNR difference between Zhang’s and the proposed method increases gradually. At the 34th frame, a difference of around 1 dB is visible. Consequently, it can be said that the PSNR performance of the proposed method is better than that of Zhang’s method. In Zhang’s method, the value of QP2 at the transcoder is assumed to be the same as QPvt in the virtual transcoder of the encoder. If the QP2 applied at the transcoder differs from the QPvt at the encoder side, then the drift error distortion estimated in the encoder is inaccurate. Consequently, the PSNR performance degrades. In contrast, the proposed method is not influenced by the requantization parameter used at the transcoder side because TotalCoeff is used to find the block that generates or propagates the least error to future frames.

Fig. 10. PSNR performance with different ΔQP.
Fig. 11. PSNR improvement of proposed encoder against conventional encoder for each ΔQP.
It can be seen that even with different ΔQP, the PSNR does not degrade much. For Container, the difference of PSNR across the tested ΔQP values is only around 2 dB. In Foreman and Flower, the difference is around 4 dB. Container has lower PSNR degradation because it is a low-motion video sequence. Consequently, its DCT values are mostly zeros and it has a lower occurrence of drift error. For this reason, the proposed method was unable to achieve a PSNR improvement over Zhang’s method for Container: the motion vectors selected by the proposed method and by Zhang’s method are similar.

6.3. Computation time performance

Here, the performance of the three encoders in terms of computation time is evaluated by comparing the total time needed to encode 250 frames. The GOP for each sequence remains at 35 frames. Table 4 shows the computation time performance for the three sequences. Only results for ΔQP of 10 are shown here because similar results were obtained for the other ΔQP values.

Table 4.
Computation time.
| Video sequence | Parameter               | Original | Zhang  | Proposed |
|----------------|-------------------------|----------|--------|----------|
| Foreman        | Frames per second (fps) | 5.49     | 1.82   | 2.97     |
| Foreman        | Encoded file size (kb)  | 1318     | 1430   | 1438     |
| Foreman        | Total time (s)          | 45.55    | 137.12 | 84.30    |
| Flower         | Frames per second (fps) | 4.00     | 1.57   | 2.35     |
| Flower         | Encoded file size (kb)  | 3410     | 3549   | 3699     |
| Flower         | Total time (s)          | 62.50    | 158.92 | 106.38   |
| Container      | Frames per second (fps) | 7.59     | 3.12   | 4.28     |
| Container      | Encoded file size (kb)  | 768      | 803    | 805      |
| Container      | Total time (s)          | 32.98    | 80.08  | 58.49    |
The results show that the computation time of the proposed encoder is lower than that of Zhang’s encoder. Compared to Zhang’s encoder, the proposed encoder is faster by around 0.8–1.2 fps. Even though the encoded file size of the proposed method is slightly larger than that of Zhang’s method, the number of frames processed per second is still higher. Compared to the original encoder, the additional time needed for the newly added process in the proposed encoder ranges from about 26 s (Container) to about 44 s (Flower) for 250 frames. The extra time arises because the drift distortion is considered in the ME process: the proposed encoder needs more time to search for the best reference block so that the drift propagation will be low.
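The figures quoted above can be cross-checked from the total encoding times in Table 4. The short script below recomputes the frame rate as 250 frames divided by the total time and derives the fps gain over Zhang’s encoder and the extra time over the original encoder; only the times are taken from the table, and the function and variable names are illustrative.

```python
FRAMES = 250  # number of frames encoded in Section 6.3

# Total encoding time in seconds, copied from Table 4.
TOTAL_TIME_S = {
    "Foreman": {"Original": 45.55, "Zhang": 137.12, "Proposed": 84.30},
    "Flower": {"Original": 62.50, "Zhang": 158.92, "Proposed": 106.38},
    "Container": {"Original": 32.98, "Zhang": 80.08, "Proposed": 58.49},
}


def fps(seconds, frames=FRAMES):
    """Frames per second for a given total encoding time."""
    return frames / seconds


def summarize(times):
    """Per-sequence fps gain over Zhang and extra time over original."""
    rows = {}
    for seq, t in times.items():
        rows[seq] = {
            "fps_gain_vs_zhang": fps(t["Proposed"]) - fps(t["Zhang"]),
            "extra_s_vs_original": t["Proposed"] - t["Original"],
        }
    return rows


if __name__ == "__main__":
    for seq, row in summarize(TOTAL_TIME_S).items():
        print(f"{seq:10s} gain {row['fps_gain_vs_zhang']:.2f} fps, "
              f"extra {row['extra_s_vs_original']:.2f} s")
```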

Regarding memory usage, the proposed encoder needs only 1/16 of the memory needed by Zhang’s encoder because estimation in the proposed method uses DCT information, whereas Zhang’s encoder uses pixel information. In Zhang’s method, the number of values stored per frame equals the frame resolution, 352 (pel)×288 (line), which is 101,376. A frame contains 396 MBs, and each MB contains 16 blocks, so a frame contains 6336 blocks. Therefore, 6336 TotalCoeff values must be stored per frame. Consequently, the memory needed for the proposed method is only 1/16 (6336/101,376) of that necessary for Zhang’s method.
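The arithmetic behind the 1/16 figure can be verified directly. The sketch below counts stored entries per frame, as the text does, and assumes that one pixel value and one TotalCoeff value occupy the same amount of storage; the function name is illustrative.

```python
from fractions import Fraction


def memory_entries(width=352, height=288, mb_size=16, blk=4):
    """Entries stored per frame by each method (counted as in the text)."""
    pixels = width * height                         # Zhang: one per pixel
    mbs = (width // mb_size) * (height // mb_size)  # MBs per frame
    blocks = mbs * (mb_size // blk) ** 2            # 4x4 blocks per frame
    return pixels, mbs, blocks, Fraction(blocks, pixels)


if __name__ == "__main__":
    pixels, mbs, blocks, ratio = memory_entries()
    print(pixels, "pixel entries,", mbs, "MBs,", blocks, "TotalCoeff entries")
    print("ratio:", ratio)
```

The ratio equals 1/16 for any resolution that is a multiple of the MB size, because each 4×4 block replaces 16 pixel entries with a single TotalCoeff entry.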

7. Conclusion
As described in this paper, a drift-error-resilient encoder is proposed using TotalCoeff within blocks to estimate the amount of drift error propagation. This estimated drift error distortion is considered in the motion estimation process to select an MB that propagates the least drift error.

The simulation results show that the proposed drift-error-resilient encoder improves the performance of the open-loop transcoder. Furthermore, PSNR performance comparable to that of Zhang’s drift-error-resilient encoder is achieved with lower computation time. In addition, the memory usage of the proposed method is 1/16 of that of the conventional method.

The main advantage of the proposed method over Zhang’s is that comparable results and lower memory usage can be achieved without implementing a virtual transcoder in the feedback loop of the encoder. The impractical need to fix a requantization parameter at the encoder side is eliminated as well.
