Detailed explanation and implementation of MPEG-4 encoding principle of PC

The contradiction between increasing multimedia traffic and limited communication bandwidth has become increasingly prominent. In order to reduce the amount of data transmitted, the International Telecommunication Union and the MPEG standards organizations have developed their own video compression standards. Among them, the latest MPEG-4 standard, with its high compression ratio, supports low bit rate transmission and object-based coding, and is widely used in fax, video on demand, video conferencing, medical image transmission and other fields.

MPEG-4 inherits the MPEG-2 mid-level and level concept. There are four types of video grades, audio grades, graphics grades and scene description grades. The grade is the encoding tool used for a particular application. It is a subset of the toolsets provided by MPEG-4, with different grades being different subsets. Each grade is divided into one or more grades, which define the specifications of the bitstream parameters and actually limit the performance of decoding a coded sequence.

MPEG-4 is unique in that it supports content-based codec, and introduces the concept of AVO (Audio/Video Object). AVO can be a violin or a piano in an image. Each AVO can be independently encoded, but there is a spatio-temporal relationship between them. Therefore, when encoding, the composition information structure "scene description" of the encoding object must be transmitted. To represent the spatio-temporal relationship between the AVOs in the scene. The content of the image and sound is edited and operated according to this "scene description" during decoding. The audiovisual object can also be a rectangular frame, making MPEG-4 compatible with the original MPEG standard. The video object VO in an audiovisual object is usually described by three types of information, namely motion, shape and texture information.

1 coding principle

The MPEG-4 encoder is mainly composed of three parts: shape coding, texture coding and motion coding. The frame format is divided into I-VoP, P-VOP and B-VOP. Only I-VOP and P-VOP are discussed here. I-VOP uses texture coding to eliminate spatial redundancy between image data of one frame; P-VOP refers to the image of the previous frame and is encoded with two parameters. One is the difference between the image to be encoded and the reference image; the other is the motion vector. The specific structural module of the encoder is shown in Figure 1.

1.1 Shape coding

Shape coding is mainly used to record shape information of extracting VOPs from image sequences, and the information is divided into binary shape information and grayscale shape information. The binary shape information indicates the shape of the VOP with two values â€‹â€‹of 0 and 1, and the gray shape information indicates different transparency of the VOP region with 0 to 255. When encoding, when the extracted VOP has a non-rectangular shape, it needs to be extended by its boundary so that its rectangular boundary is a multiple of 16, while ensuring the expanded area is minimum, and then shape encoding; when the extracted VOP is a rectangle When the length (width and width of the rectangle are multiples of 16), the shape code will be masked.

1.2 texture coding

Texture coding mainly encodes image pixels in I-VOP or difference pixels in P-VOP, including DCT, quantization, DC and AC prediction, entropy coding, etc., to maximize the removal of pixels between the current VOPs. Spatial redundancy.

The first frame of the video is encoded in the I-VOP format, and the rest of the frames are I-VOP, or the P-VOP format is constrained by two factors. The user will set the format of the current frame according to IPPPIPPPI; secondly, the current frame has been artificially set to P-VOP, and the value of mad_P is calculated by motion estimation. If mad_P satisfies one of the following two conditions, the current frame The P-VOP encoding format is adopted, otherwise the I-VOP encoding format is adopted.

Condition 1: mad P<50/3;

Condition 2: mad P < 50, and IntraMBRatio < 0.4. The IntraMBRatio is the proportion of the macroblocks in the current frame using the MBM_INTRA prediction mode to the total macroblock.

The DCT and quantization modules in texture coding are relatively simple, skipped here, focusing on the remaining texture coding modules.

1.2.1 DC and AC prediction

After 8Ã—8 blocks undergo DCT and quantization, the coefficient arrangement exhibits the following law, that is, the non-zero coefficients are concentrated in the upper left corner, and most of the 0 coefficients are concentrated on the right and downward positions of the DC coefficients. In particular, the DC coefficient, the first row and the first column of AC coefficients are non-zero and large. If they can be replaced with smaller values, the number of encoded bitstreams is reduced, thus generating DC and AC predictions.

In MPEG-4, a macroblock is usually divided into six 8x8 blocks for DC and AC prediction.

First, DC prediction is performed on 8Ã—8 blocks. As shown in FIG. 2, X represents the current 8Ã—8 block; A, B, and C represent adjacent 8Ã—8 blocks of X, and their positions are respectively located at the left, upper left, and upper sides of X. DC prediction for X is to predict the DC coefficient value of X by using the DC coefficient value of the adjacent block. The key is to select which DC coefficient of the adjacent block.

This article refers to the address: http://

The program uses the following strategy to select neighboring blocks. The DC coefficient values â€‹â€‹of the definition blocks A, B, C, and X are DC_A, DC_B, DC_C, and DC_X, respectively.

If the difference between DC_A and DC_B is smaller than the difference between DC_B and DC_C, then DC_A and DC_B are numerically close, that is, the value in the vertical direction is closer than in the horizontal direction, so DC_C is used to predict DC_X; otherwise, in the horizontal direction. The values â€‹â€‹are relatively close, that is, DC_A is used to predict DC_X.

The DC coefficient of the current block and the DC coefficient of the neighboring block used for prediction are subjected to a specific processing, and the difference is stored in the DC position of the current block, and the prediction direction of the DC coefficient is recorded.

The AC prediction is mainly for the first row or the first column AC coefficient of the 8x8 block, and the prediction direction depends on the prediction direction of the current block DC coefficient. As shown in FIG. 2, if the previous DC prediction is a horizontal prediction, the first column AC coefficient of the current block X is predicted by the first column AC coefficient of the A block, and the first 7 AC coefficients of the first column of X are respectively taken as absolute values. The post phase is added to the variable S1 (the initial value of S1 is 0). The first column AC coefficient of the current block is made to be different from the first column AC coefficient of the neighboring block A used for prediction, and the 7 difference values â€‹â€‹are stored in the position of the first column AC coefficient of the current block, and 7 differences are simultaneously The values â€‹â€‹are each taken to the variable S2 (the initial value of S2 is 0). If the previous DC prediction is a vertical prediction, only the first row AC coefficient prediction of the current block X is performed, and the prediction step is the same as the prediction of the first column AC coefficient.

Sometimes the AC prediction will produce a large prediction error, and it does not achieve the purpose of saving the bit stream. Therefore, it is necessary to judge the validity of the AC prediction. In the AC prediction of a single 8Ã—8 small block, the sum of the absolute values â€‹â€‹of the AC coefficients of the first row or the first column of the small block is recorded by S1, and the first row or the first column is predicted by S2. The absolute sum of the differences. The difference between S1 and S2 of each small block is added in units of six 8Ã—8 small blocks of one macro block to obtain a value S. If S is non-zero, then this macroblock performs AC prediction with its flag ACpred_flag set to 1, otherwise the macroblock does not perform AC prediction and ACpred_flag is set to zero.

1.2.2 zigzag scanning

After the DC and AC prediction, the zigzag scan of the coefficients of the 8Ã—8 block has three scanning modes: Zigzag, Zigzag_v (alternating vertical scanning) and Zigzag_h (alternating horizontal scanning). Which scanning method is used is determined by three factors, namely, intra or inter prediction, the value of the AC prediction flag ACpred_flag, and the prediction direction of the DC coefficient.

For intra prediction macroblocks, if the AC prediction flag ACpredflag is 0, then 6 8Ã—8 blocks in this macroblock are scanned using Zigzag; if the AC prediction flag is 1, then 6 8Ã— in this macroblock The 8 blocks will determine the scan direction of the AC coefficients based on the respective DC prediction directions. If the DC prediction is horizontal prediction, then this 8x8 block scans the coefficients using the Zigzag_v scan mode, otherwise the Zigzag_h scan mode is used.

For inter-predicted macroblocks, each 8Ã—8 block uniformly uses Zigzag scanning mode to scan coefficients.

After the 8Ã—8 coefficient matrix is â€‹â€‹scanned by zigzag, most of the non-zero coefficients are concentrated in the front part of a one-dimensional array, and most of the zero coefficients are concentrated behind this one-dimensional array. According to this feature, run-length coding is generated.

1.2.3 Run-length coding and entropy coding

The so-called run-length encoding is to specifically process the AC coefficients of the 8Ã—8 coefficient matrix, making it a smaller number of three-dimensional vectors (Last, Run, Level). Where Level represents the size of the non-zero coefficient. Run represents the number of consecutive 0s in front of Level. Last represents the termination flag: when the value is 0, it means that there is a coefficient that is not 0 after the Level; when the value is 1, it means that the coefficient is the number that is not 0 at the end; the remaining coefficients are all 0. The run length code generates a three-dimensional vector, compresses the amount of data, and then finds the codeword in the corresponding Huffman code table according to different combinations of Last, Run and Level, and generates a code stream.

1.3 Motion coding

The motion coding performs motion estimation and compensation on the current P-VOP and the reference VOP, reduces the temporal correlation between frames, and implements compression.

Motion estimation is usually performed using block matching. The block matching method is to find the minimum matching block of the Sum of Absolute Difference (SAD) from a certain area of â€‹â€‹the reference frame for a certain size of the image frame in the current frame, and use the matching block to predict the current Piece. The absolute error of an image block and the sum of the absolute values â€‹â€‹of the pixel differences in the two image blocks of the same size. The SAD ₁₆ _{Ã— 16} function implements the absolute error sum between the current macroblock and the reference macroblock; the SAD ₈ _{Ã— 8} function implements the absolute error sum between the current 8Ã—8 block and the reference 8Ã—8 block.

After the block matching criterion is determined, the search for the optimal matching point is performed. The MPEG-4 verification model finally uses the Diamond Search (DS). The diamond search method is a fast search method that uses the shape and size of the search template to have an important influence on the speed and accuracy of the motion estimation algorithm. Two search templates with different shapes and sizes were chosen: one is the Large Diamond Search Pattern (LDSP), which has 9 candidate positions: (0,0), (0,2), (1 , 1), (2, 0), (1, -1), (0, -2), (-1, -1), (-2, 0) and (-1, 1). The specific template is shown in Figure 3. One is the Small Diamond Search Pattern (SDSP), which contains five candidate positions: (0,0), (0,1), (1,0), (0,-1) and (- 1,0). The specific template is shown in Figure 4.

The diamond search process is as follows: the origin of the upper left corner of the current macroblock of the current frame is the origin (0, 0) of the large template, and within the search range of the reference frame, respectively (0, 0), (0, 2), (1,1), (2,0), (1,-1), (0,-2), (-1,-1), (-2,0) and (-1,1) The pixel is used as the starting point of the upper left corner of the macroblock, and the macroblock and the macroblock of the current frame are subjected to SAD16Ã—16 operation, and the starting point of the upper left corner with the smallest SAD16Ã—16 value is selected as the temporary best matching point, which is the current macroblock. The displacement between the starting points in the upper left corner is the motion vector. Determine if this motion vector is suitable for a particular rule, and if not, perform a new round of large diamond template search until a suitable motion vector is found. Then use the small diamond search template as the center point (0,0) and the small diamond search template (0,1), (1,0), (0,-1) and (-1, 0) Accurate search of four reference points, the minimum point of the searched SAD16Ã—16 value is the final best matching point, and the displacement between the best matching point and the starting point of the upper left corner of the current macroblock is the final motion vector.

The above is a full-pixel search based on macroblocks, and it is also possible to select whether to perform a full pixel search based on 8Ã—8 blocks. The best matching point based on the macroblock is found using the obtained macroblock-based motion vector. This matching point is centered (0,0), and other matching points are (-1,-1), (0,-1), (1,-1), (-1,0), (1,0), (-1,1), (0,1), (1,1), with the 9 matching points as the upper left corner starting point of the 8Ã—8 pixel block, respectively, the 8Ã—8 block and the current frame 8Ã—8 The block performs the operation of absolute difference sum, selects the starting point of the upper left corner with the smallest SAD ₈ _{Ã— 8} value as the best matching point to find the best motion vector, and obtains the motion vector of each 8Ã—8 luminance block in the macroblock and SAD ₈ _Ã—8 value. The sum of 4 SAD ₈ _{Ã— 8} is compared with SAD ₁₆ _{Ã— 16} , and the smaller value is processed according to a certain rule, thereby judging that the prediction mode of the current macroblock is intra or inter prediction.

If inter prediction is employed, a half-pixel search is further employed. First, the bilinear difference is made to the entire reference frame, then the area of â€‹â€‹the reference frame is changed to 4 times, and then a more accurate motion vector is searched for within a specific search range, and finally, according to the luminance information and chrominance information of the reference image. And motion vector to do motion compensation.

If intra prediction is used, no motion estimation is performed on the current macroblock, and the corresponding motion compensated reference macroblock value is zero.

Finally, the motion compensated reference image is subtracted from the current frame to obtain the difference, and the difference data is texture-coded; the motion vector of each macroblock is predicted and the difference value is obtained, and the difference value is bit-stream converted and output.

2 Encoder implementation and testing

According to Figure 1 and combined with the implementation principle of each module of MPEG-4 encoder, the main function of the encoder is written and debugged. It is preliminarily determined that the main function of the encoder should include the following three parts:

The initialization part is due to the RGB to YUV image format conversion function in the reference code (not described in this article), the corresponding space for storing BMP and YUV images should be opened; the space should be opened to store the compressed file generated by the encoder; the encoder must be set. The encoding parameters.

The encoding processing part is realized by cyclically encoding each frame image. After reading one frame of image, it is judged whether I frame or P frame is used, and then the VOP header information is output, the current VOP is encoded, and the bit stream information is output to the buffer. Finally, the fwrite function is used to form a disk file.

The memory space opened in the first two phases of releasing resources must be released, and the encoding process of the entire video ends.

Combined with the reference encoding main function, debug and run the encoder, generate a divx file, and debug the decoder accordingly. The quantization parameter QP, frame rate, output code rate and I frame interval parameters of the encoder are changed one by one to test their influence on the coding effect, such as the quantization parameter QP, the frame rate and the I frame interval parameter are proportional to the compression ratio, and the output code The rate is inversely proportional to the compression ratio. After comprehensive testing, under the premise of ensuring the quality of the decoded image, when the QP is 8, the frame rate is 30 f/s, the output code rate is 400 000 b/s, and the I frame interval is 3, the compression ratio is 58.8. Achieve the best compression. Of course, if the visual effect after decompression is not high, the compression factor can continue to increase.

3 Conclusion

The MPEG-4 is part of the MPEG-4 standard, and its codecs have been supported by many vendors. At present, H.264 as MPEG-4 Part10 has also been launched and developed. Compared with MPEG-4 Part2, H.264 can reduce the code rate by about 50% under the same quality, indicating that MPEG-4 has been developing. With the practical development of content-based coding technology, MPEG-4 will have a wider application prospect.

The small burner is best suited for light and fast hiking. All-in-one stoves have better resistance to wind and more fuel efficiency and are therefore more suitable for use in strong winds, but they can also affect the diversity of food cooking and have a greater weight. The fuel tank and the burner are separate, so the stability and versatility of the better, but their size and weight will affect their performance in some occasions. Firewood stoves, alcohol stoves and other stoves, although very popular with those who are extremely lightweight enthusiasts welcome, but it is at the expense of cooking at the expense of the price.

Camping Stove

Camping Stove,Camping Gas Stove,Mini Stove,Wood Stove

Ningbo APG Machine(appliance)Co.,Ltd , http://www.apgelectrical.com