Friday, December 31, 2010

DM6467 / DM6467T H.264 Codec FAQ

http://processors.wiki.ti.com/index.php/DaVinciHD_Codecs_FAQ

  • What are Blocking and Non-Blocking cycles?

In DM6467, the video codecs are partitioned between the DSP and the HDVICP. The process() API is called on the DSP, but after some initializations (loading the HDVICP, header parsing, etc.) the codec task pends on a semaphore (SEM_pend) and yields the DSP back to the application. The HDVICP then works with the DSP in interrupt mode. The total cycles required to complete an encode/decode are the Blocking Cycles; the cycles for which the DSP is actually used for the codec tasks are the Non-Blocking Cycles. After the last Macro-Block in the picture is processed by the HDVICP, the codec ISR issues a SEM_post, and the codec task wakes up to do the end-of-frame processing.
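
The sequence can be pictured with a small DSP/BIOS-style sketch. This is only an illustration of the pend/post pattern described above; the task, semaphore, and helper names are hypothetical, and in the real codecs all of this is hidden inside the process() implementation.

    #include <std.h>
    #include <sys.h>
    #include <sem.h>    /* DSP/BIOS semaphore module */

    extern SEM_Handle frameDoneSem;                 /* hypothetical: posted by the codec ISR    */
    extern Void loadHdvicpAndParseHeaders(Void);    /* hypothetical frame-level initialization  */
    extern Void kickOffHdvicp(Void);                /* hypothetical: HDVICP starts MB processing */
    extern Int  endOfFrameProcessing(Void);         /* hypothetical end-of-frame work           */

    /* Hypothetical shape of the codec's process() running on the DSP. */
    Int codecProcess(Void)
    {
        loadHdvicpAndParseHeaders();   /* DSP work: counts as non-blocking cycles */
        kickOffHdvicp();

        /* The codec task now pends and the DSP is free for other tasks.
           Everything until the SEM_post counts toward the blocking cycles,
           but only the actual DSP work counts as non-blocking cycles.     */
        SEM_pend(frameDoneSem, SYS_FOREVER);

        return endOfFrameProcessing();
    }

    /* Hypothetical codec ISR, triggered by the HDVICP after the last macro-block. */
    Void hdvicpIsr(Void)
    {
        SEM_post(frameDoneSem);        /* wakes the codec task */
    }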

  • DM6467 has 2 Hdvicps. Can both the Hdvicps execute in parallel?

Yes, both the Hdvicps can execute simultaneously. The two Hdvicps are independent of each other, though they share common resources (the DSP and the EDMA).

  • What is the difference between the 2 Hdvicps?

Hdvicp0 has the capability to do both Encode and Decode. Hdvicp1 can only do Decode, because it does not have the IPE (Intra Prediction Estimation) and ME (Motion Estimation) engines that are required for encoding.

  • Can the Hdvicps access the DDR?

No, the Hdvicps cannot access the DDR directly. They can trigger EDMA transfers to fetch/write data from/to the DDR.

  • Can the ARM968 inside the Hdvicp write into EDMA PaRAM?

Yes, the ARM968 can program, trigger, and wait on EDMA channels. The ARM968 is also a master on the System Config Bus, which is used to write into the EDMA PaRAM and registers.

  • Can the DSP access the Hdvicp registers/memory?

Yes, the DSP can access the Hdvicp IP buffers and registers.

  • What is the ratio of frequencies at which the DSP and Hdvicp are clocked?

The frequency ratio between the DSP and the Hdvicp is 2:1. So if the DSP is clocked at 594 MHz, the Hdvicp is clocked at 297 MHz.

  • In the LPF engine of the Hdvicp, is it possible to program the filter coefficients?

No, it is not possible to program the filter coefficients in the LPF. The LPF supports filtering as per the standards (H264/VC-1).

  • In the MC engine of the Hdvicp, is it possible to program the weights of the filter for interpolation?

No, it is not possible to program the filter weights. Only the standard-specific interpolation is implemented.

  • Can the Hdvicps do encode/decode for 422 chroma format?

No, the Hdvicps do not support 422 chroma format.

  • How do I do the 422 --> 420 chroma conversion (for the encoder) and the 420 --> 422 chroma conversion (for display) on DM6467?

The VDCE engine on the DM6467 can be used for the chroma conversions. Alternatively, you could also use the DSP to do the chroma conversion.
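
If the conversion is done on the DSP, it is essentially a vertical decimation of the chroma samples. The sketch below is only an illustration, assuming a 4:2:2 semi-planar input (a full-height interleaved CbCr plane) being reduced to the 4:2:0 semi-planar chroma plane the codecs expect, by averaging each pair of chroma rows; real applications would typically use the VDCE or an optimized DSP kernel instead.

    #include <stdint.h>

    /* Illustrative 4:2:2 -> 4:2:0 chroma down-conversion (semi-planar CbCr).
       The luma plane is untouched; each pair of chroma rows is averaged into one. */
    void chroma422to420(const uint8_t *cbcr422, uint8_t *cbcr420,
                        int width, int height, int stride)
    {
        int row, col;

        for (row = 0; row < height / 2; row++) {
            const uint8_t *top = cbcr422 + (2 * row) * stride;
            const uint8_t *bot = top + stride;
            uint8_t       *out = cbcr420 + row * stride;

            /* The interleaved CbCr plane is 'width' bytes wide (Cb,Cr,Cb,Cr,...). */
            for (col = 0; col < width; col++) {
                out[col] = (uint8_t)((top[col] + bot[col] + 1) >> 1);
            }
        }
    }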

  • The VDCE can also do the edge padding required for supporting UMV in encoder/decoder. Do you use this in TI codecs?

No, we do not use the VDCE to do the edge padding in TI codecs; we use the EDMA and DSP for this. The reason is that if we used the VDCE, this operation would be sequential with the actual encoding/decoding of the frame (the reference obtained by padding is required to encode/decode the immediately following frame). Since it would not run in parallel with the DSP or Hdvicp, using the VDCE gives us no advantage. We also wanted the VDCE to remain free for any scaling/chroma conversions in the application.

  • Can the VDCE do up-scaling of YUV?

No, the VDCE can be used only for down-scaling the YUV. It does not support up-scaling.

  • Is there HW support for RGB --> YUV conversion?

No, the HW does not support RGB to YUV conversion.

  • What is the YUV format that the DM6467 codecs use?

The DM6467 codecs use the semi-planar format: the Luma component occupies one plane, and the other plane contains interleaved CbCr samples. This is the format that the Hdvicp uses internally for processing.
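
For illustration, the layout of a 4:2:0 semi-planar buffer (one luma plane followed by one half-height interleaved CbCr plane) can be described as below; the structure and function names are ours, not part of the codec interface.

    #include <stdint.h>
    #include <stddef.h>

    /* Illustrative 4:2:0 semi-planar frame layout. */
    typedef struct {
        uint8_t *y;      /* width x height luma samples                   */
        uint8_t *cbcr;   /* width x (height/2) bytes of interleaved Cb,Cr */
        size_t   size;   /* total bytes = width * height * 3 / 2          */
    } SemiPlanarFrame;

    static SemiPlanarFrame layoutFrame(uint8_t *base, int width, int height)
    {
        SemiPlanarFrame f;
        f.y    = base;
        f.cbcr = base + (size_t)width * height;    /* chroma plane follows luma */
        f.size = (size_t)width * height * 3 / 2;
        return f;
    }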

  • What is the suggested work-around for DSP MDMA/SDMA deadlock issue w.r.t. the codecs?

The work-around to avoid the deadlock is: the same master must not write to both L2 and DDR, and the same master must not write to both L2 and the Hdvicp buffers. So allocate one TC for all writes to L2 in the codecs, and do no writes to DDR or the Hdvicp buffers on this TC. In TI codecs we use TC0 for all writes to L2; the remaining 3 TCs do not write to L2.
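
A minimal sketch of the rule, assuming a hypothetical helper that requests an EDMA channel on a given event queue (on EDMA3, event queue n is serviced by TCn); the channel names and the helper are illustrative only, not the codec's actual resource allocation code.

    /* Hypothetical channel IDs and request helper; the real codecs use their own
       resource tables and driver API. Event queue n is assumed to feed TCn. */
    #define CH_REF_FETCH_TO_L2    8   /* writes into L2 (DDR -> L2)       */
    #define CH_MBINFO_TO_L2       9   /* writes into L2 (Hdvicp -> L2)    */
    #define CH_RECON_TO_DDR      10   /* writes into DDR (Hdvicp -> DDR)  */
    #define CH_COEFF_TO_HDVICP   11   /* writes into Hdvicp buffers       */

    extern int edmaRequestChannelOnQueue(int channel, int eventQueue);  /* hypothetical */

    void allocateCodecDmaChannels(void)
    {
        /* Rule: one TC (here TC0) carries every transfer that writes to L2,
           and that TC never writes to DDR or to the Hdvicp buffers.       */
        edmaRequestChannelOnQueue(CH_REF_FETCH_TO_L2, 0);  /* queue 0 -> TC0         */
        edmaRequestChannelOnQueue(CH_MBINFO_TO_L2,    0);  /* queue 0 -> TC0         */
        edmaRequestChannelOnQueue(CH_RECON_TO_DDR,    1);  /* TC1..TC3: no L2 writes */
        edmaRequestChannelOnQueue(CH_COEFF_TO_HDVICP, 2);
    }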

  • The DSP and Hdvicp cycles are within the budget to meet real-time constraints. But still the Blocking cycles are more than the real time constraint. What could be the reason for this?

For the codec to run in real time, 3 threads need to be within the budgeted cycles. First, the DSP cycles and Hdvicp cycles should fit within the budget; note that the DSP and Hdvicp should be running in parallel to utilize the capabilities of the HW effectively. The EDMA is the third thread that runs in parallel with the DSP and the Hdvicp. There are 4 TCs in the DM6467 EDMA. Try to distribute the transfers on the TCs such that the load is balanced between them. Usually one TC ends up much more loaded than the others, and this particular TC can become the bottleneck.

  • I have distributed the load on the EDMA TCs, but still I find that the DMA cycles have not reduced. Why?

Note that there are 4 ports on each Hdvicp for the transfer of data in/out of the Hdvicp buffers: UMAP1, the R/W port, the R port, and the W port. Check whether the transfers are also distributed between these ports. It could be that all your transfers go through one particular port, making that port the bottleneck.

  • Ok, I understand that there are multiple ports on the Hdvicp. How do I choose through which port the data gets transferred?

You choose the port by choosing the corresponding addresses for the source/destination: the same physical Hdvicp buffer has a separate address map for each port, so selecting the right source and destination addresses selects the port. Please refer to the DM6467 Address Map spreadsheet for the exact addresses.
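
As a sketch of the idea: the same physical Hdvicp buffer is visible at a different base address through each port, so the source/destination address of a transfer selects the port. The alias constants below are placeholders to be filled in from the address map spreadsheet, and the copy helper is hypothetical.

    #include <stdint.h>

    /* Placeholder aliases of one and the same physical Hdvicp buffer as seen
       through two different ports; take the real values from the address map. */
    #define HDVICP0_BUF_VIA_RW_PORT   ((uint32_t)0x0 /* fill in from address map */)
    #define HDVICP0_BUF_VIA_W_PORT    ((uint32_t)0x0 /* fill in from address map */)

    extern void edmaCopy(uint32_t src, uint32_t dst, uint32_t bytes);  /* hypothetical */

    void writeDataThroughWPort(uint32_t ddrSrc, uint32_t bytes)
    {
        /* Using the W-port alias as the destination routes this transfer through
           the W port instead of a possibly busier port, helping balance the load. */
        edmaCopy(ddrSrc, HDVICP0_BUF_VIA_W_PORT, bytes);
    }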

  • In the DM6467 codecs, who boots up and starts the Hdvicp ARM 968?

The DSP loads the ARM968 code into the ARM ITCM and starts off the Hdvicp. This is done as a part of the frame level initializations of the process call.

  • What is the performance of each of the codecs on the DM6467?

Please request the datasheets of the individual codecs for accurate performance details. Note that the performance depends on a number of factors (features that are turned on/off, cache sizes, type of content); the datasheet provides these details.

  • If I use a higher MHz part, does the performance of the codecs scale linearly?

The performance of the codecs depends on the DSP/Hdvicp frequency as well as the DDR2 speed. Note that since the DDR is not being scaled up linearly for the various DM6467 parts, the performance will not scale up linearly.

  • What profiles are supported? What is the maximum resolution supported? What is the maximum level supported?

The DM6467 H264 Decoder supports all 3 profiles -- BP/MP/HP. Note that it does not support ASO/FMO in the Baseline Profile. The maximum resolution supported is 1920 x 1088 (1080HD), and the maximum level supported is 4.0.

  • Does the Decoder support CABAC /B-frames/Interlaced (Field picture and MBAFF) decoding?

Yes, the decoder supports all of the above tools, which are part of the MP/HP profiles.

  • Does the decoder support 422 Chroma format?

No, the decoder only decodes 420 encoded streams. The HDVICP does not have support for 422 format.

  • Can the decoder execute on any of the 2 Hdvicps?

Yes, the decoder can run on either of the two Hdvicps; both Hdvicps support decoding.

  • What is the XDM version used by the TI H264 Decoder?

The decoder uses the XDM 1.0 (ividdec2.h) interface.
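
One common way to reach the ividdec2.h interface is through Codec Engine's VIDDEC2 wrapper. The sketch below shows a rough per-frame decode call; the codec alias "h264dec", the buffer setup, the omission of most create-time parameters and of error handling, and the inline create/delete (normally done once per instance, not per frame) are assumptions for illustration, not the shipped application code.

    #include <xdc/std.h>
    #include <string.h>
    #include <ti/sdo/ce/Engine.h>
    #include <ti/sdo/ce/video2/viddec2.h>

    /* Rough XDM 1.0 (IVIDDEC2) decode sketch: one process call per complete picture. */
    Int decodeOneFrame(Engine_Handle ce, XDM1_BufDesc *inBufs, XDM_BufDesc *outBufs)
    {
        VIDDEC2_Params  params;
        VIDDEC2_InArgs  inArgs;
        VIDDEC2_OutArgs outArgs;
        VIDDEC2_Handle  dec;
        Int             status;

        memset(&params, 0, sizeof(params));
        params.size      = sizeof(params);
        params.maxWidth  = 1920;            /* up to 1080HD                          */
        params.maxHeight = 1088;            /* other create-time params left default */

        dec = VIDDEC2_create(ce, "h264dec", &params);  /* "h264dec" is a placeholder alias */
        if (dec == NULL) {
            return -1;
        }

        inArgs.size     = sizeof(inArgs);
        inArgs.inputID  = 1;
        inArgs.numBytes = inBufs->descs[0].bufSize;    /* one complete access unit */
        outArgs.size    = sizeof(outArgs);

        status = VIDDEC2_process(dec, inBufs, outBufs, &inArgs, &outArgs);

        VIDDEC2_delete(dec);
        return status;
    }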

  • What are the resolutions supported by the Decoder?

The decoder supports resolutions from 64x64 up to 1920x1088. Note that the width and height need to be multiples of 16.

  • Can the TI decoder decode streams with multiple slices? Does the Hdvicp support multiple slices?

Yes, the decoder can decode streams with multiple slices. We have already tested the decoder with a number of streams with multiple slices (JVT Conformance streams, Professional test suites available in the market, and our internally generated streams). Support of multiple slices is not dependent on the Hdvicp; this is totally a SW feature.

  • Can the TI decoder be used for multi-channel applications?

Yes, the TI decoder can be used for multi-channel cases. There is nothing in the decoder that restricts this. The application can create multiple instances of the codec without requiring any change in the codec library.

  • Does the decoder expect an entire frame of compressed data as input to the process call or can I input individual slices to the decoder?

The decoder expects an entire frame/picture before it starts decoding. We do not currently support slice-level APIs; all APIs are at the frame/picture level.

  • Does the TI decoder support Error Resiliency/Concealment?

Yes, the decoder supports Error Resilience and Concealment. The Error Resiliency feature is very robust; we have tested the decoder with ~9000 error streams. For concealment, if the current picture is in error we copy the pixels from the previously decoded picture.

  • Does the decoder need an IDR to start decoding or it can start decoding from any random point?

The decoder does not need an IDR to start decoding the sequence. It can start decoding from any frame, assuming a value of 128 for the missing reference pixels. Of course, there will be error propagation because of this. The decoder will then re-sync once an IDR or a picture with a Recovery Point SEI is received.

  • Suppose a frame to be decoded has some slices missing in the frame. Will the decoder decode the remaining slices correctly, or will it conceal the entire frame?

The decoder will decode the remaining error-free slices correctly. If a slice is in error, the decoder will latch on to the next slice in the frame. It will apply concealment only for the missing slices, not for the entire frame.

  • What are the SEI messages supported by the decoder?

The SEI messages supported by the decoder are: Buffering period SEI, Recovery point SEI, Picture timing SEI, Pan-scan rectangle SEI, and User data registered by ITU-T Recommendation T.35 SEI. All the SEI messages are passed back to the application or used by the decoder, in decode order.

  • Does the decoder need the encoded bitstream for a complete picture to start decoding or can the application choose to input individual slices to the decoder?

The decoder needs the input buffer to contain at least one complete picture before decode_process() is called. The codec SW does not support slice-level process calls. Note that this is how the codec SW is architected; the Hdvicp itself has no limitation that would prevent slice-level APIs.

  • Can the decoder be asked to skip decoding of non-reference pictures?

Yes, the decoder has the flexibility to skip the decoding of non-reference pictures. This has to be communicated to the decoder via the inArgs, using the skipNonRefPictures field. The supported values are: 0 (do not skip), 1 (skip a non-reference frame or field), 2 (skip a non-reference top field), and 3 (skip a non-reference bottom field). When the decoder is called with this field set to 1/2/3, it does not decode the picture if it is a non-reference picture, but it still returns the number of bytes consumed to skip the current picture. This is helpful when the application needs to move the input pointer to the next NAL Unit.
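
A minimal sketch of the usage, assuming a hypothetical extended inArgs structure; the real structure name and layout come from the codec's interface header, and only the skipNonRefPictures values listed above are taken from the codec documentation.

    #include <ti/xdais/dm/ividdec2.h>

    /* Hypothetical extended inArgs wrapping the standard XDM 1.0 inArgs. */
    typedef struct {
        IVIDDEC2_InArgs base;                /* standard ividdec2.h inArgs          */
        XDAS_Int32      skipNonRefPictures;  /* 0: decode all, 1: skip non-ref
                                                frame/field, 2: skip non-ref top
                                                field, 3: skip non-ref bottom field */
    } H264DecInArgsSketch;

    void requestSkipOfNonRefPictures(H264DecInArgsSketch *inArgs)
    {
        inArgs->base.size          = sizeof(*inArgs);
        inArgs->skipNonRefPictures = 1;   /* skip any non-reference frame or field */
        /* After the process call, outArgs->bytesConsumed still reports the bytes
           consumed for the skipped picture, so the application can advance its
           input pointer to the next NAL unit.                                   */
    }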

  • How do you do the padding for the picture edges to support UMV?

We use the DSP and EDMA to do the padding. We pad only the left and right edges at the end of the frame. For the top and bottom edges, padding is done on the fly when the padded data is required for interpolation by the Motion Compensation HW.
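
For illustration, left/right edge padding by pixel replication on the reconstructed luma plane might look like the sketch below; the pad width and the assumption that the buffer reserves a margin of PAD bytes on both sides of every row are ours.

    #include <stdint.h>
    #include <string.h>

    #define PAD 32   /* assumed pad width in pixels */

    /* Illustrative left/right edge padding by replicating the first/last pixel
       of every row. The buffer must reserve PAD bytes on each side of a row. */
    void padLeftRightEdges(uint8_t *luma, int width, int height, int stride)
    {
        int row;

        for (row = 0; row < height; row++) {
            uint8_t *line = luma + row * stride;         /* first real pixel of the row */
            memset(line - PAD,   line[0],         PAD);  /* replicate left edge         */
            memset(line + width, line[width - 1], PAD);  /* replicate right edge        */
        }
    }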

  • What are the maximum bit-rates that can be supported by the decoder?

For CABAC-encoded streams the H264 decoder will be real-time (on the 594 MHz part) for bit-rates up to 14 Mbps. For CAVLC-encoded streams the decoder will be real-time for bit-rates as allowed by the standard for Level 4.0 (the H264 standard allows a maximum bit-rate of 24 Mbps for Level 4.0 streams).

  • What is the maximum resolution supported by the TI encoder?

Currently, TI has 2 separate encoders on DM6467. The 720p encoder can support resolutions from QCIF up to 720p; it can also support 1920 x 544 as a special case. The 1080p encoder can support resolutions from 640 x 448 up to 1920 x 1088. The restriction on the 1080p30 encoder is that the width and height must each be a multiple of 32. Also, if the input resolution is 1920x1080, the last 8 lines of the frame need to be padded and provided by the application to the encoder.
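
The 1080p30 encoder's input constraints can be restated as a small validation helper; this merely re-expresses the rules above and is not codec code.

    #include <stdbool.h>

    /* Restates the 1080p30 encoder input constraints described above. */
    static bool is1080pEncodeResolutionValid(int width, int height)
    {
        if (width  < 640  || height < 448)               return false;  /* minimum 640 x 448   */
        if (width  > 1920 || height > 1088)              return false;  /* maximum 1920 x 1088 */
        if ((width % 32) != 0 || (height % 32) != 0)     return false;  /* multiples of 32     */
        return true;
    }

    /* For 1920x1080 content the application pads the last 8 lines itself and
       presents the frame to the encoder as 1920x1088.                        */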

  • What is the difference between the 2 encoders?

The basic difference between the 2 encoders is that they use different Motion Estimation algorithms. Because of this, the performance and quality are different for the 2 encoders. There is also a difference in the resources used by the encoders: the 720p encoder uses 21 EDMA channels and 32 KB of L2 as SRAM, while the 1080p30 encoder uses 49 EDMA channels and 64 KB of L2 as SRAM.

  • What is the XDM version used by the TI H264 Encoder?

The encoder uses the XDM 1.0 (ividenc1.h) interface.

  • Can the encoder run on any of the 2 Hdvicps?

No, only Hdvicp0 has the support to do encoding. Hdvicp1 does not have the Motion Estimation and Intra Prediction Estimation engines, hence it cannot support encoding.

  • Does the TI encoder support multiple slices?

Yes, the TI encoder supports multiple slices. In the 720p encoder, a slice can be a multiple of rows; the application can choose the number of rows that will be encoded in a slice. The 720p encoder currently does not support H.241 (slices based on the number of encoded bytes per slice). The 1080p encoder does support the H.241 feature: the user can input the maximum number of bytes in a slice, and the encoder will encode the slices such that the number of bytes per slice does not exceed the specified value. The 1080p encoder also supports slices based on the number of rows.

  • What features does the encoder support? Does the encoder support B-frames?

The TI encoders are Baseline Profile encoders with support for some MP/HP tools (CABAC, 8x8 IPE modes, and the 8x8 transform). We do not support B-frames in the encoders. Note that the Hdvicp has the support to encode B-frames; the codec SW simply does not support this today.

  • What are the Motion Estimation Algorithms being supported by the TI encoder?

For the 720p encoder, we support 3 types of Motion Estimation. ME Type 0 (or Original ME) is recommended for resolutions greater than or equal to D1. ME Type 1 (Low Power ME) is a modification of the Original ME to reduce the DDR bandwidth; like the Original ME, the LPME is recommended for resolutions of D1 and above. Both the LPME and the Original ME give 1 MV/MB. For resolutions below D1, it is recommended to use ME Type 2 (Hybrid ME); this algorithm gives up to 4 MV/MB. The encoder offers the application the flexibility to select any of the 3 ME schemes during codec instance creation. For the 1080p encoder, we currently support the Decimated ME scheme (a coarse search followed by a refinement search). This ME scheme is different from the ME schemes in the 720p encoder. Decimated ME is suitable for high-resolution and high-motion sequences, and it is recommended for 1080p resolution. Note that Decimated ME generates 1 MV/MB. There is a plan to add the LPME scheme to the 1080p encoder, after which the application will be able to select either of the 2 ME schemes during codec instance creation.

  • Can TI encoder be used for multi-channel scenarios?

Yes, both the TI encoders can be used for multi-channel use-cases. The application can create multiple instances of the encoder without any change required in the codec library. The application can also run multiple instances of the TI encoder/decoder.

  • Does the ME HW support SATD?

No, the ME HW on Hdvicp supports search based on SAD; we do not have support for SATD on HW.

  • What are the neighboring pixels that are given to the IPE module for Estimation of Intra Prediction Modes?

The top neighboring pixels are the unfiltered reconstructed pixels. Due to the pipeline constraints on the DM6467, the left pixels cannot be the reconstructed pixels. Hence we use the original pixels for the left neighbors.

  • Can I modify the interpolation coefficients in ME to support my own interpolation filter?

No, the ME HW does not allow you to modify the interpolation coefficients.

  • TI encoder generates SPS/PPS headers only at the beginning of the sequence. Is there a way to insert the SPS/PPS headers in the middle of the sequence?

Yes, you could insert SPS/PPS as required by your application. In the dynamicParams, set the generateHeader field as 1 and call the control API with XDM_SETPARAMS command. Now the next process call will generate the SPS/PPS. Note that this process call will not encode a picture; it will only generate the headers. Once the headers are generated, set generateHeader field as 0, and again call the control API with XDM_SETPARAMS. Now the process calls will actually encode the frames. This sequence can be repeated whenever the application requires the encoder to generate the SPS/PPS headers.
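
With Codec Engine's VIDENC1 wrapper this sequence might look like the sketch below; the handle, dynamicParams, status structure, and buffer descriptors are assumed to have been set up elsewhere, and error handling is omitted.

    #include <xdc/std.h>
    #include <ti/sdo/ce/video1/videnc1.h>
    #include <ti/xdais/dm/xdm.h>

    /* Sketch: emit SPS/PPS in mid-sequence, then resume normal encoding. */
    Int emitHeadersThenResume(VIDENC1_Handle enc,
                              VIDENC1_DynamicParams *dyn, VIDENC1_Status *status,
                              IVIDEO1_BufDescIn *inBufs, XDM_BufDesc *outBufs,
                              VIDENC1_InArgs *inArgs, VIDENC1_OutArgs *outArgs)
    {
        /* 1. The next process call should generate only the headers. */
        dyn->generateHeader = 1;
        VIDENC1_control(enc, XDM_SETPARAMS, dyn, status);
        VIDENC1_process(enc, inBufs, outBufs, inArgs, outArgs);         /* SPS/PPS only    */

        /* 2. Switch back so the following process calls encode pictures again. */
        dyn->generateHeader = 0;
        VIDENC1_control(enc, XDM_SETPARAMS, dyn, status);
        return VIDENC1_process(enc, inBufs, outBufs, inArgs, outArgs);  /* encodes a frame */
    }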

  • I want to generate an IDR frame in the middle of the sequence. What do I do?

In the dynamicParams, set the forceFrame field as IVIDEO_IDR_FRAME (defined as 3 in ivideo.h). Call the control API with XDM_SETPARAMS command. Now the next process call will generate the IDR frame. Once you get the IDR frame set the forceFrame field as -1, and call the control API. Now the next process call will generate the P frame. This sequence can be repeated whenever the application requires the encoder to generate IDR frames.
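
Forcing an IDR follows the same control/process pattern as the header insertion above; a minimal sketch under the same assumptions:

    #include <xdc/std.h>
    #include <ti/sdo/ce/video1/videnc1.h>
    #include <ti/xdais/dm/xdm.h>

    /* Sketch: request an IDR for the next process call, then go back to normal. */
    void requestIdrOnNextFrame(VIDENC1_Handle enc,
                               VIDENC1_DynamicParams *dyn, VIDENC1_Status *status)
    {
        dyn->forceFrame = 3;   /* IVIDEO_IDR_FRAME, defined as 3 in ivideo.h */
        VIDENC1_control(enc, XDM_SETPARAMS, dyn, status);
        /* ...the next VIDENC1_process() call now produces an IDR frame... */
    }

    void resumeNormalEncoding(VIDENC1_Handle enc,
                              VIDENC1_DynamicParams *dyn, VIDENC1_Status *status)
    {
        dyn->forceFrame = -1;  /* no forced picture type */
        VIDENC1_control(enc, XDM_SETPARAMS, dyn, status);
    }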

  • Can the encoder do 422 to 420 color conversion?

No, the H264 encoders (both 720p30 and 1080p30) cannot do 422 to 420 chroma conversion. The application can use the VDCE to do this.

  • Can the encoder support PicAFF encoding?

No, the encoder does not support PicAFF or interlaced encoding. Support for interlaced coding in the 1080i/p encoder was planned for Oct. 2009.

  • Can the 1080p encoder do 720p@60fps?

Yes, the 1080p30 encoder can encode 720p@60fps. However, for 720p@60 fps the quality of the 720p encoder will be better than that of the 1080p30 encoder.

  • Can Hdvicp support de-blocking filtering operations across slice boundaries?

Yes, the Hdvicp can support de-blocking across slices. The encoder provides control whether to enable or disable this feature at the time of codec instance creation.

  • Is encode_process() a frame-level API or a slice-level API? Do I need to input an entire frame of YUV to encode_process(), or can the encoder take a part of the frame and encode a slice for each process call?

The encoder expects a complete frame as input to the encode_process call. Each encode_process call generates compressed stream for an entire frame. Presently, we do not support slice level APIs.
