Sunday, January 23, 2011

AWB/LEAP for FPGA Design

AWB is a tool for facilitating the modular, plug-n-play style construction of software and hybrid software/hardware projects. It was originally written to create software performance models within the Asim performance modeling infrastructure. AWB's origin as part of Asim is the reason for the many references to Asim in the documentation and in the tools themselves. AWB has, however, been extended into a more general tool for constructing a variety of projects where selecting alternative implementations of modules in a plug-n-play manner is advantageous. Specifically, AWB now supports hybrid hardware/software designs. The first of these was the HAsim modeling infrastructure, but a variety of other more general designs currently use AWB.

Asim was developed by Compaq researchers in late 1998 to allow model writers to faithfully represent the detailed timing of a set of issues identified during two standards efforts: the IEEE Std. 1061-1998 for a Software Quality Metrics Methodology and the American National Standard Recommended Practice for Software Reliability (ANSI/AIAA R-013-1992). The second approach ties these knowledge requirements to phases in the software development life cycle. Together, these approaches define a body of knowledge that shows software engineers why and when to measure quality. For details, see

http://www-inst.eecs.berkeley.edu/~cs266/sp10/readings/emer02.pdf

LEAP is a Virtual Platform that provides a consistent set of interfaces and functionalities to an FPGA application across a range of physical FPGA platforms. For details, see

http://www.ece.cmu.edu/~calcm/carl2010/lib/exe/fetch.php?media=carl2010-parashar.pdf

To set up an FPGA with AWB and LEAP, see

http://memocode2010.csail.mit.edu/redmine/wiki/1/FPGA_Hardware_setup

TI CSL Interrupt

How to use the CSL to complement the OS dispatcher in handling cascaded interrupts:

http://focus.tij.co.jp/jp/lit/an/spraa66/spraa66.pdf

Thursday, January 20, 2011

TI EDMA Low-Level Driver

A good set of training slides:

http://processors.wiki.ti.com/images/5/5e/EDMA3_LLD.pdf

and wiki:

http://processors.wiki.ti.com/index.php/Programming_the_EDMA3_using_the_Low-Level_Driver_(LLD)

TI also provides examples:

https://gforge.ti.com/gf/project/lld_examples/scmsvn/?action=browse&path=/trunk/

Monday, January 17, 2011

TI DSP Framework Components API Reference

index.html

Sunday, January 16, 2011

TI C64x+ H.264 Encoder Performance

http://processors.wiki.ti.com/index.php/C64x%2B_and_Davinci_codec_performance_tables

Latency in Video Encoding and Decoding

A Latency Analysis on H.264 Video Transmission Systems

http://publik.tuwien.ac.at/files/pub-inf_4978.pdf

H.264 "zero" latency video encoding and decoding for time-critical applications

http://www.design-reuse.com/articles/exit/?id=16677&url=http://www.dspdesignline.com/showArticle.jhtml%3FprintableArticle%3Dtrue%26articleId%3D201804895

Early digital video compression solutions primarily focused on applications that do not require real-time interaction, such as TV broadcast, video-on-demand and DVD playback. In these applications the latency between the source and the decoded video is not important and can easily extend to several seconds. However, in applications where there is a closed feedback loop, such as video conferencing and videophone, latency is the most crucial aspect of the system, as it determines whether the system will be stable or not. Keeping the latency of a video codec in such systems as low as possible is the proper approach. In many such applications a latency below 10 milliseconds is crucial, and it takes a radically different approach from traditional ones to achieve a low-latency implementation of the popular H.264/MPEG-4 AVC (Part 10) video coding standard.

Latency and zero latency defined
Simply put, video codec latency is defined here as the time lapse between the first pixel of video appearing in the source and the first pixel of decoded video appearing at the destination. Latency-sensitive video applications require that the time lapse between source and decoded video is extremely small. How small depends on the application, but as a guideline, keeping latency down to sub 10ms is a good idea. For convenience we will call such low latency "zero" latency. This is in contrast with the orders of magnitude higher latency found in non latency-sensitive applications.

Figure 1: Latency between source and decoded video

Latency sensitive video codec applications
In video conferencing and video telephony, noticeable delay makes a conversation impossible unless a "walkie-talkie"-like protocol is strictly followed. This makes the conversation unnatural and cumbersome. In these applications sub 33ms latency for the video codec is required.

Figure 2: Implications of latency in video conferencing

Home networks and other applications
An emerging application with high sensitivity to latency is wireless video networking in the home. This application has recently gained a lot of interest from CE manufacturers and aims to eliminate the HDMI cable between the HDTV set and a video source, such as a set-top box, DVD player or game box. A similarly compelling case exists for the computer industry, where the link between laptop and flat panel monitor is replaced by a wireless connection.

In these applications user interaction with the remote control, game pad, keyboard or mouse, must result in instant screen updates, otherwise the solution is rendered entirely useless. Since transmission at multi-gigabit per second rates over a highly unpredictable RF link is impractical, video compression is required. In these applications sub 10ms latency for the video codec is a critical requirement.
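To see why an uncompressed link lands in the multi-gigabit range, here is a rough back-of-the-envelope sketch in C. The 1080p60, 24 bits-per-pixel format is an assumed example and not a figure from the article, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Raw (uncompressed) video bit rate in bits per second.
 * Counts active pixels only; blanking intervals are ignored.
 * All parameters are illustrative assumptions. */
uint64_t raw_video_bps(uint32_t width, uint32_t height,
                       uint32_t fps, uint32_t bits_per_pixel)
{
    assert(width > 0 && height > 0 && fps > 0 && bits_per_pixel > 0);
    /* Widen first so the product cannot overflow 32 bits. */
    return (uint64_t)width * height * fps * bits_per_pixel;
}
```

For the assumed format, raw_video_bps(1920, 1080, 60, 24) evaluates to 2,985,984,000 bit/s, roughly 3 Gbit/s, which is consistent with the "multi-gigabit" figure in the paragraph above.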

Figure 3: Implications of latency in wireless video networking applications

Another example with a high emphasis on the importance of low latency is digital video surveillance for mission-critical applications. The challenge here is to match the inherently low latency of analog video surveillance systems as their digital counterparts replace them. When securing valuables, such as money in a bank, priceless artifacts in a museum, or merchandise in a store, it is important that the area or building where the intrusion occurs is instantly secured.

In multiple-camera-tracking, video feeds from several cameras are stitched together chronologically into a single feed, which tracks one or more moving objects of interest. Too much latency in the video feeds makes stitching these together a complicated task and renders the application useless for rapid response action. In all these surveillance applications sub 10ms latency for the video codec is a critical requirement.

Figure 4: Implications of latency in video surveillance applications

Lastly, a less obvious example is electronic newsgathering, or ENG. In these applications cameras in the field capture live action and transmit the video for live broadcast to a nearby satellite uplink truck, where it is edited in real time prior to uplinking. Video feeds from multiple cameras and camera panning/zooming actions need to be interpreted in real time by the production crew.

Figure 5: Implications of latency in ENG applications

Very low latency in the video feeds is necessary to provide inherent synchronization between all the different video feeds and with panning/zooming actions of the cameras. In this application sub 33ms latency for the video codec is highly desirable.

Unexpected benefits of "zero" latency video codecs
"Zero" latency can drastically simplify systems design in applications where added latency due to other parts in the system, such as transmitters, receivers, and video capture and rendering subsystems, is negligible. In these cases complicated A/V time stamping and synchronization schemes are not needed, as the extremely low latency of the video stream with respect to the audio stream provides inherent synchronization between the two streams. Such extremely low latency A/V systems strongly mimic the way A/V communication occurs in the natural world -- without complicated time stamping and synchronization.

Inside zero latency H.264 video encode-decode processing
In traditional approaches the encoding process starts when a complete frame of video is present, introducing at least 33ms of latency at the encoder and another 33ms at the decoder. In combination with multi-pass motion estimation, multi-pass rate control and frame-based noise filtering, traditional methods of implementation can easily exhibit in excess of 200ms encode-decode latency.
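The 33ms figures above correspond to one frame period at roughly 30 frames per second. A minimal sketch of that arithmetic in C follows; the 30 fps rate is inferred from the 33ms figure rather than stated in the article, and the function names are hypothetical.

```c
#include <assert.h>

/* Time needed to capture (or display) one full frame, in milliseconds. */
double frame_period_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Lower bound on encode-decode latency when each side waits for whole
 * frames before it starts working. Processing and transmission time
 * are deliberately ignored, so real systems will be slower. */
double min_frame_based_latency_ms(double fps,
                                  unsigned frames_at_encoder,
                                  unsigned frames_at_decoder)
{
    return (frames_at_encoder + frames_at_decoder) * frame_period_ms(fps);
}
```

With one buffered frame on each side at an assumed 30 fps, min_frame_based_latency_ms(30.0, 1, 1) is about 66.7 ms before any motion estimation, rate control or filtering passes are counted, which is why a frame-based pipeline cannot reach the sub 10ms target.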

...

Kishan Jainandunsing, PhD, W&W Communications

Saturday, January 15, 2011

DM6467 EDMA3 RM User Guide

http://software-dl.ti.com/dsps/dsps_public_sw/sdo_tii/psp/edma3_lld/edma3-lld-bios6/02_10_02_03/exports/packages/ti/sdo/edma3/rm/docs/EDMA3_RM_User_Guide.pdf

DM6467 RMAN User Guide

http://processors.wiki.ti.com/index.php/Framework_Components_RMAN_Users_Guide

http://focus.ti.com/lit/an/spraai5/spraai5.pdf

http://e2e.ti.com/support/dsp/tms320c6000_high_performance_dsps/f/112/p/36344/126871.aspx

http://processors.wiki.ti.com/index.php/Configuration_of_EDMA3_RM_in_Framework_Components

http://e2e.ti.com/support/embedded/f/355/t/66393.aspx

Friday, January 14, 2011

DM6467 EDMA Programming

Learned from Programming EDMA without EDMA3LLD package

/* EDMA register address and definitions */

#define EDMA_CC_BASE (0x01C00000) /* DM646x. Check address for other devices. */

#define DCHMAP0 *((volatile unsigned int *)(EDMA_CC_BASE + 0x0100))

#define DMAQNUM0 *((volatile unsigned int *)(EDMA_CC_BASE + 0x0240))

#define QUEPRI *((volatile unsigned int *)(EDMA_CC_BASE + 0x0284))

#define EMCR *((volatile unsigned int *)(EDMA_CC_BASE + 0x0308))

#define EMCRH *((volatile unsigned int *)(EDMA_CC_BASE + 0x030C))

#define QEMCR *((volatile unsigned int *)(EDMA_CC_BASE + 0x0314))

#define CCERRCLR *((volatile unsigned int *)(EDMA_CC_BASE + 0x031C))

#define QWMTHRA *((volatile unsigned int *)(EDMA_CC_BASE + 0x0620))

#define ESR *((volatile unsigned int *)(EDMA_CC_BASE + 0x1010))

#define IPR *((volatile unsigned int *)(EDMA_CC_BASE + 0x1068))

#define ICR *((volatile unsigned int *)(EDMA_CC_BASE + 0x1070))

#define PARAMENTRY0 (0x01C04000) /* DM646x. Check address for other devices. */

#define OPT *((volatile unsigned int *)(PARAMENTRY0 + 0x00))

#define SRC *((volatile unsigned int *)(PARAMENTRY0 + 0x04))

#define A_B_CNT *((volatile unsigned int *)(PARAMENTRY0 + 0x08))

#define DST *((volatile unsigned int *)(PARAMENTRY0 + 0x0C))

#define SRC_DST_BIDX *((volatile unsigned int *)(PARAMENTRY0 + 0x10))

#define LINK_BCNTRLD *((volatile unsigned int *)(PARAMENTRY0 + 0x14))

#define SRC_DST_CIDX *((volatile unsigned int *)(PARAMENTRY0 + 0x18))

#define CCNT *((volatile unsigned int *)(PARAMENTRY0 + 0x1C))

/* Allocate SrcBuf and DstBuf. Do a cache flush and cache invalidate, if required. */

#include <stdint.h> /* int16_t */

#define BUFFSIZE 32

int16_t SrcBuf[BUFFSIZE]; /* Source of transfer */

int16_t DstBuf[BUFFSIZE]; /* Destination of transfer */

/* Step 1: EDMA initialization */

QUEPRI = 0x10;

QWMTHRA = (16 << 8u) | (16 & 0xFF);

EMCR = 0xFFFFFFFF;

CCERRCLR = 0xFFFFFFFF;

/* Step 2: Programming DMA Channel (and Param set) */

DCHMAP0=0x0;

DMAQNUM0=0x0;

OPT = 0x00100000; /* only TCINTEN is set */

SRC = (unsigned int)SrcBuf;

A_B_CNT = ((1 << 16u) | (sizeof(SrcBuf) & 0xFFFFu)); /* ACNT = sizeof(SrcBuf) bytes (ACNT counts bytes, not elements), BCNT = 1 */

DST = (unsigned int)DstBuf;

SRC_DST_BIDX = (BUFFSIZE << 16u) | (BUFFSIZE & 0xFFFFu); /* SRC_BIDX = BUFFSIZE, DST_BIDX = BUFFSIZE */

LINK_BCNTRLD = (1 << 16u) | 0xFFFFu; /* LINK = 0xFFFF, BCNTRLD = 1 */

SRC_DST_CIDX = 0;

CCNT = 1;

/* Step 3: Triggering the Transfer and Waiting for Transfer Completion */

ESR = 0x1;

while(((IPR) & 0x1) == 0);

/* Transfer has completed, clear the status register. */

ICR=0x01;

/* Transfer is complete. Compare SrcBuf and DstBuf. */

For more information see the TMS320DM646x DMSoC Enhanced Direct Memory Access (EDMA) Controller User’s Guide http://focus.ti.com/lit/ug/sprueq5b/sprueq5b.pdf
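The magic shifts in the listing above can be wrapped in small helpers, and the final "compare" step can be written out. The following is a minimal, host-compilable sketch: the helper names are hypothetical (they are not part of any TI package), and the bit positions follow the comments in the listing (low half-word first field, high half-word second field).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A_B_CNT word: ACNT in bits 15-0 (bytes per array), BCNT in bits 31-16. */
uint32_t edma_pack_a_b_cnt(uint16_t acnt_bytes, uint16_t bcnt)
{
    return ((uint32_t)bcnt << 16) | acnt_bytes;
}

/* SRC_DST_BIDX word: SRCBIDX in bits 15-0, DSTBIDX in bits 31-16.
 * Indexes are signed byte offsets, so cast through uint16_t first. */
uint32_t edma_pack_bidx(int16_t src_bidx, int16_t dst_bidx)
{
    return ((uint32_t)(uint16_t)dst_bidx << 16) | (uint16_t)src_bidx;
}

/* LINK_BCNTRLD word: LINK in bits 15-0, BCNTRLD in bits 31-16.
 * LINK = 0xFFFF means "no linked PaRAM set". */
uint32_t edma_pack_link_bcntrld(uint16_t link, uint16_t bcntrld)
{
    return ((uint32_t)bcntrld << 16) | link;
}

/* Returns 1 if the first n elements of src and dst are identical. */
int buffers_match(const int16_t *src, const int16_t *dst, size_t n)
{
    assert(src != NULL && dst != NULL);
    return memcmp(src, dst, n * sizeof src[0]) == 0;
}
```

Since ACNT is a byte count, a buffer of BUFFSIZE int16_t elements would be described as edma_pack_a_b_cnt(BUFFSIZE * sizeof(int16_t), 1). On the target, buffers_match(SrcBuf, DstBuf, BUFFSIZE) would be called after the IPR bit is cleared and, if the buffers are cached, after the destination has been invalidated.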