Tuesday, December 30, 2008

XUP and H.264 MP HD Decoder

The following paper presents the architecture, design, validation, and hardware prototyping of the main architectural blocks of a main profile H.264/AVC decoder, namely inverse transforms and quantization, intra prediction, motion compensation, and the deblocking filter. These architectures were designed to reach high throughputs and to be easily integrated with the other H.264/AVC modules. The architectures, all fully H.264/AVC compliant, were completely described in VHDL and further validated through simulations and FPGA prototyping. They were prototyped using a Digilent XUP V2P board, containing a Xilinx Virtex-II Pro XC2VP30 FPGA. The post place-and-route synthesis results indicate that the designed architectures are able to process 114 million samples per second and, in the worst case, 64 HDTV frames (1920x1080) per second, allowing their use in H.264/AVC decoders targeting real-time HDTV applications.


Finite-State Machine

Externally, an FSM is defined by its primary inputs, its outputs, and the clock signal. The clock signal determines when the inputs are sampled and when the outputs get their new values. Internally, the machine stores a state that is updated at each tick of the clock. There are two major types of FSM. If the primary outputs depend on the current state only, it is a Moore machine.

If the primary outputs are a function of both the primary inputs and the current state, it is known as a Mealy machine.
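As a rough illustration of the difference, the following Python sketch models a Moore and a Mealy machine that both flag two consecutive 1 bits on a serial input. The detector itself, the state encoding, and all function names are assumptions made for this example and are not taken from any of the referenced designs.

```python
def moore_next(state, bit):
    """Next state of the Moore detector: counts consecutive 1s, saturating at 2."""
    if bit == 0:
        return 0
    return min(state + 1, 2)


def moore_output(state):
    """Moore output: a function of the current state only."""
    return 1 if state == 2 else 0


def mealy_next(state, bit):
    """Next state of the Mealy detector: remembers whether the last bit was 1."""
    return 1 if bit == 1 else 0


def mealy_output(state, bit):
    """Mealy output: a function of the current state and the current input."""
    return 1 if (state == 1 and bit == 1) else 0


def run_moore(bits):
    state, outputs = 0, []
    for bit in bits:
        state = moore_next(state, bit)            # clock tick: sample input, update state
        outputs.append(moore_output(state))       # output read from the new state
    return outputs


def run_mealy(bits):
    state, outputs = 0, []
    for bit in bits:
        outputs.append(mealy_output(state, bit))  # output formed before the tick
        state = mealy_next(state, bit)            # clock tick: update state
    return outputs


if __name__ == "__main__":
    sample = [1, 1, 0, 1, 1, 1]
    print(run_moore(sample))
    print(run_mealy(sample))
```

In this sketch the Mealy version needs only two states where the Moore version needs three, because the Mealy output can use the incoming bit directly. In hardware, the price is that a Mealy output can change in the middle of a clock cycle whenever the input changes, whereas a Moore output changes only after a clock tick.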

This paper gives an example of an FSM in VHDL, using ISE to view the waveforms.

The following gives an example of an FSM in Bluespec.

Wednesday, December 24, 2008

Future HD Video

SAD implementation in FPGA hardware

This paper proposes a new unit, intended to augment a general-purpose core, that performs a SAD (sum of absolute differences) operation. The implementation can easily be extended to perform the complete SAD operation.
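For reference, the SAD between two equally sized pixel blocks is the sum of the absolute values of their element-wise differences. A minimal software sketch in Python follows; the function names and the row-by-row decomposition are assumptions for illustration and say nothing about how the proposed hardware unit is organized.

```python
def sad_row(row_a, row_b):
    """Sum of absolute differences between two rows of pixel values."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same length")
    return sum(abs(a - b) for a, b in zip(row_a, row_b))


def sad_block(block_a, block_b):
    """SAD of two 2-D blocks, accumulated one row at a time."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    return sum(sad_row(ra, rb) for ra, rb in zip(block_a, block_b))


if __name__ == "__main__":
    current = [[1, 2], [3, 4]]
    reference = [[4, 2], [0, 0]]
    print(sad_block(current, reference))
```

Building the block SAD out of row SADs mirrors the idea of extending a narrow unit to a complete block: the row results only need to be accumulated.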


Other related papers and similar documents are also listed.

Tuesday, December 23, 2008

Useful Bluespec Examples

To store or read in test vectors in a testbench, a RegFile can be used; its contents are initialized from a file at the start of simulation.

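// Copyright 2008 hdfpga.blogspot.com  All rights reserved.
package Tb;

import FIFO ::*;
import Vector ::*;
import FIFOF ::*;
import RegFile::*;

(* synthesize *)
module mkTb (Empty);
  // Load entries 0..15 of the register file from test.dat at the start of simulation
  RegFile#(Bit#(9), Bit#(8)) rFile <- mkRegFileLoad("test.dat", 0, 15);
  Reg#(Bit#(9)) cnt <- mkReg(0);

  // Print one entry per clock cycle
  // (the "< 15" bound was lost in the post and is inferred from the finished rule)
  rule readAndDisp(cnt < 15);
       $display("#%03d: 0x%02x", cnt, rFile.sub(cnt));
       cnt <= cnt + 1;
  endrule

  // Stop once the counter reaches 15 (the $finish call is assumed; the body was lost)
  rule finished(cnt == 15);
       $finish(0);
  endrule
endmodule: mkTb
endpackage: Tb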
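The testbench above needs a test.dat file to load. As far as I know, mkRegFileLoad reads the standard Verilog hex memory format, with one hexadecimal value per line, but this should be checked against the Bluespec reference guide. Under that assumption, a small Python helper like the following could generate the file; the helper names and the sample values are hypothetical.

```python
def hex_memory_lines(values, width_bits=8):
    """Format integers as fixed-width hex strings, one per memory entry."""
    digits = (width_bits + 3) // 4          # hex digits needed for width_bits
    limit = 1 << width_bits
    lines = []
    for value in values:
        if not 0 <= value < limit:
            raise ValueError("value %d does not fit in %d bits" % (value, width_bits))
        lines.append(format(value, "0%dx" % digits))
    return lines


def write_hex_memory(path, values, width_bits=8):
    """Write the values to a text file, one hex value per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(hex_memory_lines(values, width_bits)) + "\n")


if __name__ == "__main__":
    # 16 sample bytes for entries 0..15 (arbitrary test pattern)
    write_hex_memory("test.dat", [(i * 17) & 0xFF for i in range(16)])
```

The same helper could produce the 32-bit entries for a register file of Vector#(4, Bit#(8)) elements by passing width_bits=32, again assuming one packed value per line.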

Moreover, to feed data from RegFile to an application:

// Copyright 2008 hdfpga.blogspot.com  All rights reserved.
// Note: parts of this listing were lost in the post. Lines marked "assumed"
// are reconstructions, not the original code.
package mkTb;

import FIFO ::*;
import Vector ::*;
import FIFOF ::*;
import RegFile::*;
import Connectable::*;
import GetPut::*;

interface IInputGen;
    interface Get#(Vector#(4, Bit#(8))) ioout;
endinterface

interface ITestApp;
    interface Put#( Vector#( 4, Bit#(8)) ) ioin;
    interface Get#( Vector#(16, Bit#(8)) ) ioout;
endinterface

// Reads 4-byte vectors from test.hex and offers them through a Get interface
(* synthesize *)
module mkInputGen( IInputGen );

    RegFile#(Bit#(9), Vector#(4, Bit#(8))) rfile <- mkRegFileLoad("test.hex", 0, 4);
    FIFO#(Vector#(4, Bit#(8)))   outfifo <- mkFIFO;
    Reg#(Bit#(9))    index   <- mkReg(0);

    rule output_byte (index < 4);            // bound assumed from end_of_file
       //$display( "inputbyte %x", rfile.sub(index) );
       outfifo.enq( rfile.sub(index) );      // assumed: push the entry to the FIFO
       index <= index+1;
    endrule

    rule end_of_file (index == 4);
       noAction;                             // original body not recoverable
    endrule

    interface Get ioout = fifoToGet(outfifo);
endmodule

// Receives 4-byte vectors through a Put interface and prints them
(* synthesize *)
module mkTestApp( ITestApp );

    RWire#(Vector#(16, Bit#(8))) pix_out    <- mkRWire;
    RWire#(Vector#( 4, Bit#(8))) pix_in     <- mkRWire;

    Reg#(Bit#(9)) cnt <- mkReg(0);
    Reg#(Bit#(4)) step <- mkReg(0);

    rule process_mode( isValid(pix_in.wget()));
       Vector#(4, Bit#(8)) pix = fromMaybe( ?, pix_in.wget() );
       for (Integer k = 0; k < 4; k = k + 1) // loop header assumed
          $display("#%03d: 0x%02x", k, pix[k]);
    endrule

    interface Put ioin;
       method Action put( Vector#(4, Bit#(8)) pix ) if (step <= 3);
          //$display ("%d 0x%02x", cnt, pix[cnt]);
          pix_in.wset(pix);                  // assumed: forward the input to the rule
       endmethod
    endinterface

    interface Get ioout;
       method ActionValue#(Vector#(16, Bit#(8))) get() if (isValid(pix_out.wget));
          return fromMaybe(?, pix_out.wget());
       endmethod
    endinterface
endmodule

(* synthesize *)
module mkTb (Empty);

    IInputGen     inputgen    <- mkInputGen();
    ITestApp      testapp     <- mkTestApp();  // instance names must start in lower case

    Reg#(Bit#(8)) x <- mkReg(0);
    Reg#(Bit#(9)) cnt <- mkReg(0);

    // Connect the generator's Get to the application's Put
    mkConnection( inputgen.ioout, testapp.ioin );

    rule connect;
        ///Vector#(4, Bit#(8)) pix = newVector;
        ///let x <- inputgen.ioout.get();
        //$display ("IO out %0d", x);
        //$display ("IO out 0x%02x", x);
        ///pix = x;
        cnt <= cnt + 1;
    endrule

    rule finished(cnt == 4);
        $finish(0);                          // assumed: stop the simulation
    endrule
endmodule: mkTb

endpackage: mkTb

Thursday, December 18, 2008


Bluespec presents the hardware designer an exciting new way to simplify the complexity of constructing control logic while retaining full control over the architecture and performance of the design.

Bluespec’s ESL synthesis toolset for control logic and complex datapath designs significantly accelerates hardware design & reduces verification costs delivering:

  1. Over a 50% reduction in time to a verified design;
  2. Less than 50% of the bugs compared to RTL design;
  3. Design exploration and feature changes can be made correctly and much more quickly


Bluespec learning documentation:


Import C to Bluespec:

Some blogs about Bluespec:

Thursday, December 11, 2008

External Memory Controller for XUP


An implementation of an On Chip Memory (OCM) based Dual Data Rate external memory controller (OCM2DDR) for Virtex-II Pro is described. The proposed OCM2DDR controller comprises a Data Side OCM (DSOCM) bus interface module, read and write control logic, a halt read module, and the Xilinx DDR controller IP core. The presented design supports 16 MB of external DDR memory and 32-to-64-bit data conversion for single read and write operations. The implementation uses 1063 slices of the Virtex-II Pro FPGA and runs at 100 MHz. The major benefits of the proposed design are high bandwidth to external memory with reduced and more predictable access times compared to the Xilinx PLB DDR controller implementation. More specifically, read and write accesses are 2.44 and 4.25 times faster, respectively, than with the PLB-based solution.

Simple Speedups for XUP Board


In this document, a couple of easy tricks to help speed up the numerous multimedia applications that one might find useful to port to the ML-XUP board are described, including 

Sunday, December 7, 2008

Low-cost FPGAs and H.264

Suhel Dhanani from Ocean Logic published a paper "Video encoding with low-cost FPGAs for multi-channel H.264 surveillance":

FPGA-to-ASIC Conversion Flow

Wolfgang Hoeflich from AMI Semiconductor described how a high-definition video scaler ASIC was quickly created using a flexible FPGA-to-ASIC conversion flow. This ensured reproduction of the FPGA functionality and enabled first time fully functional silicon supporting video resolutions up to 1080p.


In 2007, Synplicity Inc. released the Identify Pro tool, which allows full visibility into FPGA-based ASIC prototyping.



SimGen is an EDIF/VHDL/FPGA to ASIC Conversion Utility and Simulation Generator for Tanner Tools EDA.


ON Semiconductor provides FPGA-to-ASIC conversion services.


Epson also offers FPGA-to-ASIC conversion.


NEC also offers a conversion flow and explains why such a conversion is needed.


HD Video Encoding with DSP and FPGA

TI proposed using digital signal processors (DSPs) to handle the vast majority of video encoding applications unaided, with FPGAs as co-processors that offload certain tasks in even the most demanding video applications.

HD Video Test Clips and Video Format Conversion

Test clips at 1280 x 720 (720p) or 1920 x 1080 (1080p) can be downloaded from


The WMV files can be converted to other formats such as YUV using ffmpeg:

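The command is as follows, where number is the number of frames to write. Output options such as -vframes must come before the output file name:
ffmpeg -i test.wmv -s 1920x1088 -vframes number test.yuv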

For conversion from yuv422 to yuv420:
ffmpeg -pix_fmt yuv422p -s 704x576 -i 704_576_100f_422.yuv -pix_fmt yuv420p 704_576_100f_420.yuv

To convert to AVI, use
ffmpeg -s 1920x1088 -i test.yuv -vcodec copy test.avi

To convert to JPEG (for example, yuv2jpeg), use
ffmpeg -vframes 1 -i test.264 testvid%d.jpeg

To convert from BMP to YUV 4:2:0 or 4:2:2 (for example, bmp2yuv, bmp2yuv420, or bmp2yuv422), use
ffmpeg -f image2 -s 1600x1200 -vcodec bmp -i 1600x1200.bmp -pix_fmt yuv420p test420.yuv
ffmpeg -f image2 -s 1600x1200 -vcodec bmp -i 1600x1200.bmp -pix_fmt yuv422p test422.yuv
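Raw .yuv files carry no header, which is why the frame size and pixel format have to be given on the command line. The Python sketch below shows how the size of one frame follows from those two parameters for 8-bit planar data, and how a frame count can be derived from a file size. The function names are made up for this example. The 1088-line height used above is presumably chosen because 1080 is not a multiple of the 16-pixel macroblock size.

```python
def frame_bytes(width, height, pix_fmt):
    """Bytes in one 8-bit planar YUV frame with even width and height."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    luma = width * height                          # one byte per Y sample
    if pix_fmt == "yuv420p":
        chroma = 2 * (width // 2) * (height // 2)  # U and V halved in both directions
    elif pix_fmt == "yuv422p":
        chroma = 2 * (width // 2) * height         # U and V halved horizontally only
    else:
        raise ValueError("unsupported pixel format: " + pix_fmt)
    return luma + chroma


def frame_count(file_bytes, width, height, pix_fmt):
    """Number of whole frames in a raw file of the given size."""
    size = frame_bytes(width, height, pix_fmt)
    if file_bytes % size:
        raise ValueError("file size is not a whole number of frames")
    return file_bytes // size


if __name__ == "__main__":
    print(frame_bytes(1920, 1088, "yuv420p"))
    print(frame_bytes(704, 576, "yuv422p"))
```

If the computed frame count for a file is not a whole number, the width, height, or pixel format passed to ffmpeg most likely does not match the data.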

Here is a list of 1080i music video clips:

Some popular CIF and QCIF test video sequences are listed on the following website. All the sequences are in 4:2:0 YUV format and are compressed in the 7-Zip format.

Saturday, December 6, 2008

Running Linux on a Xilinx XUP Board


A tutorial for booting a fully functional operating system based on the Linux 2.4 kernel on a Xilinx University Program Virtex II-Pro based development board was presented by John H. Kelm. Furthermore, a reconfigurable hardware accelerator that can be accessed directly by applications or via a character device driver was described.

Crosstool, a software package created by Dan Kegel that allows x86 Linux machines to target the PowerPC 405 core of the XUP board, was used.


The MEMOCODE (the ACM-IEEE International Conference on Formal Methods and Models for Co-Design) Contest has been sponsored by Xilinx, Bluespec, CEDA, and Nokia for some time.

In MEMOCODE 2007 (the 5th), the basic design challenge was to implement a high-performance matrix-matrix multiplication (MMM) using any HW and SW design methodology and targeting any FPGA development platform of the contestants’ choice.


In MEMOCODE 2008, hardware-accelerated crypto sorter designs were proposed for the HW/SW co-design contest. The goal was to sort an encrypted database of records, partitioning the problem between a PowerPC processor and the dedicated hardware resources available on a Xilinx Virtex-II Pro FPGA. The MIT team won the top honor. The code is on OpenCores, and the documentation can be downloaded from there as well.

The following link lists some of the submissions and the corresponding documentation:


Sunday, November 30, 2008

Thursday, November 27, 2008

Using Synplify, ISE/XPS, ActiveHDL and ModelSim

Synplify products (versions 6.2 and higher) have been integrated with Xilinx ISE (Integrated Synthesis Environment) and XPS (Xilinx Platform Studio).


The following lab guide is for Using Synplify Pro, ISE and ModelSim:


A Xilinx ModelSim Simulation Tutorial from Upenn


An application note presents the design flow for systems created with Xilinx EDK 7.1i with Active-HDL 6.3 SP1 and Xilinx ISE 7.1i:


XUPV2P and 1G DIMM Memory Modules

It seems the board can support DIMM memory modules of up to 2 GB. The following link is the datasheet of a 1 GB DIMM memory module that works on this board:

Sunday, November 23, 2008

ISE FPGA Design Flow Overview

The ISE™ design flow comprises the following steps: design entry, design synthesis, design implementation, and Xilinx® device programming. Design verification, which includes both functional verification and timing verification, takes place at different points during the design flow. This section describes what to do during each step.

Design Entry

Create an ISE project as follows:

  1. Create a project.
  2. Create files and add them to your project, including a user constraints (UCF) file.
  3. Add any existing files to your project.
  4. Assign constraints such as timing constraints, pin assignments, and area constraints.

Functional Verification

You can verify the functionality of your design at different points in the design flow as follows:

  • Before synthesis, run behavioral simulation (also known as RTL simulation).
  • After Translate, run functional simulation (also known as gate-level simulation), using the SIMPRIM library.
  • After device programming, run in-circuit verification.

Design Synthesis

Synthesize your design.

Design Implementation

Implement your design as follows:

  1. Implement your design, which includes the following steps:
    • Translate
    • Map
    • Place and Route
  2. Review reports generated by the Implement Design process, such as the Map Report or Place & Route Report, and change any of the following to improve your design:
    • Process properties
    • Constraints
    • Source files
  3. Synthesize and implement your design again until design requirements are met.

Timing Verification

You can verify the timing of your design at different points in the design flow as follows:

  • Run static timing analysis at the following points in the design flow:
    • After Map
    • After Place & Route
  • Run timing simulation at the following points in the design flow:
    • After Map (for a partial timing analysis of CLB and IOB delays)
    • After Place and Route (for full timing analysis of block and net delays)

Xilinx Device Programming

Program your Xilinx device as follows:

  1. Create a programming file (BIT) to program your FPGA.
  2. Generate a PROM, ACE, or JTAG file for debugging or to download to your device.
  3. Use iMPACT to program the device with a programming cable.


Go to the ISE Quick Start Tutorial to get an idea of the additional capabilities of ISE.


Xilinx EDK Tool Flow

Saturday, November 22, 2008

MIT FPGA Courses

6.375 Complex Digital Systems

A project-oriented course to teach new methodologies for designing multi-million-gate CMOS VLSI chips using high-level synthesis tools in conjunction with standard commercial EDA tools. The emphasis is on modular and robust designs; reusable modules; correctness by construction; architectural exploration; and meeting the area, timing, and power constraints within standard-cell frameworks. 

6.111 Introduction to Digital Systems

Lectures and labs on digital logic, sequential building blocks, finite-state machines, timing and synchronization, and FPGA-based design prepare students for the design and implementation of a final project of their choice: games, music, digital filters, wireless communications, video, or graphics. Extensive use of Verilog for describing and implementing digital logic designs on a state-of-the-art FPGA. Students engage in extensive written and oral communication exercises. 

The FPGA labkit is a state-of-the-art platform for prototyping digital designs. Based on a 6-million gate platform-scale FPGA, the labkit is designed to facilitate complex and high-performance projects. Several peripheral devices are built into the labkit PCB and hardwired to the FPGA. These include high-speed memory, audio and video encoders and decoders, and other digital interfaces, such as PS/2 and RS-232 ports.


A brief tutorial on using Xilinx ISE to program the labkit's FPGA is also provided at http://web.mit.edu/6.111/www/s2004/NEWKIT/ise.shtml.

