Wednesday, March 18, 2009

Xilinx Virtex-5 family Overview

The new Virtex-5 devices are the world's first FPGAs to be fabricated at the 65 nm technology node. These devices are based on a low-K dielectric process that reduces parasitic capacitance, enables faster switching speeds, and reduces heat dissipation. Their 12-layer metallization (11 copper layers and 1 aluminum layer) supports an advanced diagonal interconnect fabric. Their core voltage has been reduced to 1.0V, thereby reducing dynamic power consumption (the core voltage of Virtex-4 devices is 1.2V). Meanwhile, their second-generation triple-oxide technology dramatically reduces static power dissipation.
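As a rough check on the core-voltage claim, dynamic power in CMOS logic is commonly approximated as proportional to C·V²·f. The sketch below assumes that textbook model, with capacitance and clock frequency held fixed; neither assumption comes from Xilinx. It only estimates how much of the saving the drop from 1.2V to 1.0V could account for on its own.

```python
# Textbook CMOS approximation (an assumption, not a Xilinx figure):
#   P_dynamic ~ C * V^2 * f
# With C and f held fixed, only the squared voltage ratio remains.

def dynamic_power_ratio(v_new: float, v_old: float) -> float:
    """Ratio of new to old dynamic power when only the core voltage changes."""
    return (v_new / v_old) ** 2

ratio = dynamic_power_ratio(1.0, 1.2)    # Virtex-5 core vs. Virtex-4 core
saving_percent = (1.0 - ratio) * 100.0   # share of dynamic power saved
print(f"power ratio: {ratio:.3f}, saving: {saving_percent:.1f}%")
```

Under this simplified model, voltage scaling alone yields a saving of roughly 31%, so any saving beyond that would have to come from other factors such as the lower parasitic capacitance of the low-K process.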

Overall, the new Virtex-5 family provides 65% more logic cells and 25% more input/outputs (I/Os) than the preceding Virtex-4 generation of devices. At the same time, members of the Virtex-5 family are claimed to provide 30% higher performance and 35% lower dynamic power dissipation, and to consume 45% less silicon real estate than their Virtex-4 counterparts.

All members of the Virtex-5 family are based on Xilinx's ASMBL (Advanced Silicon Modular Block) architecture. For each application domain – such as digital signal processing – Xilinx has determined the optimum mixture (ratio) of logic, memory, DSP slices, and so forth. Next, for each application domain, Xilinx creates a suite of components, all based on the same "mix" but with a range of capacities. This suite is collectively referred to as a "platform". Based on this, Xilinx has announced four domain-optimized platforms as follows:

  • Virtex-5 LX: High performance logic (shipping now).
  • Virtex-5 LXT: High performance logic with serial connectivity (coming in the second half of 2006).
  • Virtex-5 SXT: High performance DSP with serial connectivity (coming in the second half of 2006).
  • Virtex-5 FXT: Embedded processing with serial connectivity (coming in the first half of 2007).

Virtex-5 devices are also based on Xilinx's new ExpressFabric technology, which features LUTs with 6 independent inputs for fewer logic levels, and a new diagonal interconnect architecture that facilitates shorter, faster routing. An overview of some of the more significant Virtex-5 architectural features is as follows:

6-Input lookup tables (LUTs)
The first FPGA to be presented to the market in 1985 – the XC2064 from Xilinx – contained 64 configurable logic blocks (CLBs), each of which boasted two 3-input lookup tables (LUTs). Subsequent generations moved to 4-input LUTs, because these offered a better balance between logic utilization and the number of logic levels for the designs of that era.

However, there has been a fundamental shift in the nature of designs over recent years. Today's designs often feature wide data paths, especially in the case of digital signal processing (DSP) applications. Implementing these designs using 4-input LUTs can require many levels of logic, thereby impacting performance. In order to address this issue, the ExpressFabric employed by the Virtex-5 family features LUTs with six independent inputs, which can significantly reduce the number of logic levels required to implement wide functions (Fig 1).

1. The Virtex-5 family features 6-input LUTs.
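To make the logic-level argument concrete, here is a minimal sketch that counts how many LUT levels a tree needs to reduce a wide associative function (a wide AND, OR, or XOR). It is a simplified model: the function name is my own, and it ignores carry chains and any other dedicated resources a real device would offer.

```python
def logic_levels(num_inputs: int, lut_inputs: int) -> int:
    """Levels of k-input LUTs needed for a tree that reduces num_inputs
    signals with an associative function such as a wide AND, OR, or XOR."""
    if num_inputs < 1 or lut_inputs < 2:
        raise ValueError("need at least 1 signal and LUTs with 2+ inputs")
    levels = 0
    signals = num_inputs
    while signals > 1:
        # Each LUT collapses up to lut_inputs signals into one.
        signals = -(-signals // lut_inputs)  # ceiling division
        levels += 1
    return levels

# Compare 4-input and 6-input LUTs for a few bus widths.
for width in (32, 128):
    print(width, logic_levels(width, 4), logic_levels(width, 6))
```

In this model a 32-input function takes three levels of 4-input LUTs but only two levels of 6-input LUTs, and a 128-input function drops from four levels to three.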

Each of these logical elements can be used as a true 6-input LUT or as two 5-input LUTs that share five of their inputs. In addition to containing four 6-input LUTs, a Virtex-5 slice also includes faster flip-flops to speed pipelined designs and an improved carry chain architecture to speed arithmetic operations. Overall, the Virtex-5 family provides 65% more logic cells (330,000 LCs) than its Virtex-4 counterparts.
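The two operating modes can be illustrated with a small behavioral model. This is only a sketch under my own assumptions: the LUT contents are held in a 64-bit integer, the two 5-input tables are taken to be the lower and upper halves of that integer, and the function and constant names are hypothetical rather than anything from Xilinx documentation.

```python
from typing import Sequence, Tuple

def _address(inputs: Sequence[int]) -> int:
    """Pack input bits into a table address; inputs[0] is the LSB."""
    return sum((bit & 1) << i for i, bit in enumerate(inputs))

def lut6(init: int, inputs: Sequence[int]) -> int:
    """Evaluate a 6-input LUT whose 64-entry truth table is 'init'."""
    if len(inputs) != 6 or not 0 <= init < (1 << 64):
        raise ValueError("expected 6 inputs and a 64-bit table")
    return (init >> _address(inputs)) & 1

def dual_lut5(init: int, inputs: Sequence[int]) -> Tuple[int, int]:
    """Treat the same 64 bits as two 32-entry tables sharing five inputs.
    Returns (output of lower half, output of upper half)."""
    if len(inputs) != 5 or not 0 <= init < (1 << 64):
        raise ValueError("expected 5 inputs and a 64-bit table")
    addr = _address(inputs)
    return (init >> addr) & 1, (init >> (32 + addr)) & 1

# Illustrative table: a 6-input AND is true only at address 63.
AND6 = 1 << 63
# Illustrative table: lower half is a 5-input AND, upper half a 5-input OR.
AND5_OR5 = (1 << 31) | (((1 << 32) - 2) << 32)

print(lut6(AND6, (1, 1, 1, 1, 1, 1)))         # all inputs high
print(dual_lut5(AND5_OR5, (1, 0, 0, 0, 0)))   # one input high
```

The point of the model is that both modes read the same 64 configuration bits; only the way the address is formed differs.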

Diagonal interconnect
The traditional way of implementing FPGA interconnect results in a complex pattern as to which CLBs can be reached from an initial CLB in 1, 2, 3, or more hops. Consider the central (red) CLB shown in Fig 2 for example; from this starting point, the CLBs in yellow can be reached in 1 hop, the CLBs in green can be reached in 2 hops, and the CLBs in blue can be reached in three hops.

2. Traditional Virtex-4 interconnect pattern.

Reaching CLBs outside the blue area will require more hops. Note especially the "holes" in the blue areas; reaching CLBs in these holes will also require more hops. This complex arrangement impacts speed and increases the complexity of synthesis, place, and route. In order to address this issue, another ExpressFabric feature is a radically new form of diagonal interconnect that reaches more locations with fewer hops (Fig 3). This diagonally symmetric interconnect pattern is intended to improve both speed and predictability.

3. Diagonally symmetric Virtex-5 interconnect pattern.

The combination of the ExpressFabric's 6-input LUTs and diagonally symmetric interconnect pattern results in an average increase of logic performance of 30% over the previous Virtex-4 generation of devices, which equates to up to two speed-grades. 
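The idea that diagonal connections reach more locations in the same number of hops can be shown with a toy grid model. The move sets below are purely hypothetical and are not the actual Virtex-4 or Virtex-5 routing resources (real devices also have longer wires); the sketch simply counts cells reachable by breadth-first search with and without diagonal steps.

```python
from collections import deque

def reachable_within(hops, moves):
    """Count grid cells (including the start) reachable in at most 'hops' moves."""
    start = (0, 0)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if distance[cell] == hops:
            continue  # do not expand beyond the hop budget
        for dx, dy in moves:
            nxt = (cell[0] + dx, cell[1] + dy)
            if nxt not in distance:
                distance[nxt] = distance[cell] + 1
                queue.append(nxt)
    return len(distance)

# Hypothetical move sets on an unbounded grid of CLBs.
STRAIGHT = [(1, 0), (-1, 0), (0, 1), (0, -1)]
WITH_DIAGONALS = STRAIGHT + [(1, 1), (1, -1), (-1, 1), (-1, -1)]

for hops in (1, 2, 3):
    print(hops, reachable_within(hops, STRAIGHT),
          reachable_within(hops, WITH_DIAGONALS))
```

In this toy model, two hops cover 13 cells with straight moves only and 25 cells once diagonal moves are allowed, and the diagonal pattern fills a square with no holes.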

Faster, higher-capacity RAM blocks with hard IP
The new block RAM structures (with pipeline) featured in the Virtex-5 family have been increased to 36 Kbits in size, which is twice the size of those found in Virtex-4 components. In addition to offering a simple dual-port mode that can double the block RAM's bandwidth, these also contain additional hard IP in the form of FIFO logic and new 64-bit error checking and correction (ECC) logic (Fig 4). Implementing this logic as hard IP frees up other resources and minimizes dynamic power consumption.

4. The Virtex-5 family features up to 10 Mbits of 550 MHz block RAM.

As with all of the hard IP blocks in Virtex-5 devices, these block RAMs have been tuned for 550 MHz operation to provide higher on-chip memory bandwidth. If required, unused 18 Kbit sub-blocks can be turned off so as to minimize power consumption.

Faster, wider DSP functions
The Virtex-5 hard DSP slice – called the DSP48E – features a 25 × 18 bit multiplier (versus the 18 × 18 multiplier employed in Virtex-4 FPGAs). The increase to a 25 × 18 bit multiplier can lead to fewer cascaded stages, thereby resulting in higher overall performance and utilization (Fig 5).

5. The Virtex-5 family features DSP slices with 25 × 18 multipliers.
(This illustration is a simplification of the DSP48E's functionality)

Tuned for 550 MHz operation, these high-precision, high-performance, highly flexible slices can be configured for DSP, arithmetic, and logic functions, and they can also be cascaded for adder chain architectures. (Observe that the multiplier is followed by a 3-input 48-bit "adder" that also performs logical operations such as AND, OR, XOR, etc.) The DSP48E slice has 40% lower power consumption as compared to implementing equivalent functions in Virtex-4 FPGAs (1.38mW/100 MHz at a 38 percent toggle rate).
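To see why a wider multiplier can mean fewer cascaded stages, consider building a wide unsigned product out of partial products that each fit one hard multiplier. The sketch below is an illustration under my own assumptions: the multiplier ports are treated as signed, so one bit of each port is reserved for the sign, and the helper names are hypothetical. It does not model the DSP48E's actual cascade paths.

```python
def split(value, width, chunk):
    """Split an unsigned 'width'-bit value into 'chunk'-bit pieces, LSB first."""
    mask = (1 << chunk) - 1
    return [(value >> shift) & mask for shift in range(0, width, chunk)]

def tiled_multiply(a, a_width, b, b_width, mult_a_bits, mult_b_bits):
    """Multiply unsigned a * b using partial products that each fit a signed
    mult_a_bits x mult_b_bits multiplier. Returns (product, tiles used)."""
    chunk_a = mult_a_bits - 1  # one bit of a signed port carries the sign
    chunk_b = mult_b_bits - 1
    product = 0
    tiles = 0
    for i, a_part in enumerate(split(a, a_width, chunk_a)):
        for j, b_part in enumerate(split(b, b_width, chunk_b)):
            # Each partial product is shifted to its place and accumulated.
            product += (a_part * b_part) << (i * chunk_a + j * chunk_b)
            tiles += 1
    return product, tiles

a, b = 0xABCDEF, 0x1FFFF   # a 24-bit and a 17-bit unsigned operand
print(tiled_multiply(a, 24, b, 17, 18, 18))  # built from 18 x 18 multipliers
print(tiled_multiply(a, 24, b, 17, 25, 18))  # built from 25 x 18 multipliers
```

For this 24 × 17 bit unsigned case, the model needs two partial products with 18 × 18 multipliers but only one with a 25 × 18 multiplier, and both give the same product.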

Advanced clock management
Virtex-5 devices offer up to 18 clock elements for flexibility and differential global clocking for low skew and jitter. The new 550 MHz clock management tile (CMT) in the Virtex-5 features two digital clock managers (DCMs) and a phase-locked loop (PLL). The DCMs provide precise phase control for better design margin, while the PLL reduces reference clock jitter by more than two times. Depending on each design's unique requirements, the PLL can be configured before or after the DCMs.

High-speed I/O and new packaging technology
Virtex-5 devices offer up to 1,200 general-purpose input/output (GPIO) pins, which is around 25% more than the previous Virtex-4 generation of components. These pins provide 1.25 Gbps differential I/O and 800 Mbps single-ended I/O. Additionally, Virtex-5 devices feature second-generation ChipSync source-synchronous technology, which allows programmable delays to be applied to both inputs and outputs (the previous generation of ChipSync supported delays only on input pins).

As opposed to the highest pin-count Virtex-4 devices, whose 960 pins were presented as 15 banks of 64 pins, the 1,200 pins in Virtex-5 components are presented as 30 banks of 40 pins. These smaller banks offer more flexibility in terms of I/O placement. Furthermore, Virtex-5 packages employ a new second-generation Sparse Chevron packaging technology, which is claimed to minimize signal integrity (SI) issues such as cross-talk and also to ease the task of PCB layout.
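The bank figures quoted above are easy to cross-check. The few lines below use only the numbers given in the text (the variable names are mine) and confirm that the two bank layouts are consistent with the "around 25% more" I/O claim.

```python
# Pin totals implied by the bank layouts quoted in the text.
virtex4_pins = 15 * 64   # 15 banks of 64 pins
virtex5_pins = 30 * 40   # 30 banks of 40 pins

# Relative increase in general-purpose I/O pins.
increase_percent = (virtex5_pins - virtex4_pins) / virtex4_pins * 100
print(virtex4_pins, virtex5_pins, increase_percent)
```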

Pricing and availability
Synplicity, Mentor Graphics, and Magma Design Automation all support the Virtex-5 design flow. Early access software for Virtex-5 FPGAs is available now, with general availability in June 2006.

Virtex-5 LX FPGA engineering samples are shipping now in the LX50, LX85 and LX110 densities, with the LX30, LX220, and LX330 to follow over the next six months. At customer production timeframes in 2008, the LX50 device will list for $149, the LX85 device will list for $279, and the LX110 device will list at $399, all in 1,000 unit volumes. These price points represent savings of more than 50 percent over offerings of other competitive 90-nm FPGAs.

For even further cost reductions, the Virtex-5 EasyPath program will be available at time of volume production beginning in 2007. For more information, visit the Xilinx website at

17,280 slices
129 BRAMs
64 DSP

LUT 69120
FF676, FF1153, FF1760
