Wednesday, September 28, 2011

NVIDIA reveals a phantom fifth ARM Cortex-A9 processor core in Kal-El

NVIDIA has extended the path to many-core design by publishing a White Paper that reveals the existence of a fifth ARM Cortex-A9 processor core in the company's previously discussed Kal-El mobile processor. This fifth processor core implements what the company is calling "variable symmetric multiprocessing" (vSMP), and its purpose is to provide extremely low-power operation during periods when the end product is in an active standby mode, that is, when it is performing background tasks such as email and social media synchronization or running active widgets. As the NVIDIA White Paper states, "Users generally do not care how fast the background tasks are processed, only that they happen and do not consume much battery life."
Variable SMP technology has several architectural advantages over other solutions such as asynchronous clocking:
  • Cache Coherency: Since vSMP technology does not allow both the Companion core and the main cores to be enabled at the same time, there are no penalties involved in synchronizing caches between cores running at different frequencies. The Companion and main cores share the same L2 cache, and the cache is programmed to return data in the same number of nanoseconds for both Companion and main cores (essentially it takes more “main core cycles” versus fewer “Companion core cycles” because the main cores run at higher frequency).
  • OS Efficiency: The Android OS assumes that all available CPU cores are identical with similar performance capability and schedules workloads to these cores accordingly. When multiple CPU cores are each run at different asynchronous frequencies, it results in the cores having differing performance capabilities. This could lead to OS scheduling inefficiencies. In contrast, vSMP technology always maintains all active cores at a similar synchronous operating frequency for optimized OS scheduling. Even when vSMP switches from the Companion core to one or more of the main CPU cores, the CPU management logic ensures a seamless transition that is not perceptible to end users and does not result in any OS scheduling penalties.
  • Power Optimized: Each core in an asynchronous clocking based CPU architecture is typically on a different power plane (aka voltage rail or voltage plane) to adjust the voltage of each core based on operating frequency. This could result in increased signal and powerline noise across the voltage planes and negatively impact performance. Since each voltage plane may require its own set of voltage regulators, these architectures may not be easily scalable as the number of CPU cores is increased. The additional voltage regulators increase BOM (Bill of Materials) cost and power consumption. If the same voltage rail is used for all cores, then each core will run at the voltage required by the fastest core, thus losing the advantage of the “voltage squared” effect for power reduction.
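
The power argument rests on the usual dynamic-power relation, P ≈ C·V²·f, so running at a lower voltage pays off quadratically. As a purely illustrative sketch (the load threshold, frequencies, and voltages below are invented numbers, not NVIDIA's actual CPU-management policy), a vSMP-style governor decision might look like this in C:

```c
/* Illustrative sketch only: the load threshold, frequencies, and voltages
 * below are invented numbers, not values from NVIDIA's Kal-El white paper. */
#include <stdio.h>

/* Dynamic switching power: P ~ C * V^2 * f (capacitance, voltage squared, frequency). */
static double dynamic_power(double c_eff, double volts, double freq_hz)
{
    return c_eff * volts * volts * freq_hz;
}

int main(void)
{
    const double c_eff = 1.0e-9;            /* effective switched capacitance (arbitrary) */
    double load = 0.12;                     /* fraction of CPU time demanded (hypothetical) */

    /* Hypothetical operating points. */
    double companion_v = 0.80, companion_f = 500e6;   /* low-power Companion core */
    double main_v      = 1.10, main_f      = 1500e6;  /* one fast main core */

    /* A vSMP-style governor never runs both at once: pick one based on demand. */
    if (load < 0.25)
        printf("Use Companion core: P = %.3f W\n",
               dynamic_power(c_eff, companion_v, companion_f));
    else
        printf("Use main core(s):  P = %.3f W\n",
               dynamic_power(c_eff, main_v, main_f));
    return 0;
}
```
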
See more in:

Processor Wars: NVIDIA reveals a phantom fifth ARM Cortex-A9 processor core in Kal-El mobile processor IC. Guess why it’s there?

http://eda360insider.wordpress.com/2011/09/20/processor-wars-nvidia-reveals-a-phantom-fifth-arm-cortex-a9-processor-core-in-kal-el-mobile-processor-ic-guess-why-it%E2%80%99s-there/

Monday, September 26, 2011

Comparison between HEVC (H.265) and H.264

With its goal of achieving high coding efficiency, e.g., roughly twice that of H.264, the features proposed for H.265 include:
  1. 2-D non-separable adaptive interpolation filter (AIF)
  2. Separable adaptive interpolation filter
  3. Directional adaptive interpolation filter
  4. "Super macroblock" structure up to 64x64 with additional transforms
  5. Large transform block sizes (up to 32x32)
  6. Adaptive prediction error coding (APEC) in spatial and frequency domain
  7. Adaptive quantization matrix selection (AQMS)
  8. Competition-based scheme for motion vector selection and coding
  9. Mode-dependent KLT for intra coding
  10. Tree-structured prediction and residual difference block segmentation
  11. High-accuracy motion compensation interpolation (8 taps)
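
Feature 11, the 8-tap motion-compensation interpolation, is easy to illustrate. The C sketch below applies a horizontal 8-tap filter to produce a half-sample position; the coefficients {-1, 4, -11, 40, 40, -11, 4, -1} (normalized by 64) are the half-pel luma taps that appear in later HEVC drafts, and the surrounding code is a simplified sketch rather than the normative interpolation process:

```c
/* Illustrative 8-tap half-sample interpolation in the horizontal direction.
 * Coefficients are the half-pel luma taps from later HEVC drafts; the clipping
 * and layout here are a simplified sketch, not the normative process. */
#include <stdint.h>

static const int kTaps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

static uint8_t clip_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* src points at the integer-pel sample left of the half-pel position;
 * at least 3 samples of margin are assumed on the left and 4 on the right. */
uint8_t interp_halfpel_h(const uint8_t *src)
{
    int acc = 0;
    for (int i = 0; i < 8; i++)
        acc += kTaps[i] * src[i - 3];
    return clip_u8((acc + 32) >> 6);   /* round and normalize by 64 */
}
```
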
In
http://www.telcogroup.ru/files/materials-pdf/High_Efficiency_Video_Coding_H265.pdf
it was concluded that the preliminary requirement for HEVC (H.265, or H.NGVC) was a bit rate reduction of 50% at the same subjective image quality compared to the H.264/MPEG-4 AVC High profile, with computational complexity ranging from 1/2 to 3 times that of the High profile. HEVC (H.265) would be able to provide a 25% bit rate reduction along with a 50% reduction in complexity at the same perceived video quality as the High profile, or to provide a greater bit rate reduction at somewhat higher complexity.

In another comparison, the conclusion was that H.265 can achieve a 50% reduction in bit rate at the same PSNR and SSIM as x264.
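
For reference, PSNR, one of the two metrics cited, is 10·log10(255²/MSE) for 8-bit video; SSIM is a more involved structural metric and is omitted here. A minimal C sketch of the per-frame PSNR computation:

```c
/* Minimal per-frame PSNR between a reference and a reconstructed 8-bit plane.
 * PSNR = 10 * log10(255^2 / MSE); returns INFINITY for identical planes. */
#include <math.h>
#include <stddef.h>
#include <stdint.h>

double psnr_8bit(const uint8_t *ref, const uint8_t *rec, size_t n_pixels)
{
    double sse = 0.0;
    for (size_t i = 0; i < n_pixels; i++) {
        double d = (double)ref[i] - (double)rec[i];
        sse += d * d;
    }
    if (sse == 0.0)
        return INFINITY;
    double mse = sse / (double)n_pixels;
    return 10.0 * log10((255.0 * 255.0) / mse);
}
```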

Wednesday, September 14, 2011

Nvidia quad-core chip powers Windows 8 demo tablet

Nvidia said its chip is powering the Windows 8 tablet computer that Microsoft showed on stage at its Build conference today. Nvidia created the tablet to demonstrate what a Windows 8 tablet could look like. The device includes a quad-core Tegra processor, code-named Project Kal-El, which is an ARM-based system on a chip. The chip promises PC-like graphics and high energy efficiency.


Saturday, September 10, 2011

H.265 Motion Estimation on FPGA

MIT students Mehul Tikekar and Mahmut E. Sinangil, mentored by Alfred Man Cheuk Ng, have developed an H.265 motion estimation design that can sustain at least 30 frames per second (fps) at 1,280 x 720 frame resolution. The project produced a design that sustains 10 fps at 50 MHz on the FPGA and 40 fps at 200 MHz when synthesized with a 65-nm cell library. Motion estimation is an essential component of any digital video encoding scheme. H.265, the next-generation standard in development to follow H.264, allows variable-size coding units to increase coding efficiency. They took the BSV (Bluespec SystemVerilog) representations and synthesized them into corresponding Verilog RTL representations. The Verilog is then synthesized into an equivalent gate-level representation that is loaded onto the FPGA development board.
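
At its core, motion estimation is block matching: for each block of the current frame, search a window of the reference frame for the candidate that minimizes a cost such as the sum of absolute differences (SAD). The sketch below is a generic full-search SAD kernel in C, not the MIT team's BSV design; the 16x16 block size, +/-8 search range, and frame layout are arbitrary assumptions:

```c
/* Generic full-search block-matching sketch (SAD cost), for illustration only;
 * the MIT project described above was implemented in BSV/Verilog, not C. */
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 16        /* assumed block size */
#define RANGE 8         /* assumed +/- search range in pixels */

static int sad_16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    int sad = 0;
    for (int y = 0; y < BLOCK; y++)
        for (int x = 0; x < BLOCK; x++)
            sad += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
    return sad;
}

/* Finds the best motion vector (mvx, mvy) for the block at (bx, by). */
void full_search(const uint8_t *cur, const uint8_t *ref, int width, int height,
                 int bx, int by, int *mvx, int *mvy)
{
    int best = INT_MAX;
    *mvx = *mvy = 0;
    for (int dy = -RANGE; dy <= RANGE; dy++) {
        for (int dx = -RANGE; dx <= RANGE; dx++) {
            int rx = bx + dx, ry = by + dy;
            if (rx < 0 || ry < 0 || rx + BLOCK > width || ry + BLOCK > height)
                continue;   /* skip candidates that fall outside the frame */
            int cost = sad_16x16(cur + by * width + bx,
                                 ref + ry * width + rx, width);
            if (cost < best) {
                best = cost;
                *mvx = dx;
                *mvy = dy;
            }
        }
    }
}
```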





Thursday, September 8, 2011

Vanguard Software Solutions Demonstrates H.265/HEVC CODEC

Sep 8, 2011 12:17:00 AM

Copyright Business Wire 2011

AMSTERDAM--(BUSINESS WIRE)--Vanguard Software Solutions (VSS), a leader in H.264/AVC CODEC technology since 2004, has joined the development effort on the emerging video CODEC standard likely to succeed H.264/AVC. This new standard, named High Efficiency Video Coding (H.265/HEVC), is being developed by the Joint Collaborative Team on Video Coding (JCT-VC) of ISO/IEC MPEG and ITU-T VCEG. According to some sources, the final draft of the H.265/HEVC standard is planned for February 2012.

At the IBC2011 conference, VSS will be demonstrating the results of its H.265/HEVC encoder and decoder optimization based on the latest draft of the HEVC standard. This demonstration will show a significant improvement in performance for the Broadcast Distribution market over the current de facto standard, H.264/AVC. The value of HEVC is that it encodes video at lower bitrates than H.264/AVC at the same quality. This value is afforded to HEVC through its higher complexity compared to H.264/AVC. That complexity is well balanced with the general technology trend of increasing computing power in PC CPU, FPGA, and ASIC based video processing.

VSS is well positioned to be a leading supplier of HEVC CODECs on multiple platforms. Having many years of CODEC experience and being the first to demonstrate a real-time H.264 CODEC in 2004, VSS will be the first to show a commercial version of an HEVC software CODEC. Following the PC software HEVC CODEC, VSS will introduce commercial implementations on hardware platforms.

An Excellent Video Presentation: The Language of Concurrency

Most media SoCs have two or more cores. Concurrency, across multiple cores and multiple processors, is becoming more and more important in media processors.

The Language of Concurrency is a one-hour introductory webinar that provides a broad overview of concurrency, explaining the meaning and importance of various terms. See

http://www.corensic.com/Community/Resources/TheLanguageOfConcurrencyVideo.aspx
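
As a small, concrete taste of the kind of concurrency the webinar covers, the C sketch below splits the rows of a frame across two POSIX threads, the simplest form of data parallelism in media processing; the frame size and the per-pixel work are placeholder assumptions:

```c
/* Tiny illustration of data-parallel media processing: two POSIX threads each
 * process half of the rows of a frame. The frame size and the "per-pixel work"
 * step are arbitrary placeholders. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define WIDTH  1280
#define HEIGHT 720

struct slice { uint8_t *frame; int row_begin; int row_end; };

static void *process_slice(void *arg)
{
    struct slice *s = arg;
    for (int y = s->row_begin; y < s->row_end; y++)
        for (int x = 0; x < WIDTH; x++)
            s->frame[y * WIDTH + x] ^= 0xFF;   /* placeholder per-pixel work */
    return NULL;
}

int main(void)
{
    uint8_t *frame = calloc(WIDTH * HEIGHT, 1);
    struct slice top = { frame, 0, HEIGHT / 2 };
    struct slice bot = { frame, HEIGHT / 2, HEIGHT };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, process_slice, &top);
    pthread_create(&t2, NULL, process_slice, &bot);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    free(frame);
    return 0;
}
```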

Saturday, September 3, 2011

How Qualcomm’s Snapdragon ARM chips are unique


http://www.extremetech.com/mobile/94064-how-qualcomms-snapdragon-arm-chips-are-unique

There is a reason that so many mobile devices run on Qualcomm’s Snapdragon system-on-a-chip (SoC). Qualcomm is one of the largest designers of mobile ARM chips in the world, but it’s not just the scale that has made Qualcomm into a mobile powerhouse. The design and features of the Snapdragon SoC have proven to be a hit with users and device makers alike.

This esoteric bit of silicon might seem inconsequential, but it has a huge impact on the design and capabilities of a phone or tablet. Qualcomm has long prided itself on going its own way, and that’s evident in the design of the Snapdragon line of parts. Whereas chip designers like Samsung and Texas Instruments (TI) license the architecture for ARM’s Cortex cores, Qualcomm designed their own ARM-compatible cores.

In current generation SoCs, Qualcomm uses the Scorpion core instead of Cortex-A8. They license the ARM instruction set, so the chips remain compatible at the user level, but running the enhanced Scorpion core means more bang-for-the-buck when actually using a phone.

When it comes to that slab of glass and plastic that lives in pockets, it needs to be slim and well-designed. Qualcomm's Snapdragon makes that easier from the perspective of the OEM. All SoCs integrate several system components into one package, but Qualcomm has taken this to the logical extreme. All generations of the Snapdragon SoC have the processor, GPU, GPS, and, most importantly, the GSM/CDMA cellular modem in one package. This saves space and power in the phone. Designing a svelte, attractive device becomes easier when more components are in one piece of silicon. Similarly, the supply chain is simplified for the OEM if they do not need to source as many individual components.

As Qualcomm moves forward, they aren't done innovating. The new dual-core Snapdragons are beginning to make their way onto the market in devices like the HTC Sensation and EVO 3D. Unlike the competing dual-core chips from Nvidia and TI, the Snapdragon with its custom Scorpion cores is capable of asymmetric operation. This essentially means the cores can be clocked independently and have different power draws. Users will see better power management from these chips, even in a dual-core world.
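
One rough way to observe this independent clocking, assuming a Linux/Android device that exposes the standard cpufreq sysfs interface (availability and exact paths vary by kernel build), is to read each core's scaling_cur_freq entry:

```c
/* Quick way to observe independent per-core clocking on a Linux-based device
 * that exposes the standard cpufreq sysfs interface (paths vary by kernel). */
#include <stdio.h>

int main(void)
{
    char path[128];
    for (int cpu = 0; cpu < 2; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq", cpu);
        FILE *f = fopen(path, "r");
        if (!f) {
            printf("cpu%d: frequency not readable (core may be offline)\n", cpu);
            continue;
        }
        long khz = 0;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("cpu%d: %ld kHz\n", cpu, khz);
        fclose(f);
    }
    return 0;
}
```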

Since Qualcomm’s dual-core SoCs are still using Scorpion, they are reaching the limits of the architecture. Scorpion was designed to emulate last year’s ARM Cortex-A8. Chips like the Nvidia Tegra 2 and Samsung Exynos license Cortex-A9, which is a generation newer.

The big change is set to come in the fourth-generation Snapdragons with the introduction of the Krait core. Krait is expected to be paired with a new generation of Adreno mobile graphics and to use a much more advanced manufacturing process. The upshot for users is that Qualcomm's new chips are likely going to be blisteringly fast. According to Qualcomm, the power consumption of these faster SoCs will be even better than that of Scorpion-based units. That's a big deal for users who need an all-day device.

The new dual and quad-core Snapdragons running Krait cores are expected to begin showing up late in 2011, and into 2012. Bringing together these new features with the innovative SoC design seen in recent years, Qualcomm’s Snapdragon chips could be headed for continued dominance in mobile devices.
