"Design and Tradeoff Analysis of JPEG2000 on Hardware-Reconfigurable Systems"
R. DeVille, V. Aggarwal, I. Troxel, and A. George
University of Florida
JPEG2000 is a relatively new image coding standard that uses state-of-the-art compression techniques based on wavelet technology. Due to the standard's inherent flexibility, JPEG2000 lends itself to a variety of military and aerospace applications and beyond, from satellite and medical imaging to digital cameras, but it can be complex and demanding of computational resources. While several initial software solutions have been developed, they lack the level of performance that high-end imaging applications demand. Custom VLSI solutions have also been developed, but these can be cost-prohibitive to employ. In recent years, researchers have successfully ported various image-processing applications to reconfigurable computing (RC) systems, which provide efficient bit-level manipulation and true hardware parallelism without incurring the high costs of ASICs. Hence, motivation exists for an RC design and implementation of this computation-intensive encoding and decoding standard.
This presentation will showcase the acceleration of JPEG2000 encoding by identifying the most computationally demanding parts of the algorithm and mapping them onto the FPGA, with less critical parts handled by conventional processing as needed in a dual-paradigm computational setting. From the gamut of options in the standard, our FPGA approach focuses on the lossless encoder, since its all-integer arithmetic is more amenable to an FPGA implementation. Results and analyses will be provided by means of designs and experiments on two different RC platforms. The first is an SGI Altix 350 RASC system featuring one Xilinx Virtex-II device and two 64-bit microprocessors, connected to one another and to main memory via its high-speed NUMAlink interconnect. The second is a dual-processor Linux server equipped with a Nallatech BenNUEY card in a PCI slot, coupled with a BenBLUE-II daughter card, together featuring three Virtex-II devices. Results from these two platforms will help identify and illustrate tradeoffs in mapping this application to two different styles of RC-enhanced system architecture. In addition to contrasting the designs for the two system architectures, we will also highlight the performance improvement obtained relative to software versions running entirely on the host.
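To illustrate why the lossless path suits integer-only hardware, the following is a minimal software sketch of one level of the reversible 5/3 wavelet transform that JPEG2000 specifies for lossless coding, written in lifting form. It is not the design presented here: the function names `forward_53` and `inverse_53` are hypothetical, the sketch assumes a 1-D signal of even length, and it uses simple mirror extension at the edges. Every step needs only additions, subtractions, and shifts.

```python
def forward_53(x):
    """One level of the reversible 5/3 lifting transform (illustrative sketch).

    Assumes len(x) is even and at least 2. Returns (low, high) integer bands.
    """
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("sketch assumes an even-length signal")
    half = n // 2

    # Predict step: each odd sample minus the floor-average of its even neighbours.
    high = []
    for i in range(half):
        left = x[2 * i]
        # Mirror the signal at the right edge (x[n] is taken as x[n - 2]).
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        high.append(x[2 * i + 1] - ((left + right) >> 1))

    # Update step: each even sample plus a rounded quarter of neighbouring details.
    low = []
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]  # mirror at the left edge
        low.append(x[2 * i] + ((prev + high[i] + 2) >> 2))
    return low, high


def inverse_53(low, high):
    """Undo forward_53 exactly by reversing the lifting steps."""
    half = len(low)
    n = 2 * half
    x = [0] * n

    # Undo the update step to recover the even samples.
    for i in range(half):
        prev = high[i - 1] if i > 0 else high[0]
        x[2 * i] = low[i] - ((prev + high[i] + 2) >> 2)

    # Undo the predict step to recover the odd samples.
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[n - 2]
        x[2 * i + 1] = high[i] + ((left + right) >> 1)
    return x


if __name__ == "__main__":
    samples = [12, 15, 9, 7, 200, 198, 3, 0]
    low, high = forward_53(samples)
    print(low, high)
    print(inverse_53(low, high) == samples)
```

Because the inverse subtracts exactly the integer quantities the forward pass added, reconstruction is bit-exact with no floating-point arithmetic, which is the property that makes this path attractive for FPGA mapping.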
2005 MAPLD International Conference Home Page