"FPGA Acceleration of the LINPACK Benchmark Using Handel-C and the Celoxica Floating Point Library"

Kieron Turkington, Konstantinos Masselos, George A. Constantinides, and Philip Leong
Imperial College London


Due to their increasing resource densities, field programmable gate arrays (FPGAs) have become capable of efficiently implementing large scale scientific applications involving both single and double precision floating point computations. However, for general scientific computing on FPGAs to become feasible, it must be possible to develop efficient applications at levels of abstraction higher than those available through VHDL and Verilog. In this paper the latest FPGAs (Altera Stratix II and Xilinx Virtex 4) are compared to a high end microprocessor (3 GHz Intel Pentium 4) with respect to sustained performance on a popular floating point CPU performance benchmark, namely LINPACK 1000. The FPGA hardware has been developed in Handel-C and uses the Celoxica Floating Point Library functions to perform the arithmetic. A set of translation and optimization steps has been applied to transform a sequential C description of the LINPACK benchmark, based on a monolithic memory model, into a parallel Handel-C description that utilizes the plurality of memory resources available on a realistic reconfigurable computing platform. These transformations allow high levels of parallelism to be exploited without increasing the external memory bandwidth required. The experimental results show that the latest generation of FPGAs, programmed using Handel-C, can achieve a sustained floating point performance up to 10 times greater than that of the microprocessor while operating at a clock frequency that is 30 times lower.
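
For readers unfamiliar with the benchmark, the following is a minimal sketch of the kind of sequential C computation that LINPACK performs: an in-place LU factorization with partial pivoting whose inner loop is a DAXPY update (y = y + alpha * x). It is a textbook formulation written for illustration, not the benchmark source and not the authors' C or Handel-C code; the function names daxpy and lu_factor, the column-major layout, and the full-row swap are assumptions made here. The single array a stands in for the monolithic memory model mentioned above.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* DAXPY: y[i] += alpha * x[i]. Each iteration is one multiply and one add. */
static void daxpy(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* In-place LU factorization with partial pivoting (illustrative sketch).
 * a   : n x n matrix in column-major order, element (i, j) at a[i + j * n]
 * piv : piv[k] receives the row that was swapped with row k
 * Returns 0 on success, -1 if a zero pivot is encountered. */
static int lu_factor(size_t n, double *a, size_t *piv)
{
    assert(n > 0);
    for (size_t k = 0; k < n; k++) {
        /* Select the largest-magnitude entry in column k as the pivot. */
        size_t p = k;
        for (size_t i = k + 1; i < n; i++)
            if (fabs(a[i + k * n]) > fabs(a[p + k * n]))
                p = i;
        piv[k] = p;
        if (a[p + k * n] == 0.0)
            return -1;

        /* Swap rows p and k across all columns. */
        if (p != k) {
            for (size_t j = 0; j < n; j++) {
                double t = a[k + j * n];
                a[k + j * n] = a[p + j * n];
                a[p + j * n] = t;
            }
        }

        /* Store the multipliers below the diagonal of column k. */
        for (size_t i = k + 1; i < n; i++)
            a[i + k * n] /= a[k + k * n];

        /* Update each trailing column with one DAXPY call. */
        for (size_t j = k + 1; j < n; j++)
            daxpy(n - k - 1, -a[k + j * n],
                  &a[k + 1 + k * n], &a[k + 1 + j * n]);
    }
    return 0;
}
```

The column updates in the last loop are independent of one another for a fixed k, which is the sort of parallelism a hardware implementation can exploit; how the paper partitions the data across on-chip and external memories is not specified in this abstract.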


2006 MAPLD International Conference Home Page