"High-Level Implementation of VSIPL on FPGA-based Reconfigurable Computers"
Malachy Devlin (1), Robin Bruce (2), and Stephen Marshall (3)
(2) Institute of System Level Integration, Alba Centre
FPGA technology continues an aggressive trend of increasing capacity year on year, with the latest devices offering dedicated silicon for accelerating mathematical operations. The latest families offer up to 512 dedicated MAC (Multiply and Accumulate) units, providing a peak performance of 256 GMAC/s on 18-bit data. This is further supported by a programmable-logic fabric that allows algorithm-dependent implementations. Typically, FPGAs are considered in preference to microprocessors only for integer or bit-level algorithms that demand high-performance computing capability. There is a perception that microprocessors are still required to implement floating-point algorithms and that FPGAs are weak in this area. This is not the case. FPGAs such as Xilinx’s Virtex-II Pro are capable of offering over 25 GFLOPS of single-precision peak performance, with approximately a quarter of this performance possible for double-precision floating point. As well as being contenders to microprocessors for integer and bit-level algorithms, FPGAs are therefore also a viable alternative to microprocessors when implementing floating-point algorithms.
VSIPL, the Vector Signal and Image Processing Library, is an open standard API for highly efficient and portable computational middleware for signal and image-processing applications. This library has typically been targeted at microprocessors and is provided as a C or C++ API. It is aimed at high-performance applications and can handle a range of data types, including integer and floating point.
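
The peak figures quoted above can be sanity-checked with simple arithmetic. The sketch below assumes one MAC per unit per clock cycle and a 500 MHz clock; the clock rate is not stated in the abstract and is only what the 512-unit and 256 GMAC/s figures jointly imply. The function names are hypothetical and used for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Peak MAC rate in MAC/s, assuming every unit completes one MAC per cycle. */
static uint64_t peak_macs_per_second(uint32_t mac_units, uint64_t clock_hz)
{
    assert(mac_units > 0 && clock_hz > 0);
    return (uint64_t)mac_units * clock_hz;
}

/* Rough double-precision estimate: about a quarter of the single-precision
 * peak, following the ratio quoted in the text (an approximation). */
static double double_precision_gflops(double single_precision_gflops)
{
    return single_precision_gflops / 4.0;
}
```

Under these assumptions, 512 units at 500 MHz give 256 x 10^9 MAC/s, and a 25 GFLOPS single-precision peak corresponds to roughly 6.25 GFLOPS in double precision.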
The work presented in this paper is the implementation of these libraries on FPGAs, to give the FPGA developer the same level of middleware functionality that the software developer has traditionally had with the original APIs. We will describe the implementation, detailing the floating-point functions, using a high-level language so that a high-level C-based interface is provided to the FPGA developer. Optionally, an API would be available to the application software running on a host computer.
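
To make the idea of a C-level vector interface concrete, the following is a minimal software reference model of an element-wise floating-point vector add over strided views. It is loosely modelled on the view concept used by VSIPL (a window onto a data buffer described by offset, stride and length), but the type and function names (`sketch_vview_f`, `sketch_vadd_f`) are hypothetical. It is neither the VSIPL API nor the FPGA implementation described in this paper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided view onto a float buffer (not the VSIPL type). */
typedef struct {
    float  *data;    /* underlying storage */
    size_t  offset;  /* index of the first element in the view */
    size_t  stride;  /* step between consecutive view elements */
    size_t  length;  /* number of elements in the view */
} sketch_vview_f;

/* r[i] = a[i] + b[i] for every element; all views must have equal length. */
static void sketch_vadd_f(const sketch_vview_f *a,
                          const sketch_vview_f *b,
                          const sketch_vview_f *r)
{
    assert(a->length == b->length && a->length == r->length);
    for (size_t i = 0; i < r->length; ++i) {
        r->data[r->offset + i * r->stride] =
            a->data[a->offset + i * a->stride] +
            b->data[b->offset + i * b->stride];
    }
}
```

A reference model of this kind could serve as a host-side check against results returned from hardware, although that use is an assumption here rather than something the abstract states.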
Since an algorithm implementation will use a number of these functions, it will consume silicon on the FPGA device. For example, if two instantiations of a function are implemented, they will take up twice as much silicon real estate on the device as a single instantiation. Eventually, for large algorithms, a single device will not suffice. We will therefore also demonstrate the partitioning and system control of these high-capacity algorithms on multi-FPGA systems.
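
The resource argument above can be illustrated with a small sketch: area grows linearly with the number of function instances, and once the total exceeds one device the instances must be spread over several. The code below uses a simple first-fit heuristic, chosen here purely for illustration. It is not the partitioning method of this work, it ignores inter-FPGA communication cost, and all names and area figures are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Area of several identical instances: scales linearly with the count. */
static unsigned long total_area(unsigned long area_per_instance,
                                unsigned long instances)
{
    return area_per_instance * instances;
}

/* First-fit partitioning: place each core on the first device with room.
 * used[] (size max_devices) receives the area consumed per device and
 * assignment[] (size n_cores) the device index chosen for each core.
 * Returns the number of devices used, or 0 if the cores do not fit. */
static size_t first_fit_partition(const unsigned long *core_area, size_t n_cores,
                                  unsigned long device_capacity,
                                  unsigned long *used, size_t max_devices,
                                  size_t *assignment)
{
    size_t devices = 0;
    assert(core_area && used && assignment);
    for (size_t d = 0; d < max_devices; ++d)
        used[d] = 0;
    for (size_t c = 0; c < n_cores; ++c) {
        size_t d = 0;
        while (d < max_devices && used[d] + core_area[c] > device_capacity)
            ++d;
        if (d == max_devices)
            return 0;               /* core too large, or out of devices */
        used[d] += core_area[c];
        assignment[c] = d;
        if (d + 1 > devices)
            devices = d + 1;
    }
    return devices;
}
```

For example, with a hypothetical device capacity of 10000 area units, two cores of 6000 units cannot share a device, so four cores of 6000, 6000, 3000 and 3000 units end up spread over two devices.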
2005 MAPLD International Conference