"A High-Performance Radix-2 FFT in ANSI C for RTL Generation"

John Ardini
Draper Laboratory


Powerful high-level-language-to-RTL generators are now emerging. One promise of these tools is to let software and systems engineers implement algorithms quickly in a familiar language and target the design to a programmable device. The generators available today support syntaxes with varying degrees of fidelity to the original language; this paper focuses on the efficient use of C-to-RTL generators that have a high degree of fidelity to the original C language. Even with such tools, however, coding an algorithm without regard for the capabilities of the target programmable logic can yield a realization several times slower than what a DSP could achieve. This paper presents the architecture of a high-performance radix-2 FFT written in ANSI C that is similar in composition to the classic C implementation familiar to most engineers. First, it discusses methods of organizing memory elements and arrays to maximize the number of data accesses per clock cycle. Next, it discusses how to exploit the natural parallelism of the radix-2 decimation-in-frequency algorithm. Finally, it discusses the performance gained by hiding the first and last of the log2(n) butterfly stages. The resulting RTL outperforms hand-optimized DSP assembly code by a factor of two while using less effective area than a DSP solution.


2005 MAPLD International Conference