## Hardware Implementation of 2-D Wavelet Transforms in Viva on the Starbridge Hypercomputer

Sitanshu Gakkhar

Utah State University

Abstract

Wavelet transforms are highly useful in image processing because they capture not only the frequency characteristics of an image but also its spatial characteristics. However, they are computationally intensive. This limits their use in real-time applications where such image processing capability would be extremely valuable, for instance the analysis and compression of aerial images and motion detection. The design proposed here outlines an architecture that exploits the inherent parallelism of the wavelet transform algorithm and combines it with elements of superscalar pipelining to enable a highly efficient implementation. Essentially, the wavelet transform consists of two convolution stages, where the second-stage convolution is performed on the results of the first. The bottleneck is that the hardware ends up reading from memory twice. The design suggested here pipelines the two convolution stages, wherein the first stage computes only enough data for the second-stage convolution. While this data is being processed by the second filtering stage, the first stage generates the next set of data for the second stage. As such, at the end of each processing cycle (assuming filter sizes of n and m, m being the larger filter), final values for m pixels are generated. For filter sizes of 4 each, 4 pixels of the final, doubly filtered image are generated every clock cycle once the pipeline has been set up. This amounts to 15 addition operations and 20 multiplication operations per clock cycle. Thus a 512 × 512 image can be completely processed in 128 × 512 (65,536) clock cycles, excluding a single-digit number of cycles for setting up the pipeline. The same processing on a microprocessor-based device takes approximately 35,389,440 cycles, about 540 times as many.
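The core idea, computing only as many first-stage (row-filtered) lines as the second-stage (column) filter needs next, can be illustrated with a small software model. The sketch below is a hedged illustration, not the Viva design: it assumes separable FIR filtering with rows filtered first, no downsampling, and replicated edge samples at the image borders, and the names `filter_row`, `two_pass`, `pipelined`, and `cycles` are hypothetical. It checks that a line-buffered schedule reproduces the conventional two-pass result while buffering at most m intermediate lines, where m is the column-filter length in this model.

```python
def filter_row(row, h):
    """FIR-filter one line (correlation form), replicating edge samples
    so the output has the same length as the input."""
    n, k = len(row), len(h)
    off = (k - 1) // 2
    out = []
    for x in range(n):
        acc = 0.0
        for j in range(k):
            idx = min(max(x + j - off, 0), n - 1)  # clamp index to the line
            acc += h[j] * row[idx]
        out.append(acc)
    return out


def two_pass(img, h_row, h_col):
    """Conventional method: store the whole row-filtered image, then filter columns."""
    stage1 = [filter_row(r, h_row) for r in img]
    cols = [filter_row(list(c), h_col) for c in zip(*stage1)]
    return [list(r) for r in zip(*cols)]


def pipelined(img, h_row, h_col):
    """Line-buffered method: run stage 1 only on the rows stage 2 needs next."""
    height, width = len(img), len(img[0])
    m = len(h_col)
    off = (m - 1) // 2
    window = {}  # row index -> row-filtered line currently buffered
    out, max_held, stage1_rows = [], 0, 0
    for y in range(height):
        # Rows the column filter needs for output row y (edges clamped).
        needed = [min(max(y + j - off, 0), height - 1) for j in range(m)]
        # Drop lines that no later output row can use (needed[0] never decreases).
        for r in [r for r in window if r < needed[0]]:
            del window[r]
        # Stage 1: filter only lines not already in the buffer.
        for r in needed:
            if r not in window:
                window[r] = filter_row(img[r], h_row)
                stage1_rows += 1
        max_held = max(max_held, len(window))
        # Stage 2: column filter over the buffered lines.
        out.append([sum(h_col[j] * window[needed[j]][x] for j in range(m))
                    for x in range(width)])
    return out, max_held, stage1_rows


def cycles(width, height, pixels_per_cycle):
    """Steady-state cycle count, ignoring pipeline fill."""
    return width * height // pixels_per_cycle
```

Here `cycles(512, 512, 4)` evaluates to 65536, i.e. 128 × 512, matching the count above. Each row is row-filtered exactly once and the buffer never holds more than m lines, which is the software analogue of avoiding a second full read of an intermediate image from memory.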
Furthermore, the memory module is split across multiple DDR SDRAM chips to provide an extremely high data bandwidth, with enough throughput per cycle to keep multiple (two to four) of these pipelined data-processing modules running at full capacity on different sections of the same image at the same time. This cuts processing time by a further factor of two to four. Besides the large speedup obtained, the hardware also has native support for handling edge conditions. Given the expandability of the design and its performance relative to other image processing modules, this design is ideal for real-time implementation of wavelet transforms. The design has been carried out in Viva, an information-rate and data-rate polymorphic hardware description language, and executed on the Virtex-II 6000 FPGA devices of the Starbridge Hypercomputer.
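Splitting one image across several processing modules can be modeled by dividing it into horizontal strips. The sketch below is an assumption-laden illustration rather than the author's partitioning scheme: it assumes separable FIR filtering with rows filtered first, strips split by rows, and replicate (clamp-to-edge) extension at the true image borders as one possible edge policy, since the abstract does not say which edge handling the hardware uses. The names `filter_1d`, `separable_filter`, `process_strip`, `split_rows`, and `parallel_filter` are hypothetical. Each strip also reads a few halo rows beyond its own boundaries so its column filter sees the same neighbours it would in a whole-image pass.

```python
def filter_1d(seq, h):
    """FIR filter (correlation form) with clamp-to-edge extension."""
    n, k = len(seq), len(h)
    off = (k - 1) // 2
    return [sum(h[j] * seq[min(max(x + j - off, 0), n - 1)] for j in range(k))
            for x in range(n)]


def separable_filter(img, h_row, h_col):
    """Whole-image reference: filter rows, then columns."""
    rows = [filter_1d(r, h_row) for r in img]
    cols = [filter_1d(list(c), h_col) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def process_strip(img, start, stop, h_row, h_col):
    """Compute output rows [start, stop) using only the strip plus halo rows."""
    height, width = len(img), len(img[0])
    m = len(h_col)
    off = (m - 1) // 2
    lo = max(start - off, 0)                       # first halo row above the strip
    hi = min(stop - 1 + (m - 1 - off), height - 1)  # last halo row below the strip
    # Stage 1 runs only on the rows this module actually reads.
    local = {r: filter_1d(img[r], h_row) for r in range(lo, hi + 1)}
    out = []
    for y in range(start, stop):
        # Clamp against the full image height, so only true borders are extended.
        lines = [local[min(max(y + j - off, 0), height - 1)] for j in range(m)]
        out.append([sum(h_col[j] * lines[j][x] for j in range(m))
                    for x in range(width)])
    return out


def split_rows(height, modules):
    """Divide rows into near-equal contiguous strips, one per module."""
    bounds = [height * i // modules for i in range(modules + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def parallel_filter(img, h_row, h_col, modules):
    result = []
    for start, stop in split_rows(len(img), modules):
        # In hardware the strips would run concurrently on separate modules
        # and memory banks; in this model they run one after another.
        result.extend(process_strip(img, start, stop, h_row, h_col))
    return result
```

For example, `split_rows(512, 4)` assigns 128 rows to each of four modules. Under these assumptions the stitched output matches the single-module result exactly, and the cost of the split is that m - 1 rows are read by both modules at each internal strip boundary.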