## Wavelet “Block Processing” for Reduced Memory Transfers

William F. Turri^{1} and Eric J. Balster^{2}

^{1}Systran Federal Corp.

^{2}Air Force Research Laboratory/IFTA

Abstract. This paper will describe how an FPGA implementation of the simple integer Haar wavelet transform was accelerated considerably by using a block-based processing technique to reduce the number of memory accesses (both reads and writes) needed to perform the transform. It will then examine ways to extend this technique to more complex wavelet transforms.

The Problem. When compressing a two-dimensional image array using wavelet-based techniques, the wavelet transform is most easily implemented in a “row-then-column” manner. Every row in the image is individually transformed to create an intermediate image; every column in this intermediate image is then transformed in a similar fashion to produce the fully transformed image. While this approach provides great flexibility in allowing the row and column wavelet transforms to be implemented as separate modules, it also reduces performance considerably by introducing many redundant memory accesses. Specifically, every value created by the row-transform module must be stored in an intermediate memory location, only to be read back from memory by the column-transform module. Performance would be enhanced significantly if these row and column transform operations could be combined, thus eliminating the need for the intermediate write/read operations to/from memory.
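
The cost of the intermediate image can be made concrete with a small software model. The following is a minimal sketch, not the FPGA design: it assumes the integer Haar step takes the common S-transform form (floor of the pair average, plus the pair difference), assumes even image dimensions, and uses hypothetical names (`haar_pair`, `haar_rows`, `row_then_column`) together with a simple counter standing in for memory traffic.

```python
def haar_pair(a, b):
    """Integer Haar step on one pair (assumed S-transform form):
    floor of the average, and the difference."""
    return (a + b) >> 1, a - b


def haar_rows(src, counter):
    """Transform every row of src, counting one read per input pixel
    and one write per output coefficient."""
    rows, cols = len(src), len(src[0])
    half = cols // 2
    dst = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(half):
            low, high = haar_pair(src[r][2 * c], src[r][2 * c + 1])
            counter["reads"] += 2
            dst[r][c] = low            # low-pass half of the row
            dst[r][half + c] = high    # high-pass half of the row
            counter["writes"] += 2
    return dst


def transpose(m):
    """Index bookkeeping only; not counted as memory traffic in this model."""
    return [list(col) for col in zip(*m)]


def row_then_column(image):
    """Row pass, then column pass, with the intermediate image stored in between."""
    counter = {"reads": 0, "writes": 0}
    intermediate = haar_rows(image, counter)   # written out in full
    # Column pass: every intermediate value is read back again.
    out = transpose(haar_rows(transpose(intermediate), counter))
    return out, counter
```

Under these assumptions, `row_then_column([[1, 2], [3, 4]])` returns `[[2, -1], [-2, 0]]` with 8 reads and 8 writes: each of the 4 pixels is read and written once per pass, so half of the traffic exists only to move the intermediate image through memory.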

The Solution. Past work by the primary author demonstrated that the simple integer Haar wavelet transform, which operates on two pixel values at a time, could be accelerated in hardware by implementing the row and column transform operations using a 2x2 block of pixels. This required an algebraic simplification of the two transform operations. The implementation of this simplified algorithm reduced not only the required memory accesses but also the gate consumption of the design. In practice, however, the simple Haar transform typically does not provide enough frequency resolution to preserve image quality at higher levels of compression. It is therefore worth investigating a generalized method of creating “block processing” implementations of more complicated wavelet transforms, such as the 5/3 or 2/6. Besides being mathematically more complicated, these wavelet transforms also present challenges along the borders of images, where symmetric extension is required to provide values lying “beyond” the ends of the arrays. An acceptable block processing solution will provide a means of dealing with these extensions in a block (rather than just a row/column) context.
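
A 2x2 block version of the Haar case can be sketched in software as follows. This is an illustrative model only: it fuses the row and column steps directly and does not reproduce the algebraic simplification used in the hardware design, which is not given here. It assumes the S-transform form of the integer Haar step (floor of the pair average, plus the pair difference) and even image dimensions; the function name `block_haar_2d` and the access counter are hypothetical.

```python
def block_haar_2d(image):
    """One level of a 2-D integer Haar transform, computed one 2x2 block at a time.

    Returns the transformed image and a counter of modelled memory accesses.
    """
    rows, cols = len(image), len(image[0])
    half_r, half_c = rows // 2, cols // 2
    out = [[0] * cols for _ in range(rows)]
    counter = {"reads": 0, "writes": 0}
    for i in range(half_r):
        for j in range(half_c):
            # Read the four pixels of the block exactly once.
            a, b = image[2 * i][2 * j], image[2 * i][2 * j + 1]
            c, d = image[2 * i + 1][2 * j], image[2 * i + 1][2 * j + 1]
            counter["reads"] += 4
            # Row step on both rows; results stay in local variables
            # (registers in hardware) instead of an intermediate image.
            low_top, high_top = (a + b) >> 1, a - b
            low_bot, high_bot = (c + d) >> 1, c - d
            # Column step on the two row results, written straight to the output.
            out[i][j] = (low_top + low_bot) >> 1              # row-low, column-low
            out[i][half_c + j] = (high_top + high_bot) >> 1   # row-high, column-low
            out[half_r + i][j] = low_top - low_bot            # row-low, column-high
            out[half_r + i][half_c + j] = high_top - high_bot # row-high, column-high
            counter["writes"] += 4
    return out, counter
```

For the input `[[1, 2], [3, 4]]` this returns `[[2, -1], [-2, 0]]`, the same coefficients a row pass followed by a column pass would produce, using 4 reads and 4 writes. Haar blocks do not overlap, so no border handling is needed; longer filters such as the 5/3 reach into neighbouring blocks, which is where symmetric extension would have to be handled in a block context.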

The Paper. The paper will present the results of the study performed by Systran Federal Corp. and AFRL/IFTA. The images being dealt with by Systran Federal Corp. are complex (magnitude/phase) representations of raw Synthetic Aperture Radar (SAR) data, but the technique developed will be applicable to two-dimensional data arrays in general. The block-processing algorithm(s) will be simulated using Matlab for ease of testing, knowing that the algorithm(s) can be generalized to VHDL or Verilog with relative ease. If feasible, the algorithm(s) will be implemented in VHDL and tested through ModelSim.