"FPGA Implementation of Systolic Array Levenshtein Matrix Distance Calculator for Reverse Compliment DNA Code Design"

Daniel J. Burns1, Virginia W. Ross1, Qinru Qiu2, and Qing Wu2

1Air Force Research Laboratory
2Binghamton University

Abstract

This paper describes the design, synthesis, and performance of an application that uses a single FPGA to achieve a 670X speed-up over a single PC C language program for a highly constrained code generation problem that generates libraries of minimally interacting DNA code word pairs. Such libraries are useful in biological assay chips, schemes for DNA assisted nano- self assembly, and information processing methods that store and manipulate data by means of biomolecules. There are no known construction methods for optimal codes of this sort, and the best known upper bounds on the sizes of such codes are generally found by experiment. One difficulty composing these DNA codes is the extremely large search space of possible code word candidates, (e.g. 264 or 1.9e9 for length 16 amino acid base codes, or 1.8e19 for the case of 32 base codes. We report the first use of a hardware genetic algorithm to guide this search, an improvement over software GA and 'Stochastic' methods found in the literature. Another difficulty is calculating an exponentially growing number of binding affinities between candidate code words and their reverse compliments (RC's) against code words and their RC's in the library, as the library grows. For this we use a hardware systolic array that provides pair-wise edit distances (a measure correlated to binding affinity) at the PE clock rate, an improvement over distributed Island Model GA methods run on a cluster. We also discuss problems encountered in design and synthesis that relate to Host/PE communication, and an early attempt at a mixed mode (clocked and behavioral) implementation. Finally, we discuss follow-on improvements such as an optional pseudo-exhaustive search mode for extending large libraries, scaling to longer word lengths, and we propose a new single chip hardware multi-population GA for additional speed-up that is targeted for a cluster of FPGA's approach expected to yield 500,000X speed-ups on this and possibly other similar problem types.

2006 MAPLD International Conference Home Page