"Implementing Scientific Simulation Codes Highly Tailored for Vector Architectures Using Custom Configurable Computing Machines"David Rutishauser

Virginia Polytechnic Institute/NASA Johnson Space Center

AbstractThe motivation for this work comes from an observation that amidst the push for Massively Parallel (MP) solutions to high-end computing problems such as numerical physical simulations, large amounts of legacy code exist that are highly optimized for vector supercomputers. Because re-hosting legacy code often requires a complete re-write of the original code, which can be a very long and expensive effort, this work examines the potential to exploit reconfigurable computing machines in place of a vector supercomputer to implement an essentially unmodified legacy source code. Custom and reconfigurable computing resources could be used to emulate an original application's target platform to the extent required to achieve high performance. To arrive at an architecture that delivers the desired performance subject to limited resources involves solving a multi-variable optimization problem with constraints. Prior research in the area of reconfigurable computing has demonstrated that designing an optimum hardware implementation of a given application under hardware resource constraints is an NP-complete problem. The premise of the approach is that the general issue of applying reconfigurable computing resources to the implementation of an application, maximizing the performance of the computation subject to physical resource constraints, can be made a tractable problem by assuming a computational paradigm, such as vector processing.

This research contributes a formulation of the problem and a methodology to design a reconfigurable vector processing implementation of a given application that satisfies a performance metric. A generic, parametric, architectural framework for vector processing implemented in reconfigurable logic is developed as a target for a scheduling/mapping algorithm that maps an input computation to a given instance of the architecture. This algorithm is integrated with an optimization framework to arrive at a specification of the architecture parameters that attempts to minimize execution time, while staying within resource constraints. The flexibility of using a custom reconfigurable implementation is exploited in a unique manner to leverage the lessons learned in vector supercomputer development. The vector processing framework is tailored to the application, with variable parameters that are fixed in traditional vector processing. Benchmark data that demonstrates the functionality and utility of the approach is presented. The benchmark data includes an identified bottleneck in a real case study example vector code, the NASA Langley Terminal Area Simulation System (TASS) application.