"On Breaking the Limits of Scalability with Current-Generation Reconfigurable Computing Platforms: Simulation Experiments with Matrix-oriented Processing Kernels"

Sreesa Akella and James P. Davis
University of South Carolina

Abstract

In this paper, we discuss our experience with the architectures of current-generation reconfigurable custom computing platforms in the development of scientific computing kernels. Specifically, we have been investigating the attainable speedup and scalability of computing performance on reconfigurable computing platforms from several vendors. While we do not “benchmark” these platforms per se (we are unable to present comparative results across platforms due to licensing agreements), we do discuss the limitations these architectures impose on developing SIMD-style, highly parallel computing kernels for Sparse Matrix Vector Multiplication (SMVM), Matrix Transposition (MT), and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) applications. We present our results with processing kernels for matrix-oriented data structures on several such platforms, and show how continued adherence to traditional von Neumann computing architectures, particularly in their processor-memory models, limits the achievable degree of speedup (using SMP and MPI implementations as baseline models) and scalability (in terms of extending the size of the computable problem spaces by orders of magnitude). One outcome of our study is the definition of a “conceptual” architecture for highly parallelizable computing in these domains. We subject this architecture model to simulation studies in which we scale the number of processing elements (PEs), memory elements (MEs), and associated interconnect to configurations of 32K and 64K PEs performing large-scale matrix computations, far larger than is possible on the reconfigurable platforms in use today. A further outcome is the definition of architectures for reconfigurable computing fabrics of a scale comparable to the largest SMP arrays. We demonstrate how speedup and scalability can be achieved at this scale, and discuss the ramifications of our architecture for FPGA fabrics, reconfigurable FPGA board arrays, and massively parallel nanocomputing fabrics on molecular substrates.
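
For readers unfamiliar with the SMVM kernel named above, the following is a minimal sketch in C, assuming a compressed sparse row (CSR) storage layout; the CSR format and the names used here (val, col_idx, row_ptr) are illustrative assumptions, not the storage scheme used in the paper. The row-wise independence of the dot products is what makes the kernel amenable to the SIMD-style parallelization across PEs discussed in the abstract.

    /* Sketch of a sparse matrix-vector multiply (SMVM), y = A*x,
     * with A stored in compressed sparse row (CSR) format. */
    void smvm_csr(int n_rows,
                  const double *val,      /* nonzero values of A */
                  const int    *col_idx,  /* column index of each nonzero */
                  const int    *row_ptr,  /* start of each row in val/col_idx */
                  const double *x,        /* input vector */
                  double       *y)        /* output vector */
    {
        for (int i = 0; i < n_rows; i++) {
            double sum = 0.0;
            /* Each row's dot product is independent of the others,
             * so rows may be distributed across processing elements. */
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];
            y[i] = sum;
        }
    }

The irregular, indirect accesses to x through col_idx are the reason SMVM stresses the processor-memory model: performance is governed by memory bandwidth and latency rather than by arithmetic throughput.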

Acknowledgement

This work was supported by a research grant from the DoD.
