## "Implementation of Polymorphic Matrix Inversion using Viva"

Arvind Sudarsanam and Dasu Aravind

Utah State University

AbstractMatrix inverse operation is an important module in many scientific applications including the solution of a system of linear equations, eigen values computation, and training of a neural network. Many of these applications encompass matrices of varying order and data type (fixed, floating, complex, etc). Introducing a polymorphic feature in the hardware implementation of matrix inverse can have an important benefit, as the design’s shelf-life is increased and the associated design time is reduced. Polymorphism is a well-known concept in software, but is often ignored in hardware design techniques. Existing software solutions fail to capture the massive parallelism that is available in a typical matrix inverse algorithm, in an efficient manner that balances performance with silicon real estate. FPGAs present an ideal implementation platform for matrix inversion due to their inherent parallel architecture and high inter-connectivity. There have been prior efforts towards the hardware implementation of matrix inverse for an FPGA-based system. Most of these implementations focus on a specific type of matrices (triangular/sparse). In the past, researchers have also proposed methods targeting LU decomposition, but none of these approaches are polymorphic in either data type, order of the tensor or information rate. Therefore compelling the designer to restart the design nearly from scratch when data types or order of the matrix changes.

The proposed polymorphic matrix implementation has been implemented in Viva: a hardware description language developed by Starbridge Systems. This paper proposes a modified matrix inverse algorithm using LU decomposition, which supports polymorphism based on the order of the matrix and the data type (signed fixed, unsigned fixed, complex and single precision floating point). The matrix inverse algorithm has been modified to generate a repeating sequence of arithmetic computations operating on a sub-set of elements in the input matrix. This repeating sequence enables us to incorporate order polymorphism using a loop structure implemented in hardware. The number of elements in a given sub-set represents the parallelism supported by the proposed implementation and can be varied to support ‘Information rate polymorphism’. ‘Information rate polymorphism’ modifies the number of pipelining stages and the parallelism in the implementation based on hardware availability. The architecture (circuitry) required to support data polymorphism for signed, unsigned, floating and complex arithmetic is also described in the paper. An analysis of the overhead due to the control unit logic required for the polymorphic implementation over a conventional implementation is provided in terms of the number of slices occupied and efforts were targeted towards minimizing this overhead.