"A Study of the Impact of Temperature on FPGA-based TMR Designs"

Amr A. Ahmadain and Karen A. Tomko
University of Cincinnati

Abstract

Triple modular redundancy (TMR) is one of the most cost-effective fault-tolerance methods for mission-critical systems, but it comes at the cost of increased power consumption and higher temperature levels. Temperature is one of the most critical factors that can lead to failures in electronic systems. Steady-state temperature stress is the most common method of testing integrated circuits, although it is not the only type of thermal stress that electronic devices are exposed to during their operational lifetime. Temperature cycles, temperature gradients, and even random changes in temperature can all affect the reliability of integrated circuits and electronic devices.

In this study, we argue that using steady-state temperature as the only stress factor, rather than more realistic temperature-lifetime stress relationships, can easily lead to pessimistic results and hence to overly conservative decisions. We relax the assumption of a constant failure rate by using an inhomogeneous Markov chain. We present a preliminary exploration of the relationship among TMR-based designs, different temperature-lifetime models, and the overall impact on system reliability.

We show through Markov-based modeling that using steady-state temperature as the only method of stress testing can result in up to a factor-of-8 difference in the predicted time at which system reliability drops to zero, in other words, the time at which the system fails. We conclude by summarizing our key results and providing some insight into potential future work.
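To make the idea of an inhomogeneous Markov chain concrete, the following is a minimal illustrative sketch and not the authors' model. It assumes a three-state TMR chain without repair (all three modules working, two working, system failed), an Arrhenius temperature-lifetime relationship, and hypothetical parameter values (base rate, reference temperature, activation energy, and both temperature profiles); none of these specifics come from the abstract.

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_rate(temp_k, base_rate=1e-4, ref_temp_k=300.0, activation_ev=0.7):
    """Per-module failure rate (per hour) at temperature temp_k.

    Assumption: Arrhenius acceleration; all default values are hypothetical.
    """
    exponent = (activation_ev / BOLTZMANN_EV) * (1.0 / ref_temp_k - 1.0 / temp_k)
    return base_rate * math.exp(exponent)


def tmr_reliability(temp_profile, t_end, dt=0.01):
    """Reliability of a TMR system at time t_end (hours).

    Forward-Euler integration of a 3-state Markov chain whose transition
    rates depend on time through temp_profile(t), which is what makes the
    chain inhomogeneous. States: all three modules good, two good, failed.
    """
    p_all, p_two, p_fail = 1.0, 0.0, 0.0
    steps = int(round(t_end / dt))
    for i in range(steps):
        lam = arrhenius_rate(temp_profile(i * dt))
        flow_to_two = 3.0 * lam * p_all * dt   # any one of three modules fails
        flow_to_fail = 2.0 * lam * p_two * dt  # one of the remaining two fails
        p_all -= flow_to_two
        p_two += flow_to_two - flow_to_fail
        p_fail += flow_to_fail
    # The majority voter still produces a correct output with two good modules.
    return p_all + p_two


def steady_profile(t):
    """Hypothetical worst-case steady-state stress at 350 K."""
    return 350.0


def cycling_profile(t):
    """Hypothetical 24-hour cycle between 300 K and 350 K."""
    return 325.0 + 25.0 * math.sin(2.0 * math.pi * t / 24.0)


if __name__ == "__main__":
    for name, profile in (("steady", steady_profile), ("cycling", cycling_profile)):
        print(name, tmr_reliability(profile, t_end=100.0))
```

Under these assumptions, the steady profile held at the peak temperature yields a lower predicted reliability than the cycling profile with the same peak, which illustrates the direction of the pessimism discussed above. The sketch makes no attempt to reproduce the factor-of-8 figure reported in the study.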

 

2005 MAPLD International Conference Home Page