Fault-Tolerant Computing

Nian-Feng Tzeng
Center for Advanced Computer Studies
University of Louisiana at Lafayette



A large parallel system demands high reliability, yet the probability that a fault occurs somewhere in the system grows with its size. Reconfiguring a faulty system to work around the faults is a viable way to enhance reliability, since the system can continue operating after reconfiguration. Work in this area has addressed reconfiguration in faulty hypercubes and meshes, as well as recovery point selection. A key issue in parallel systems is their fault-tolerant organization and architectural design, and various fault-tolerant architectures for constructing parallel machines have been considered.


This work was supported in part by the National Science Foundation under Grants MIP-8807761, MIP-9201308, and CCR-9300075, and in part by the Board of Regents, State of Louisiana, under Contracts No. LEQSF(1992-94)-RD-A-32 and No. LEQSF(1993-95)-RD-A-35.


Publications


Send e-mail to: tzeng@cacs.louisiana.edu