An Approach for Hierarchical System Level Diagnosis of Massively Parallel Computers Combined with a Simulation-based Method for Dependability Analysis
- Authors
- Issue Date
- 1994-10
- Publisher
- EDCC1994
- Citation
- IEEE EDCC1994, 1st European Dependable Computing Conference, pp. 371-385, Berlin, Germany, October 1994
- Keywords
- massively parallel computers ; system level diagnosis ; simulation-based analysis ; scalable ; object-oriented simulation models
- Abstract
- The primary focus in the analysis of massively parallel supercomputers has
traditionally been on their performance. However, their complex network topologies,
large number of processors, and sophisticated system software can make them very
unreliable. If every failure of one of the many components of a massively parallel
computer could shut down the machine, the machine would be useless. Therefore fault
tolerance is required. The basis of effective mechanisms for fault tolerance is an efficient
diagnosis.
This paper deals with concurrent and hierarchical system level diagnosis for a particular
massively parallel architecture and with a simulation-based method to validate the
proposed diagnosis algorithm. The diagnosis algorithm is presented, and we describe
a simulation-based method to test and verify the fault tolerance algorithms as early
as the design phase of the target machine.
- Language
- English