An Approach for Hierarchical System Level Diagnosis of Massively Parallel Computers Combined with a Simulation-based Method for Dependability Analysis

Authors

Altmann, Jorn; Balbach, F.; Hein, A.

Issue Date
1994-10
Publisher
EDCC1994
Citation
IEEE EDCC1994, 1st European Dependable Computing Conference, pp. 371-385, Berlin, Germany, October 1994
Keywords
massively parallel computers; system level diagnosis; simulation-based analysis; scalable; object-oriented simulation models
Abstract
The primary focus in the analysis of massively parallel supercomputers has
traditionally been on their performance. However, their complex network topologies,
large number of processors, and sophisticated system software can make them very
unreliable. If every failure of one of the many components of a massively parallel
computer could shut down the machine, the machine would be useless. Therefore fault
tolerance is required. The basis of effective mechanisms for fault tolerance is an efficient
diagnosis.
This paper deals with concurrent and hierarchical system level diagnosis for a particular
massively parallel architecture and with a simulation-based method to validate the
proposed diagnosis algorithm. The diagnosis algorithm is presented and we describe
a simulation-based method to test and verify the algorithms for fault tolerance already
during the design phase of the target machine.
Language
English
URI
https://hdl.handle.net/10371/6891
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
