Detailed Information

On Integrating Error Detection into a Fault Diagnosis Algorithm for Massively Parallel Computers

DC Field | Value | Language
dc.contributor.author | Altmann, Jorn | -
dc.contributor.author | Bartha, Tamas | -
dc.contributor.author | Pataricza, Andras | -
dc.date.accessioned | 2009-08-11 | -
dc.date.available | 2009-08-11 | -
dc.date.issued | 1995-04 | -
dc.identifier.citation | IPDS1995, 1st International Computer Performance and Dependability Symposium, pp. 154-164, Erlangen, Germany, April 1995 | en
dc.identifier.uri | https://hdl.handle.net/10371/6875 | -
dc.description.abstract | Scalable fault diagnosis is necessary for constructing fault tolerance mechanisms in large massively parallel multiprocessor systems. The diagnosis algorithm must operate efficiently even if the system consists of several thousand processors. In this paper we introduce an event-driven, distributed system-level diagnosis algorithm. It uses a small number of messages and is based on a general diagnosis model without the limitation of the number of simultaneously existing faults (an important requirement for massively parallel computers). The algorithm integrates both error detection techniques like "I'm alive" messages, and built-in hardware mechanisms. The structure of the implemented algorithm is presented, and the essential program modules are described. The paper also discusses the use of test results generated by error detection mechanisms for fault localization. Measurement results illustrate the effect of the diagnosis algorithm, in particular the error detection mechanism by "I'm alive" messages, on the application performance. | en
dc.description.sponsorship | Supported by the EU (European Union) as part of the Esprit Project 6731, Fault Tolerance for Massively Parallel Systems, and the Hungarian-German Joint Scientific Research Project #70 with additional support from OTKA-F007414. | en
dc.language.iso | en | en
dc.publisher | IPDS1995 | en
dc.subject | Error detection | en
dc.subject | distributed diagnosis | en
dc.subject | syndrome decoding | en
dc.subject | massively parallel systems | en
dc.title | On Integrating Error Detection into a Fault Diagnosis Algorithm for Massively Parallel Computers | en
dc.type | Conference Paper | en
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
