Fault Tolerance in Massively Parallel Systems

DC Field | Value | Language
dc.contributor.author | Deconinck, G. | -
dc.contributor.author | Vounckx, J. | -
dc.contributor.author | Cuyvers, R. | -
dc.contributor.author | Lauwereins, R. | -
dc.contributor.author | Bieker, B. | -
dc.contributor.author | Wileke, H. | -
dc.contributor.author | Maehle, E. | -
dc.contributor.author | Hein, A. | -
dc.contributor.author | Balbach, F. | -
dc.contributor.author | Altmann, Jorn | -
dc.contributor.author | Dal Cin, M. | -
dc.contributor.author | Madeira, H. | -
dc.contributor.author | Silva, J.G. | -
dc.contributor.author | Wagner, R. | -
dc.contributor.author | Viehover, G. | -
dc.date.accessioned | 2009-08-11T03:20:36Z | -
dc.date.available | 2009-08-11T03:20:36Z | -
dc.date.issued | 1994-12 | -
dc.identifier.citation | Transputer Communications, 2(4), 241-257 | en
dc.identifier.issn | 1070-454X | -
dc.identifier.uri | https://hdl.handle.net/10371/6897 | -
dc.description.abstract | In massively parallel systems (MPS), fault tolerance is indispensable to obtain proper completion of long-running computation-intensive applications. To achieve this at reasonably low cost, we present a global approach. A flexible and powerful backbone is provided through the combination of hardware and software error detection techniques, fault diagnosis and operator-site software together with reconfiguration of the system. Application recovery is based on checkpointing and rollback. The red line (i.e. applicability for a massively parallel system) comprises scalability as well as simplicity. A unifying system model is introduced that allows the mapping of a global concept for fault tolerance to a wide variety of MPS. The framework for implementation in an existing MPS is discussed. | en
dc.description.sponsorship | This work is partially supported by the EU as ESPRIT project 6731 (ITMPS) | en
dc.language.iso | en | -
dc.publisher | John Wiley & Sons | en
dc.subject | fault tolerance | en
dc.subject | massively parallel system | en
dc.subject | error detection | en
dc.subject | fault diagnosis | en
dc.subject | backward error recovery | en
dc.subject | reconfiguration | en
dc.title | Fault Tolerance in Massively Parallel Systems | en
dc.type | Article | en
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
