Fault Tolerance in Massively Parallel Systems
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Deconinck, G. | - |
dc.contributor.author | Vounckx, J. | - |
dc.contributor.author | Cuyvers, R. | - |
dc.contributor.author | Lauwereins, R. | - |
dc.contributor.author | Bieker, B. | - |
dc.contributor.author | Wileke, H. | - |
dc.contributor.author | Maehle, E. | - |
dc.contributor.author | Hein, A. | - |
dc.contributor.author | Balbach, F. | - |
dc.contributor.author | Altmann, Jorn | - |
dc.contributor.author | Dal Cin, M. | - |
dc.contributor.author | Madeira, H. | - |
dc.contributor.author | Silva, J.G. | - |
dc.contributor.author | Wagner, R. | - |
dc.contributor.author | Viehover, G. | - |
dc.date.accessioned | 2009-08-11T03:20:36Z | - |
dc.date.available | 2009-08-11T03:20:36Z | - |
dc.date.issued | 1994-12 | - |
dc.identifier.citation | Transputer Communications, 2(4), 241-257 | en |
dc.identifier.issn | 1070-454X | - |
dc.identifier.uri | https://hdl.handle.net/10371/6897 | - |
dc.description.abstract | In massively parallel systems (MPS), fault tolerance is indispensable to obtain proper completion of long-running computation-intensive applications. To achieve this at reasonably low cost, we present a global approach. A flexible and powerful backbone is provided through the combination of hardware and software error detection techniques, fault diagnosis and operator-site software together with reconfiguration of the system. Application recovery is based on checkpointing and rollback. The red line (i.e. applicability for a massively parallel system) comprises scalability as well as simplicity. A unifying system model is introduced that allows the mapping of a global concept for fault tolerance to a wide variety of MPS. The framework for implementation in an existing MPS is discussed. | en |
dc.description.sponsorship | This work is partially supported by the EU as ESPRIT project 6731 (ITMPS) | en |
dc.language.iso | en | - |
dc.publisher | John Wiley & Sons | en |
dc.subject | fault tolerance | en |
dc.subject | massively parallel system | en |
dc.subject | error detection | en |
dc.subject | fault diagnosis | en |
dc.subject | backward error recovery | en |
dc.subject | reconfiguration | en |
dc.title | Fault Tolerance in Massively Parallel Systems | en |
dc.type | Article | en |