Architecture-based and target-oriented algorithm optimization of high-order methods via complete-search tensor contraction
Cited 1 time in Web of Science
Cited 5 times in Scopus
- Authors
- Issue Date
- 2021-07
- Publisher
- Elsevier BV
- Citation
- Computer Physics Communications, Vol. 264, p. 107988
- Abstract
- Sophisticated solution algorithms and complex data structures are known to be the main barriers that keep high-order methods from being widely adopted in industry and academia. At the same time, modern computing machines offer many opportunities to enhance the performance of solution algorithms through highly tuned computational kernels. To address this issue, we present an architecture-based and target-oriented algorithm optimization for high-order methods, called complete-search tensor contraction (CsTC). The key idea of CsTC is to convert the tensor operations of a high-order method into an optimization problem whose solution is an optimized way to execute tensor contraction (TC). After introducing the general framework of CsTC, we apply it to the discontinuous Galerkin (DG) discretization. An approach based on general matrix multiplication (GEMM) is adopted because it can flexibly handle the intermediate order of TC and can reuse state-of-the-art GEMM primitives. By optimizing the data structures as well as the TC operations, CsTC yields a solution algorithm that performs significantly better than the original non-optimized high-order method. The entire optimization process is completed automatically within a few minutes as a pre-processing step on a computer. Because the CsTC optimization fully reflects the adopted mesh and solution parameters as well as the computing architecture used, it is completely target-oriented and architecture-based. Various solution parameters and computing architectures are used and compared. All the results indicate that the optimization is essential to extract the best performance from a given computing architecture, and that the performance gain becomes substantial as the DG approximation order increases and as more recent processors are employed. Finally, a 3-D viscous flow problem governed by the compressible Navier-Stokes equations is solved.
The optimized algorithm achieves a speedup of more than 10× over the algorithm based on a nested-loop approach when DG-P3 and DG-P5 approximations are used. © 2021 Elsevier B.V. All rights reserved.
- ISSN
- 0010-4655
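The abstract does not give implementation details, so the following is only a minimal NumPy sketch, under our own assumptions, of the two ingredients it names. The first is evaluating a tensor-product contraction as a sequence of GEMMs instead of a nested loop. The second is choosing among candidate contraction orders by exhaustively timing them on the machine at hand. The 1-D operator `A` (an m×n matrix applied along each of the three reference directions), the element-batched layout `U[e, i, j, k]`, and all function names are hypothetical. This is not the CsTC implementation from the paper.

```python
import itertools
import time

import numpy as np


def contract_axis_gemm(T, A, axis):
    """Contract index `axis` of T with the second index of A via one GEMM.

    T.shape[axis] must equal A.shape[1]; in the result that index has
    length A.shape[0] and stays at position `axis`.
    """
    Tm = np.moveaxis(T, axis, 0)              # contracted index first
    rest = Tm.shape[1:]
    M = Tm.reshape(Tm.shape[0], -1)           # (n, product of remaining dims)
    R = A @ M                                 # the GEMM: (m, n) @ (n, rest)
    return np.moveaxis(R.reshape((A.shape[0],) + rest), 0, axis)


def apply_tensor_product(U, A, order):
    """V[e,a,b,c] = sum_ijk A[a,i] A[b,j] A[c,k] U[e,i,j,k], one axis at a time."""
    V = U
    for axis in order:                        # order is a permutation of (1, 2, 3)
        V = contract_axis_gemm(V, A, axis)
    return V


def apply_nested_loops(U, A):
    """Reference: the same contraction written as a plain nested loop."""
    E, n = U.shape[0], U.shape[1]
    m = A.shape[0]
    V = np.zeros((E, m, m, m))
    for e in range(E):
        for a in range(m):
            for b in range(m):
                for c in range(m):
                    s = 0.0
                    for i in range(n):
                        for j in range(n):
                            for k in range(n):
                                s += A[a, i] * A[b, j] * A[c, k] * U[e, i, j, k]
                    V[e, a, b, c] = s
    return V


def complete_search(U, A, repeats=5):
    """Time every contraction order on this machine and return the fastest."""
    best_order, best_time = None, float("inf")
    for order in itertools.permutations((1, 2, 3)):
        start = time.perf_counter()
        for _ in range(repeats):
            apply_tensor_product(U, A, order)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_order, best_time = order, elapsed
    return best_order, best_time


# Hypothetical sizes: 200 elements, 4 nodes per direction (P3), 6 quadrature points per direction.
rng = np.random.default_rng(0)
U = rng.standard_normal((200, 4, 4, 4))
A = rng.standard_normal((6, 4))
order, seconds = complete_search(U, A)
print("fastest order:", order, f"({seconds * 1e6:.1f} us per call)")
```

In this sketch, `A` is the same along every direction, so all six orders perform the same number of floating-point operations. Any timing differences come from memory layout and the copies made when reshaping. That is why the sketch picks the order by measurement rather than by operation count.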
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.