Architecture-based and target-oriented algorithm optimization of high-order methods via complete-search tensor contraction

Cited 1 time in Web of Science; cited 5 times in Scopus
Authors

You, Hojun; Kim, Chongam

Issue Date
2021-07
Publisher
Elsevier BV
Citation
Computer Physics Communications, Vol.264, p. 107988
Abstract
Sophisticated solution algorithms, along with complex data structures, are known as the main barriers that hinder high-order methods from being actively embraced by industry and academia. Simultaneously, modern computing machines offer a wide variety of opportunities to enhance the performance of solution algorithms through highly tuned computational kernels. To address this issue, we present an architecture-based and target-oriented algorithm optimization for high-order methods, called complete-search tensor contraction (CsTC). The key idea of CsTC is to convert the tensor operations of a high-order method into an optimization problem, which leads to finding an optimized method to execute tensor contraction (TC). After introducing the general framework of CsTC, we apply it to the discontinuous Galerkin (DG) discretization. An approach based on general matrix multiplication (GEMM) is adopted because of its flexibility to handle the intermediate order of TC and the reusability of state-of-the-art GEMM primitives. By optimizing data structures as well as TC operations, CsTC provides an optimized solution algorithm that performs significantly better than the original non-optimized high-order method. The entire optimization process is automatically completed in a few minutes at a pre-processing step on a computer. The proposed CsTC optimization fully reflects the mesh and solution parameters adopted as well as the computing architecture used; thus, it is completely target-oriented and architecture-based. Various solution parameters and computing architectures are used and compared. All the results indicate that the optimization is essential to extract the best performance from a given computing architecture and that the performance enhancement becomes substantial as the DG approximation order increases and as a more recent processor is employed. Finally, a 3-D viscous flow problem governed by the compressible Navier-Stokes equations is solved. The optimized algorithm yields more than 10x speedup compared to the algorithm with a nested-loop approach when DG-P3 and DG-P5 approximations are used. © 2021 Elsevier B.V. All rights reserved.
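The idea summarized in the abstract, recasting a tensor contraction as matrix multiplication and then searching over the ways to execute it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tensor shapes, and the two candidate layouts are hypothetical, the pure-Python `matmul` merely stands in for a tuned GEMM primitive, and the "complete search" here simply times every candidate on the current machine and keeps the fastest.

```python
import time


def matmul(a, b):
    """Plain dense matrix product; a stand-in for an optimized GEMM call."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def contract_nested(basis, coeffs):
    """Nested-loop reference: out[e][q][v] = sum_b basis[q][b] * coeffs[e][b][v]."""
    return [[[sum(basis[q][b] * elem[b][v] for b in range(len(elem)))
              for v in range(len(elem[0]))]
             for q in range(len(basis))]
            for elem in coeffs]


def contract_per_element(basis, coeffs):
    """Candidate 1: one small GEMM per element, (Q x B) times (B x V)."""
    return [matmul(basis, elem) for elem in coeffs]


def contract_batched(basis, coeffs):
    """Candidate 2: one large GEMM over a B x (E*V) matrix of all elements."""
    n_var = len(coeffs[0][0])
    wide = [[x for elem in coeffs for x in elem[b]] for b in range(len(basis[0]))]
    flat = matmul(basis, wide)  # shape Q x (E*V)
    return [[row[e * n_var:(e + 1) * n_var] for row in flat]
            for e in range(len(coeffs))]


def complete_search(candidates, basis, coeffs, repeats=3):
    """Time every candidate (best of `repeats` runs) and return the fastest."""
    timings = {}
    for name, fn in candidates.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(basis, coeffs)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings


# Hypothetical sizes: Q quadrature points, B basis functions,
# E mesh elements, V solution variables. Integer data keeps results exact.
Q, B, E, V = 4, 3, 5, 2
basis = [[q + b + 1 for b in range(B)] for q in range(Q)]
coeffs = [[[e * B * V + b * V + v for v in range(V)] for b in range(B)]
          for e in range(E)]

CANDIDATES = {
    "nested": contract_nested,
    "per_element": contract_per_element,
    "batched": contract_batched,
}

best_name, timings = complete_search(CANDIDATES, basis, coeffs)
```

All three candidates compute the same contraction and differ only in how the data is laid out and how many matrix products are issued. Which one wins depends on the problem sizes and the machine, which is why the choice is made by measurement rather than fixed in advance.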
ISSN
0010-4655
URI
https://hdl.handle.net/10371/194661
DOI
https://doi.org/10.1016/j.cpc.2021.107988
Files in This Item:
There are no files associated with this item.
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
