Architecture-based and target-oriented algorithm optimization of high-order methods via complete-search tensor contraction

You, Hojun; Kim, Chongam

doi:10.1016/j.cpc.2021.107988

Cited 1 time in Web of Science; cited 5 times in Scopus

Authors: You, Hojun; Kim, Chongam

Issue Date: 2021-07

Publisher: Elsevier BV

Citation: Computer Physics Communications, Vol.264, p. 107988

Abstract: Sophisticated solution algorithms, along with complex data structures, are known as the main barriers that hinder high-order methods from being actively embraced by industry and academia. Simultaneously, modern computing machines offer a wide variety of opportunities to enhance the performance of solution algorithms through highly tuned computational kernels. To address this issue, we present an architecture-based and target-oriented algorithm optimization for high-order methods, called complete-search tensor contraction (CsTC). The key idea of CsTC is to convert the tensor operations of a high-order method into an optimization problem, which leads to finding an optimized method to execute tensor contraction (TC). After the general framework of CsTC is introduced, it is applied to the discontinuous Galerkin (DG) discretization. An approach based on general matrix multiplication (GEMM) is adopted because of its flexibility to handle the intermediate order of TC and the reusability of state-of-the-art GEMM primitives. By optimizing data structures as well as TC operations, CsTC provides an optimized solution algorithm that performs significantly better than the original non-optimized high-order method. The entire optimization process is automatically completed in a few minutes at a pre-processing step on a computer. The proposed CsTC optimization fully reflects the mesh and solution parameters adopted as well as the computing architecture used; thus, it is completely target-oriented and architecture-based. Various solution parameters and computing architectures are used and compared. All the results indicate that the optimization is essential to extract the best performance from a given computing architecture and that the performance enhancement becomes substantial as the DG approximation order increases and as a more recent processor is employed. Finally, a 3-D viscous flow problem governed by the compressible Navier-Stokes equations is solved. The optimized algorithm yields more than 10× speedup compared to the algorithm with a nested-loop approach when DG-P3 and DG-P5 approximations are used. © 2021 Elsevier B.V. All rights reserved.
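
Illustrative note: the idea described in the abstract, executing a tensor contraction through GEMM and choosing among alternative execution strategies by searching, can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' CsTC implementation. The contraction `out[e, q, v] = sum_b basis[q, b] * coeff[e, b, v]`, the function names, the candidate set, and the array sizes are all hypothetical choices made for illustration; the record above does not specify them.

```python
import time
import numpy as np


def contract_loops(basis, coeff):
    """Nested-loop reference: out[e, q, v] = sum_b basis[q, b] * coeff[e, b, v]."""
    n_elem, n_basis, n_var = coeff.shape
    n_quad = basis.shape[0]
    out = np.zeros((n_elem, n_quad, n_var))
    for e in range(n_elem):
        for q in range(n_quad):
            for v in range(n_var):
                for b in range(n_basis):
                    out[e, q, v] += basis[q, b] * coeff[e, b, v]
    return out


def gemm_per_element(basis, coeff):
    """One small GEMM per element: (n_quad x n_basis) @ (n_basis x n_var)."""
    return np.stack([basis @ coeff[e] for e in range(coeff.shape[0])])


def gemm_flattened(basis, coeff):
    """One large GEMM after changing the data layout to (n_basis, n_elem * n_var)."""
    n_elem, n_basis, n_var = coeff.shape
    flat = coeff.transpose(1, 0, 2).reshape(n_basis, n_elem * n_var)
    out = basis @ flat  # shape (n_quad, n_elem * n_var)
    return out.reshape(basis.shape[0], n_elem, n_var).transpose(1, 0, 2)


def gemm_broadcast(basis, coeff):
    """Let np.matmul broadcast the basis matrix over the element axis."""
    return np.matmul(basis, coeff)


# Hypothetical candidate set; a real search space would be much larger.
CANDIDATES = {
    "nested_loops": contract_loops,
    "gemm_per_element": gemm_per_element,
    "gemm_flattened": gemm_flattened,
    "gemm_broadcast": gemm_broadcast,
}


def search_fastest(basis, coeff, candidates=CANDIDATES, repeats=3):
    """Time every candidate on the given data and return (best_name, timings)."""
    reference = np.einsum("qb,ebv->eqv", basis, coeff)
    timings = {}
    for name, fn in candidates.items():
        # Reject any candidate that does not reproduce the reference result.
        if not np.allclose(fn(basis, coeff), reference):
            raise ValueError(f"candidate {name!r} gives a wrong result")
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            fn(basis, coeff)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative sizes only: 20 basis functions matches a degree-3 polynomial
    # space in 3-D, and 5 variables matches the 3-D compressible Navier-Stokes
    # state; the element and quadrature-point counts are arbitrary.
    basis = rng.standard_normal((35, 20))
    coeff = rng.standard_normal((50, 20, 5))
    best_name, timings = search_fastest(basis, coeff)
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:18s} {seconds * 1e3:9.3f} ms")
    print("selected:", best_name)
```

Which candidate wins depends on the array sizes and on the machine and BLAS library behind NumPy, which is the sense in which such a search is target-oriented and architecture-based. The sketch times each candidate exhaustively on the actual data, so it only scales to a handful of candidates.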

ISSN: 0010-4655

URI: https://hdl.handle.net/10371/194661

DOI: https://doi.org/10.1016/j.cpc.2021.107988

Files in This Item: There are no files associated with this item.

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Mechanical Aerospace Engineering (기계항공공학부)
  - Journal Papers (저널논문_기계항공공학부)
