mGEMM: low-latency convolution with minimal memory overhead optimized for mobile devices

DC Field | Value | Language
dc.contributor.author | Park, Jongseok | -
dc.contributor.author | Bin, Kyungmin | -
dc.contributor.author | Lee, Kyunghan | -
dc.date.accessioned | 2022-10-05T04:10:17Z | -
dc.date.available | 2022-10-05T04:10:17Z | -
dc.date.created | 2022-08-26 | -
dc.date.issued | 2022-06 | -
dc.identifier.citation | Proceedings of ACM MobiSys, pp. 222-234 | -
dc.identifier.uri | https://hdl.handle.net/10371/185317 | -
dc.description.abstract | © 2022 ACM. The convolution layer is the key building block in many neural network designs. Most high-performance implementations of the convolution operation rely on GEMM (General Matrix Multiplication) to achieve high computational throughput with a large workload size. However, in mobile environments, the user experience priority puts focus on low-latency inferences over a single or limited batch size. This signifies two major problems of current GEMM-based solutions: 1) GEMM-based solutions require mapping the convolution operation to GEMM, causing overheads in both computation and memory, 2) GEMM-based solutions lose large opportunities of data reuse while mapping, leading to under-utilization of the given hardware. Through an in-depth analysis of current GEMM-based solutions, we identify the root cause of these problems, and we propose mGEMM, a convolution solution that overcomes the aforementioned problems, without changes in accuracy. mGEMM expands the structure of GEMM in such a way that it can accommodate the convolution operation without any overhead, while the existing algorithms suffer from inefficiencies in converting the convolution operation to a static GEMM algorithm. Our extensive evaluations done over various neural networks and test devices show that mGEMM outperforms the existing solutions in the aspects of latency, memory overhead, and energy consumption. In running a real-world application, YoloV3-Tiny object detection, mGEMM achieves up to 1.29× and 1.58× speedup in total latency and convolution latency compared to the state-of-the-art, resulting in 15.5% reduction in energy consumption while using only near-minimum heap memory. | -
dc.language | English | -
dc.publisher | ACM | -
dc.title | mGEMM: low-latency convolution with minimal memory overhead optimized for mobile devices | -
dc.type | Article | -
dc.identifier.doi | 10.1145/3498361.3538940 | -
dc.citation.journaltitle | Proceedings of ACM MobiSys | -
dc.identifier.scopusid | 2-s2.0-85134007045 | -
dc.citation.endpage | 234 | -
dc.citation.startpage | 222 | -
dc.description.isOpenAccess | N | -
dc.contributor.affiliatedAuthor | Lee, Kyunghan | -
dc.type.docType | Conference Paper | -
dc.description.journalClass | 1 | -
dc.subject.keywordAuthor | convolutional neural networks | -
dc.subject.keywordAuthor | parallel computing algorithms | -
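
The abstract's first point, that mapping a convolution onto GEMM costs extra memory, can be illustrated with a minimal sketch. This is not mGEMM and not code from the paper; it assumes the common im2col lowering (single channel, stride 1, no padding), and the function names `im2col`, `conv_via_gemm`, and `direct_conv` are hypothetical names chosen for illustration.

```python
def im2col(image, kh, kw):
    """Copy every kh x kw patch of a 2D image into its own flattened row.

    Assumes a single channel, stride 1, and no padding (illustrative only).
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv_via_gemm(image, kernel):
    """Convolution expressed as a matrix-vector product over the im2col rows."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    return [sum(a * b for a, b in zip(patch, flat_kernel))
            for patch in im2col(image, kh, kw)]


def direct_conv(image, kernel):
    """Reference result computed in place, without the intermediate matrix."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for i in range(h - kh + 1)
            for j in range(w - kw + 1)]


# A 6x6 input (36 values) and a 3x3 kernel.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

cols = im2col(image, 3, 3)
# Ratio of im2col matrix size to input size: overlapping patches are duplicated.
expansion = (len(cols) * len(cols[0])) / (len(image) * len(image[0]))
print(len(cols), len(cols[0]), expansion)
```

In this toy case the lowered matrix holds 16 patches of 9 values each, four times the size of the 36-value input, because overlapping patches are stored repeatedly. Both functions return the same outputs; only the intermediate storage differs.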
Files in This Item:
There are no files associated with this item.

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.