mGEMM: low-latency convolution with minimal memory overhead optimized for mobile devices
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Park, Jongseok | - |
dc.contributor.author | Bin, Kyungmin | - |
dc.contributor.author | Lee, Kyunghan | - |
dc.date.accessioned | 2022-10-05T04:10:17Z | - |
dc.date.available | 2022-10-05T04:10:17Z | - |
dc.date.created | 2022-08-26 | - |
dc.date.issued | 2022-06 | - |
dc.identifier.citation | Proceedings of ACM MobiSys, pp.222-234 | - |
dc.identifier.uri | https://hdl.handle.net/10371/185317 | - |
dc.description.abstract | © 2022 ACM. The convolution layer is the key building block in many neural network designs. Most high-performance implementations of the convolution operation rely on GEMM (General Matrix Multiplication) to achieve high computational throughput with a large workload size. However, in mobile environments, the user experience priority puts focus on low-latency inferences over a single or limited batch size. This signifies two major problems of current GEMM-based solutions: 1) GEMM-based solutions require mapping the convolution operation to GEMM, causing overheads in both computation and memory, 2) GEMM-based solutions lose large opportunities of data reuse while mapping, leading to under-utilization of the given hardware. Through an in-depth analysis of current GEMM-based solutions, we identify the root cause of these problems, and we propose mGEMM, a convolution solution that overcomes the aforementioned problems, without changes in accuracy. mGEMM expands the structure of GEMM in such a way that it can accommodate the convolution operation without any overhead, while the existing algorithms suffer from inefficiencies in converting the convolution operation to a static GEMM algorithm. Our extensive evaluations done over various neural networks and test devices show that mGEMM outperforms the existing solutions in the aspects of latency, memory overhead, and energy consumption. In running a real-world application, YoloV3-Tiny object detection, mGEMM achieves up to 1.29× and 1.58× speedup in total latency and convolution latency compared to the state-of-the-art, resulting in 15.5% reduction in energy consumption while using only near-minimum heap memory. | - |
dc.language | English | - |
dc.publisher | ACM | - |
dc.title | mGEMM: low-latency convolution with minimal memory overhead optimized for mobile devices | - |
dc.type | Article | - |
dc.identifier.doi | 10.1145/3498361.3538940 | - |
dc.citation.journaltitle | Proceedings of ACM MobiSys | - |
dc.identifier.scopusid | 2-s2.0-85134007045 | - |
dc.citation.endpage | 234 | - |
dc.citation.startpage | 222 | - |
dc.description.isOpenAccess | N | - |
dc.contributor.affiliatedAuthor | Lee, Kyunghan | - |
dc.type.docType | Conference Paper | - |
dc.description.journalClass | 1 | - |
dc.subject.keywordAuthor | convolutional neural networks | - |
dc.subject.keywordAuthor | parallel computing algorithms | - |
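
The mapping overhead the abstract attributes to GEMM-based convolution can be illustrated with a generic im2col lowering. The following is a minimal sketch under stated assumptions, not the mGEMM algorithm: it handles a single channel, stride 1, and no padding, uses the cross-correlation form common in neural networks, and all function names (`im2col`, `conv2d_via_gemm`, `conv2d_direct`, `im2col_expansion`) are hypothetical and do not come from the paper.

```python
def im2col(image, kh, kw):
    """Lower a 2-D single-channel image into a patch matrix (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = []
    for i in range(out_h):
        for j in range(out_w):
            # Every output position stores its own flattened copy of the
            # input window, so overlapping pixels are duplicated in memory.
            cols.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return cols, out_h, out_w


def conv2d_via_gemm(image, kernel):
    """Convolution expressed as a matrix product over the im2col patch matrix."""
    kh, kw = len(kernel), len(kernel[0])
    cols, out_h, out_w = im2col(image, kh, kw)
    flat_kernel = [v for row in kernel for v in row]
    # With one output channel the GEMM degenerates to a matrix-vector product.
    flat_out = [sum(a * b for a, b in zip(patch, flat_kernel)) for patch in cols]
    return [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]


def conv2d_direct(image, kernel):
    """Reference convolution that reads the input in place, with no extra buffer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def im2col_expansion(h, w, kh, kw):
    """Ratio of patch-matrix elements to input elements."""
    out_h, out_w = h - kh + 1, w - kw + 1
    return (out_h * out_w * kh * kw) / (h * w)
```

Both functions return the same result, but the lowered version first materializes a patch matrix whose size relative to the input approaches `kh * kw` as the image grows. That duplicated buffer is one instance of the memory and data-reuse cost the abstract describes for mapping convolution onto a fixed GEMM routine.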
- Files in This Item:
- There are no files associated with this item.
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.