Detailed Information

FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks

dc.contributor.author: Bae, Jonghyun
dc.contributor.author: Lee, Jongsung
dc.contributor.author: Jin, Yunho
dc.contributor.author: Son, Sam
dc.contributor.author: Kim, Shine
dc.contributor.author: Jang, Hakbeom
dc.contributor.author: Ham, Tae Jun
dc.contributor.author: Lee, Jae Wook
dc.date.accessioned: 2022-06-24T00:26:17Z
dc.date.available: 2022-06-24T00:26:17Z
dc.date.created: 2022-05-04
dc.date.issued: 2021-02
dc.identifier.citation: Proceedings of the 19th USENIX Conference on File and Storage Technologies, FAST 2021, pp. 387-401
dc.identifier.uri: https://hdl.handle.net/10371/183763
dc.description.abstract: © 2021 by The USENIX Association. Deep neural networks (DNNs) are widely used in various AI application domains such as computer vision, natural language processing, autonomous driving, and bioinformatics. As DNNs continue to get wider and deeper to improve accuracy, the limited DRAM capacity of a training platform like a GPU often becomes the limiting factor on the size of DNNs and the batch size, called the memory capacity wall. Since increasing the batch size is a popular technique to improve hardware utilization, this capacity wall can yield suboptimal training throughput. Recent proposals address this problem by offloading some of the intermediate data (e.g., feature maps) to the host memory. However, they fail to provide robust performance, as the training process on the GPU contends with applications running on the CPU for memory bandwidth and capacity. Thus, we propose FlashNeuron, the first DNN training system to use an NVMe SSD as a backing store. To fully utilize the limited SSD write bandwidth, FlashNeuron introduces an offloading scheduler, which selectively offloads a set of intermediate data to the SSD in a compressed format without increasing DNN evaluation time. FlashNeuron causes minimal interference to CPU processes, as the GPU and the SSD communicate directly for data transfers. Our evaluation of FlashNeuron with four state-of-the-art DNNs shows that FlashNeuron can increase the batch size by a factor of 12.4× to 14.0× over the maximum allowable batch size on an NVIDIA Tesla V100 GPU with 16 GB of DRAM. By employing a larger batch size, FlashNeuron also improves training throughput by up to 37.8% (30.3% on average) over a baseline using GPU memory only, while minimally disturbing applications running on the CPU.
dc.language: English
dc.publisher: USENIX Association
dc.title: FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks
dc.type: Article
dc.citation.journaltitle: Proceedings of the 19th USENIX Conference on File and Storage Technologies, FAST 2021
dc.identifier.wosid: 000668976100026
dc.identifier.scopusid: 2-s2.0-85102979184
dc.citation.endpage: 401
dc.citation.startpage: 387
dc.description.isOpenAccess: N
dc.contributor.affiliatedAuthor: Lee, Jae Wook
dc.type.docType: Conference Paper
dc.description.journalClass: 1
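
The abstract describes FlashNeuron's central mechanism: an offloading scheduler that selects a subset of intermediate feature maps to write to the NVMe SSD, compressing them where profitable, sized so that the writes hide behind forward-pass compute rather than lengthening the iteration. The paper's actual algorithm is not reproduced here; the following is a minimal hypothetical sketch of such a greedy selection policy, with invented names (Tensor, plan_offload) and made-up sizes and bandwidth figures.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tensor:
    name: str
    size_mb: float      # GPU memory held by this buffered feature map
    compressible: bool  # e.g., sparse or FP16 maps shrink well on the wire

def plan_offload(tensors: List[Tensor], gpu_budget_mb: float,
                 fwd_time_ms: float, ssd_write_mb_per_s: float,
                 compression_ratio: float = 2.0) -> Tuple[List[str], bool]:
    """Greedy sketch: free enough GPU memory to meet the budget while
    keeping total SSD write time within one forward pass, so transfers
    overlap with compute instead of stalling it (hypothetical policy)."""
    must_free_mb = sum(t.size_mb for t in tensors) - gpu_budget_mb
    plan, freed_mb, xfer_ms = [], 0.0, 0.0
    # Largest-first: each accepted transfer frees the most memory.
    for t in sorted(tensors, key=lambda t: t.size_mb, reverse=True):
        if freed_mb >= must_free_mb:
            break  # budget already met
        wire_mb = t.size_mb / compression_ratio if t.compressible else t.size_mb
        cost_ms = wire_mb / ssd_write_mb_per_s * 1000.0
        if xfer_ms + cost_ms > fwd_time_ms:
            continue  # this write could no longer hide behind compute
        plan.append(t.name)
        freed_mb += t.size_mb
        xfer_ms += cost_ms
    return plan, freed_mb >= must_free_mb

# Toy run with made-up numbers: 3 GB of feature maps, a 2 GB budget,
# a 300 ms forward pass, and ~3 GB/s of sequential SSD write bandwidth.
maps = [Tensor("conv1", 1200, True), Tensor("conv2", 900, True),
        Tensor("fc1", 600, False), Tensor("fc2", 300, False)]
print(plan_offload(maps, gpu_budget_mb=2000, fwd_time_ms=300,
                   ssd_write_mb_per_s=3000))
# -> (['conv1'], True): offloading conv1 alone frees 1.2 GB in ~200 ms.
```

The real system additionally overlaps transfers with per-layer compute and moves data via peer-to-peer DMA between the GPU and the SSD, as the abstract notes; the single aggregate forward-time constraint above is a deliberate simplification.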
