Publications

Detailed Information

FlashNeuron: SSD-Enabled large-Batch training of very deep neural networks

Cited 26 time in Web of Science Cited 30 time in Scopus
Authors

Bae, Jonghyun; Lee, Jongsung; Jin, Yunho; Son, Sam; Kim, Shine; Jang, Hakbeom; Ham, Tae Jun; Lee, Jae Wook

Issue Date
2021-02
Publisher
USENIX Association
Citation
Proceedings of the 19th USENIX Conference on File and Storage Technologies, FAST 2021, pp.387-401
Abstract
© 2021 by The USENIX Association.Deep neural networks (DNNs) are widely used in various AI application domains such as computer vision, natural language processing, autonomous driving, and bioinformatics. As DNNs continue to get wider and deeper to improve accuracy, the limited DRAM capacity of a training platform like GPU often becomes the limiting factor on the size of DNNs and batch size—called memory capacity wall. Since increasing the batch size is a popular technique to improve hardware utilization, this can yield a suboptimal training throughput. Recent proposals address this problem by offloading some of the intermediate data (e.g., feature maps) to the host memory. However, they fail to provide robust performance as the training process on a GPU contends with applications running on a CPU for memory bandwidth and capacity. Thus, we propose FlashNeuron, the first DNN training system using an NVMe SSD as a backing store. To fully utilize the limited SSD write bandwidth, FlashNeuron introduces an offloading scheduler, which selectively offloads a set of intermediate data to the SSD in a compressed format without increasing DNN evaluation time. FlashNeuron causes minimal interference to CPU processes as the GPU and the SSD directly communicate for data transfers. Our evaluation of FlashNeuron with four state-of-the-art DNNs shows that FlashNeuron can increase the batch size by a factor of 12.4× to 14.0× over the maximum allowable batch size on NVIDIA Tesla V100 GPU with 16GB DRAM. By employing a larger batch size, FlashNeuron also improves the training throughput by up to 37.8% (with an average of 30.3%) over the baseline using GPU memory only, while minimally disturbing applications running on CPU.
URI
https://hdl.handle.net/10371/183763
Files in This Item:
There are no files associated with this item.
Appears in Collections:

Altmetrics

Item View & Download Count

  • mendeley

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.

Share