Detailed Information

FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks

dc.contributor.author: Bae, Jonghyun
dc.contributor.author: Lee, Jongsung
dc.contributor.author: Jin, Yunho
dc.contributor.author: Son, Sam
dc.contributor.author: Kim, Shine
dc.contributor.author: Jang, Hakbeom
dc.contributor.author: Ham, Tae Jun
dc.contributor.author: Lee, Jae Wook
dc.date.accessioned: 2022-06-24T00:26:17Z
dc.date.available: 2022-06-24T00:26:17Z
dc.date.created: 2022-05-04
dc.date.issued: 2021-02
dc.identifier.citation: Proceedings of the 19th USENIX Conference on File and Storage Technologies, FAST 2021, pp. 387-401
dc.identifier.uri: https://hdl.handle.net/10371/183763
dc.description.abstract: © 2021 by The USENIX Association. Deep neural networks (DNNs) are widely used in various AI application domains such as computer vision, natural language processing, autonomous driving, and bioinformatics. As DNNs continue to get wider and deeper to improve accuracy, the limited DRAM capacity of a training platform like a GPU often becomes the limiting factor on the size of DNNs and the batch size, called the memory capacity wall. Since increasing the batch size is a popular technique to improve hardware utilization, this capacity wall can yield suboptimal training throughput. Recent proposals address this problem by offloading some of the intermediate data (e.g., feature maps) to the host memory. However, they fail to provide robust performance, as the training process on the GPU contends with applications running on the CPU for memory bandwidth and capacity. Thus, we propose FlashNeuron, the first DNN training system to use an NVMe SSD as a backing store. To fully utilize the limited SSD write bandwidth, FlashNeuron introduces an offloading scheduler, which selectively offloads a set of intermediate data to the SSD in a compressed format without increasing DNN evaluation time. FlashNeuron causes minimal interference to CPU processes, as the GPU and the SSD communicate directly for data transfers. Our evaluation of FlashNeuron with four state-of-the-art DNNs shows that FlashNeuron can increase the batch size by a factor of 12.4× to 14.0× over the maximum allowable batch size on an NVIDIA Tesla V100 GPU with 16 GB of DRAM. By employing a larger batch size, FlashNeuron also improves training throughput by up to 37.8% (30.3% on average) over a baseline using GPU memory only, while minimally disturbing applications running on the CPU.
dc.language: English
dc.publisher: USENIX Association
dc.title: FlashNeuron: SSD-Enabled Large-Batch Training of Very Deep Neural Networks
dc.type: Article
dc.citation.journaltitle: Proceedings of the 19th USENIX Conference on File and Storage Technologies, FAST 2021
dc.identifier.wosid: 000668976100026
dc.identifier.scopusid: 2-s2.0-85102979184
dc.citation.endpage: 401
dc.citation.startpage: 387
dc.description.isOpenAccess: N
dc.contributor.affiliatedAuthor: Lee, Jae Wook
dc.type.docType: Conference Paper
dc.description.journalClass: 1
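
The abstract describes FlashNeuron's central mechanism: an offloading scheduler that selects a subset of intermediate feature maps to write to the NVMe SSD, compressing them where profitable, sized so that the writes hide behind forward-pass compute rather than lengthening the iteration. The paper's actual algorithm is not reproduced here; the following is a minimal hypothetical sketch of such a greedy selection policy, with invented names (Tensor, plan_offload) and made-up sizes and bandwidth figures.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tensor:
    name: str
    size_mb: float      # GPU memory held by this buffered feature map
    compressible: bool  # e.g., sparse or FP16 maps shrink well on the wire

def plan_offload(tensors: List[Tensor], gpu_budget_mb: float,
                 fwd_time_ms: float, ssd_write_mb_per_s: float,
                 compression_ratio: float = 2.0) -> Tuple[List[str], bool]:
    """Greedy sketch: free enough GPU memory to meet the budget while
    keeping total SSD write time within one forward pass, so transfers
    overlap with compute instead of stalling it (hypothetical policy)."""
    must_free_mb = sum(t.size_mb for t in tensors) - gpu_budget_mb
    plan, freed_mb, xfer_ms = [], 0.0, 0.0
    # Largest-first: each accepted transfer frees the most memory.
    for t in sorted(tensors, key=lambda t: t.size_mb, reverse=True):
        if freed_mb >= must_free_mb:
            break  # budget already met
        wire_mb = t.size_mb / compression_ratio if t.compressible else t.size_mb
        cost_ms = wire_mb / ssd_write_mb_per_s * 1000.0
        if xfer_ms + cost_ms > fwd_time_ms:
            continue  # this write could no longer hide behind compute
        plan.append(t.name)
        freed_mb += t.size_mb
        xfer_ms += cost_ms
    return plan, freed_mb >= must_free_mb

# Toy run with made-up numbers: 3 GB of feature maps, a 2 GB budget,
# a 300 ms forward pass, and ~3 GB/s of sequential SSD write bandwidth.
maps = [Tensor("conv1", 1200, True), Tensor("conv2", 900, True),
        Tensor("fc1", 600, False), Tensor("fc2", 300, False)]
print(plan_offload(maps, gpu_budget_mb=2000, fwd_time_ms=300,
                   ssd_write_mb_per_s=3000))
# -> (['conv1'], True): offloading conv1 alone frees 1.2 GB in ~200 ms.
```

The real system additionally overlaps transfers with per-layer compute and moves data via peer-to-peer DMA between the GPU and the SSD, as the abstract notes; the single aggregate forward-time constraint above is a deliberate simplification.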
