A Black-Box Graph Partitioner for Generalized Deep Neural Networks Parallelization

MATEU Jaume

A Black-Box Graph Partitioner for Generalized Deep Neural Networks Parallelization : 심층 신경망 병렬화를 위한 블랙박스 그래프 분할기

Authors: MATEU Jaume

Advisor: Bernhard Egger

Issue Date: 2023

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: Deep Neural Networks ; Parallelization ; Compiler ; Automatic Searcher ; Black-Box ; Graph Optimizer

Description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Dept. of Computer Science and Engineering, 2023. 2. Bernhard Egger.

Abstract: Deep neural networks (DNNs) comprising ever larger models are being adopted in many domains.
Businesses and individuals looking to create deep learning applications must purchase expensive hardware setups or rent high-end machines from cloud providers, both of which impose a significant cost on customers.
An attractive alternative to paying for machines powerful enough to run DNNs with billions or trillions of parameters is to distribute the workload across a set of cheaper but slower machines.
Several parallelization strategies have been proposed to tailor the storage and computational requirements to each available device while meeting the application's latency requirements.
However, using such setups requires customers to have intricate knowledge of the algorithm or model in order to devise an efficient workload parallelization plan.
In this thesis, I propose BBGraP, a black-box graph partitioner that is device- and model-agnostic and produces efficient parallelization plans for deep learning inference.
The proposed method takes different types of networks as input and generates a workload division that satisfies each node's memory and computational constraints.
A graph optimizer eliminates redundant operations, data transfers, and synchronization points to reduce the amount of data transferred while improving a workload's latency.
An automatic search then finds the best possible partition for the given configuration.
As a proof of concept, I apply BBGraP to a cluster of distributed nodes and a multicore FPGA. The evaluation shows a speedup of up to 2-fold.
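Abstract (translated from Korean): Recently, the deep neural networks developed and used across various domains have been growing ever larger.
As a result, businesses and individuals who want to develop deep learning applications must purchase expensive hardware or rent high-end machines from cloud providers, which places a heavy burden on users.
One alternative that avoids the high cost of nodes capable of running deep neural networks with billions or trillions of parameters is to distribute the workload across cheaper but slower nodes used concurrently.
Several parallelization strategies have been proposed to tailor the workload to each node's memory and computational requirements while meeting both the workload distribution and the application's latency requirements; however, to use these methods directly, users must have considerable knowledge of distribution strategies and deep neural networks.
To address this problem, this thesis proposes BBGraP, a black-box graph partitioner that easily generates efficient parallelization plans for deep learning inference regardless of the hardware setup or the deep neural network model.
BBGraP generates an efficient workload division according to the memory and computational constraints of each given node, and can run the various types of deep neural networks that users want.
The graph optimizer that BBGraP uses when devising a partitioning scheme eliminates redundant operations, data transfers, and synchronization points to reduce the amount of data transferred and improve the workload's latency, after which an automatic search finds the best possible partition according to the specified configuration.
With these techniques, a performance improvement of up to 2x was observed on a cluster of multiple nodes and on a multicore FPGA.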
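
Illustrative sketch (not taken from the thesis): the abstract does not describe BBGraP's search procedure, so the following is only a minimal, assumed example of what "a workload division that satisfies each node's memory and computational constraints" can look like for a simple chain of layers. The function name `partition_chain`, the tuple layouts, the sequential latency model, and the requirement that every device receives at least one layer are all hypothetical choices made for illustration. An exhaustive search over cut points stands in for the automatic search.

```python
from itertools import combinations


def partition_chain(layers, devices, bandwidth):
    """Exhaustively search contiguous splits of a layer chain across devices.

    Hypothetical model, not BBGraP's actual algorithm:
      layers:    list of (memory, ops, output_size) tuples, one per layer
      devices:   list of (memory_capacity, ops_per_second) tuples
      bandwidth: data units transferred per second between devices
    Device d runs the d-th contiguous segment; devices execute one after
    another. Returns (latency, cuts) for the fastest feasible plan, or None
    if no split satisfies every memory limit.
    """
    n, k = len(layers), len(devices)
    best = None
    # Choose k-1 cut positions; device d runs layers[bounds[d]:bounds[d+1]].
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        latency = 0.0
        feasible = True
        for d, (capacity, speed) in enumerate(devices):
            segment = layers[bounds[d]:bounds[d + 1]]
            if sum(mem for mem, _, _ in segment) > capacity:
                feasible = False  # memory constraint violated on device d
                break
            latency += sum(ops for _, ops, _ in segment) / speed
            if d < k - 1:
                # Output of the segment's last layer crosses the link.
                latency += segment[-1][2] / bandwidth
        if feasible and (best is None or latency < best[0]):
            best = (latency, cuts)
    return best


# Four layers, a fast device with little memory and a slow one with more.
layers = [(4, 8, 2), (4, 8, 1), (4, 16, 2), (4, 8, 1)]
devices = [(8, 8), (12, 4)]
print(partition_chain(layers, devices, bandwidth=1))  # (9.0, (2,))
```

In this toy input the fast device cannot hold more than two layers, so the search places the cut after the second layer, where the transferred output is also smallest. Exhaustive enumeration grows combinatorially with the number of layers and devices, so it is only practical for small chains.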

Language: eng

URI: https://hdl.handle.net/10371/193324

https://dcollection.snu.ac.kr/common/orgView/000000174194

Files in This Item:

000000174194.pdf 4.70 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
