Efficient Resource Scaling Policy in Inference Serving of Natural Language Generation Models

조성우

Efficient Resource Scaling Policy in Inference Serving of Natural Language Generation Models : 자연어 생성 모델 추론 서비스의 효율적인 자원 스케일링 정책

Authors: 조성우

Advisor: 전병곤

Issue Date: 2022

Publisher: Graduate School, Seoul National University (서울대학교 대학원)

Keywords: DeepLearning ; Serving ; NLP ; NLG ; GPU ; ResourceManagement ; Scaling ; InferenceEngine

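Description: Thesis (Master's) -- Graduate School, Seoul National University: Department of Computer Science and Engineering, College of Engineering, August 2022. 전병곤.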

Abstract: Although the number of Deep Neural Network (DNN) model types keeps increasing, language generation models remain the most in demand. Demand for serving pre-trained models is also growing. However, managing computing resources when serving Natural Language Generation (NLG) models is not a trivial problem, because the requests and responses of each query differ across environments. It is even more challenging to choose a scaling policy that minimizes both service level objective (SLO) violations and GPU resource usage. In this paper, we discuss the problem of using GPU resources efficiently when serving language generation models, and we propose the design of a serving framework that supports a fast and accurate scaling policy. We implemented a deep learning inference serving framework with this policy and validated our system on serving request query workloads.
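Abstract (translated from Korean): As the variety of deep neural network (DNN) models grows, interest in natural language generation models is increasing. Demand for inference services that use trained models is growing as well. However, using computing resources efficiently when operating an inference service for natural language generation models is not a simple problem, because the computing resources the inference engine consumes differ for each query that arrives at the service. Applying a resource scaling policy to such an inference service is therefore much harder. This thesis discusses the problem of using GPU resources efficiently in inference services for language generation models. It proposes a fast and accurate resource scaling policy to address the problem and validates the policy on request query workloads.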

Language: eng

URI: https://hdl.handle.net/10371/187774

https://dcollection.snu.ac.kr/common/orgView/000000173181

Files in This Item:

000000173181.pdf 2.49 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
