Reducing End-to-end Latency in a Multithreaded Object Detection Application

김지후

Reducing End-to-end Latency in a Multithreaded Object Detection Application (original Korean title: 멀티쓰레드 객체 탐지 응용의 종단 간 지연시간 단축)

Authors: 김지후

Advisor: 홍성수

Issue Date: 2022

Publisher: Seoul National University Graduate School

Keywords: Object detection ; Multithreaded application ; End-to-end latency ; Darknet-YOLO

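Description: Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, February 2022. Advisor: 홍성수.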

Abstract: Object detection is a computer vision task that takes an image as input and performs classification and localization of the objects in it. Deep learning, now widely used for object detection, is known to be computationally expensive. Most studies that accelerate object detection therefore optimize the deep learning model to be lighter in order to shorten inference time. Among them, YOLO (You Only Look Once) has become one of the representative deep learning models for object detection thanks to its fast inference speed and moderate accuracy. From the viewpoint of an application's end-to-end latency, however, not only inference but also other stages such as image fetch, pre-processing, post-processing, and display take up significant execution time. Moreover, in a multithreaded object detection application the time spent in each stage overlaps with that of the other stages, which makes it difficult to reduce the end-to-end latency. This thesis presents three techniques for reducing the end-to-end latency of Darknet-YOLO, a multithreaded object detection application based on YOLO. First, we analyze the blocking time that arises in the application and implement a solution that eliminates it. Second, we predict the execution time of each thread and dynamically adjust each thread's start time. Third, we run the computationally demanding pre-processing functions on the GPU while the GPU is idle. The techniques are evaluated on an NVIDIA Jetson AGX Xavier. Profiling with Nsight Systems and the NVIDIA Tools Extension (NVTX) shows that the end-to-end latency of Darknet-YOLO is reduced by 58.18%.
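The second technique in the abstract (predicting each thread's execution time and adjusting its start time) can be illustrated with a small sketch. This is not the thesis's implementation: the moving-average predictor, the window size, and the names `ExecTimePredictor` and `fetch_start_delay` are all hypothetical. It assumes a pipeline in which the fetch thread runs concurrently with inference on the previous frame, so a fetch that finishes early leaves its frame waiting and inflates that frame's end-to-end latency.

```python
from collections import deque


class ExecTimePredictor:
    """Hypothetical predictor: next execution time = mean of recent samples."""

    def __init__(self, window: int = 5) -> None:
        # Keep only the most recent `window` measurements.
        self.samples = deque(maxlen=window)

    def record(self, elapsed_ms: float) -> None:
        self.samples.append(elapsed_ms)

    def predict(self) -> float:
        # With no history yet, predict 0 so that no delay is applied.
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)


def fetch_start_delay(pred_fetch_ms: float, pred_inference_ms: float) -> float:
    """Delay (ms) to insert before the fetch starts, chosen so the fetch is
    predicted to finish when the concurrently running inference finishes.
    Assumption: both stages would otherwise start at the same instant."""
    return max(0.0, pred_inference_ms - pred_fetch_ms)
```

Under these assumptions, a fetch predicted to take 10 ms next to an inference predicted to take 30 ms would be started 20 ms later, so the fetched frame is as fresh as possible when inference picks it up. If the fetch is predicted to be the slower stage, the delay is zero.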

Language: kor

URI: https://hdl.handle.net/10371/183164

https://dcollection.snu.ac.kr/common/orgView/000000170074

Files in This Item:

000000170074.pdf 1.45 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Electrical and Computer Engineering
  - Theses (Master's Degree, Dept. of Electrical and Computer Engineering)
