ConcatPlexer: Improving the Performance of Transformer-Based Models with Data Multiplexing
Additional Dim1 Batching for Faster ViTs


Authors: 한동훈

Advisor: 곽노준

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: Artificial Intelligence ; Deep Learning ; Throughput ; Vision Transformer ; Neural Network ; Efficient Modelling

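Description: Thesis (Master's) -- Seoul National University Graduate School: Interdisciplinary Program in Artificial Intelligence, College of Engineering, August 2023. Advisor: 곽노준.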

Abstract: Transformers have demonstrated tremendous success not only in natural language processing (NLP) but also in computer vision, inspiring a wide range of creative approaches and applications. However, their superior performance and modeling flexibility come with a steep increase in computational cost, so reducing this burden has become a major concern in recent work. Inspired by Data Multiplexing (DataMUX), a cost-cutting method originally proposed for language models, we propose a novel approach to efficient visual recognition that uses additional dim1 batching (i.e., concatenation) to greatly improve throughput with little loss of accuracy. We first introduce Image Multiplexer, a naive adaptation of DataMUX to vision models, and then devise novel components to overcome its weaknesses, yielding our final model, ConcatPlexer, which sits at a sweet spot between inference speed and accuracy. Trained on ImageNet1K and CIFAR100, ConcatPlexer requires 23.5% fewer GFLOPs than ViT-B/16 while reaching validation accuracies of 69.5% and 83.4%, respectively.
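The abstract contrasts ordinary batching, which stacks samples along dim0 (the batch axis), with the "additional dim1 batching" it uses, which concatenates samples along dim1 (the token axis). The NumPy sketch below illustrates only that tensor bookkeeping, under assumptions of our own: token sequences of shape (B, T, D), a hypothetical multiplexing factor `n`, and hypothetical helper names `mux_dim1` and `demux_dim1`. It is not the thesis's implementation and leaves out the novel components that ConcatPlexer adds on top of plain concatenation.

```python
import numpy as np


def mux_dim1(tokens: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: concatenate groups of n token sequences along dim1.

    tokens: array of shape (B, T, D); B must be divisible by n.
    Returns an array of shape (B // n, n * T, D).
    """
    b, t, d = tokens.shape
    if b % n != 0:
        raise ValueError(f"batch size {b} is not divisible by n={n}")
    # Row-major reshape: samples g*n, ..., g*n + n - 1 end up side by side
    # along the token axis of group g, i.e. the same result as concatenating
    # their (T, D) token matrices one after another.
    return tokens.reshape(b // n, n * t, d)


def demux_dim1(tokens: np.ndarray, n: int) -> np.ndarray:
    """Hypothetical helper: inverse of mux_dim1, splitting each group back into n samples."""
    g, nt, d = tokens.shape
    if nt % n != 0:
        raise ValueError(f"sequence length {nt} is not divisible by n={n}")
    return tokens.reshape(g * n, nt // n, d)


# Usage: 4 samples, 3 tokens each, embedding size 2, multiplexed in pairs.
x = np.arange(4 * 3 * 2).reshape(4, 3, 2)
m = mux_dim1(x, 2)
print(m.shape)                                                 # (2, 6, 2)
print(np.array_equal(m[0], np.concatenate([x[0], x[1]])))      # True
print(np.array_equal(demux_dim1(m, 2), x))                     # True
```

Because each concatenated sequence is n times longer, the quadratic self-attention term grows rather than shrinks under concatenation alone; this sketch does not model how ConcatPlexer obtains its reported GFLOP reduction.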

Language: kor

URI: https://hdl.handle.net/10371/196564

https://dcollection.snu.ac.kr/common/orgView/000000177887

Files in This Item:

000000177887.pdf 0.84 MB

Appears in Collections:

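College of Engineering/Engineering Practice School (College of Engineering/Graduate School)
- Program in Artificial Intelligence (Interdisciplinary Program in Artificial Intelligence)
  - Theses (Master's Degree, Interdisciplinary Program in Artificial Intelligence)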
