Differentiable Modular Learning For Multi-Task Learning

염준호

Differentiable Modular Learning For Multi-Task Learning (Korean title, translated: A Differentiable Modular Learning Method for Multi-Task Learning)

Authors: 염준호

Advisor: 문병로

Issue Date: 2022

Publisher: Seoul National University Graduate School

Keywords: Multi-Task Learning; Modular Learning; Neural Architecture Search

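Description: Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2022. Advisor: 문병로.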

Abstract:
While recent advances in deep learning have made it possible to solve complex problems in computer vision, natural language processing, and scientific discovery, high-quality results still depend on huge amounts of computation. One way to enlarge the capacity of a neural architecture is to train a single supernet on multiple tasks so that it can exploit the knowledge shareable among them. In recent work on multi-task learning, sparse sharing has drawn attention for its parameter efficiency and its weaker negative transfer compared with traditional sharing strategies. However, existing sparse-sharing methods are limited in that finding a modularization scheme for each task is extremely computationally expensive. In this work, we reduce the cost of searching for modularization schemes by introducing differentiable modular learning, which finds the schemes for all tasks in a single training run (one-shot) through continuous relaxation. We demonstrate the approach on three sequence labeling tasks from the CoNLL-2003 dataset and show that it achieves average accuracy on par with previous work while finding the modularization schemes much faster. Lastly, we analyze how performance degrades with the sparsity level of the schemes and study the overlap ratio between schemes.
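The abstract describes continuous relaxation only at a high level. As a hedged illustration, and not the thesis's actual implementation, the Python sketch below shows one common way such a relaxation can work: each task's binary modularization scheme (a 0/1 gate per shared parameter) is replaced by sigmoid(alpha / tau), so the gate logits alpha can be learned by gradient descent, and the masks for all tasks are updated in the same loop. Everything here is an assumption made for illustration: the helper names (`train_masks`, `binarize`, `sparsity`, `overlap_ratio`), the toy squared-error objective with a frozen shared weight vector, the sparsity penalty `lam`, the 0.5 binarization threshold, and the intersection-over-union definition of "overlap ratio".

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relaxed_mask(logits, tau):
    # Continuous relaxation: a binary gate m_i in {0, 1} becomes sigmoid(alpha_i / tau) in (0, 1),
    # so the loss is differentiable with respect to the gate logits alpha_i.
    return [sigmoid(a / tau) for a in logits]


def train_masks(weights, targets, tau=1.0, lam=0.01, lr=0.5, steps=2000):
    # Hypothetical toy objective per task t, with the shared weights w kept frozen:
    #   L_t = sum_i (m_i * w_i - target_t[i])^2 + lam * sum_i m_i
    # The masks of all tasks are learned in one loop (a single search run).
    logits = {t: [0.0] * len(weights) for t in targets}
    for _ in range(steps):
        for t, target in targets.items():
            m = relaxed_mask(logits[t], tau)
            for i, w in enumerate(weights):
                dloss_dm = 2.0 * (m[i] * w - target[i]) * w + lam
                dm_dlogit = m[i] * (1.0 - m[i]) / tau  # derivative of sigmoid(a / tau)
                logits[t][i] -= lr * dloss_dm * dm_dlogit
    return logits


def binarize(logits, tau=1.0, threshold=0.5):
    # Discretize the relaxed mask into a 0/1 modularization scheme.
    return [1 if m >= threshold else 0 for m in relaxed_mask(logits, tau)]


def sparsity(scheme):
    # Fraction of shared parameters that the task does not use.
    return 1.0 - sum(scheme) / len(scheme)


def overlap_ratio(a, b):
    # Assumed definition: intersection over union of the kept parameters.
    inter = sum(x & y for x, y in zip(a, b))
    union = sum(x | y for x, y in zip(a, b))
    return inter / union if union else 0.0


if __name__ == "__main__":
    w = [1.0, -0.8, 0.6, 1.2, -0.5, 0.9]
    # Hypothetical tasks: task_a relies on parameters 0-3, task_b on parameters 2-5.
    targets = {
        "task_a": [w[0], w[1], w[2], w[3], 0.0, 0.0],
        "task_b": [0.0, 0.0, w[2], w[3], w[4], w[5]],
    }
    schemes = {t: binarize(a) for t, a in train_masks(w, targets).items()}
    print(schemes)
    print(round(sparsity(schemes["task_a"]), 3),
          round(overlap_ratio(schemes["task_a"], schemes["task_b"]), 3))
```

Running it prints `{'task_a': [1, 1, 1, 1, 0, 0], 'task_b': [0, 0, 1, 1, 1, 1]}` followed by `0.333 0.333`: each task keeps four of six parameters and the two schemes share two of six. The shared weights are frozen only to keep the sketch short; a real one-shot search would train the supernet weights together with the mask logits.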

Language: eng

URI: https://hdl.handle.net/10371/187773

https://dcollection.snu.ac.kr/common/orgView/000000173632

Files in This Item:

000000173632.pdf 2.44 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Computer Science and Engineering
  - Theses (Master's Degree, Dept. of Computer Science and Engineering)