Improving Rule-based Out-of-distribution Generalization Abilities of Sequence Generation Models

김세광

Improving Rule-based Out-of-distribution Generalization Abilities of Sequence Generation Models : 시퀀스생성모델의 규칙기반 분포외일반화능력 향상방법

Authors: 김세광

Advisor: 정교민

Issue Date: 2022

Publisher: Graduate School, Seoul National University (서울대학교 대학원)

Keywords: Deep Learning; Sequence Generation; Out-of-distribution Generalization

Description: Thesis (Ph.D.) -- Graduate School, Seoul National University: College of Engineering, Dept. of Electrical and Computer Engineering, 2022. 8. 정교민.

Abstract: Developing human-level machines that can learn and extend rules is a long-standing challenge for the artificial intelligence community. Although current deep learning models have demonstrated remarkable performance across a wide range of applications, they still struggle to apply learned rules to novel inputs that do not follow the training distribution. This lack of rule-based out-of-distribution generalization, i.e., systematic generalization, limits many deep learning applications, especially sequence generation tasks that require logical reasoning, such as semantic parsing, or that suffer from data scarcity, such as low-resource machine translation.
Therefore, this dissertation aims to measure and improve the systematic generalization abilities of deep learning sequence generation models. To measure these abilities, we propose number sequence prediction problems. We estimate the computational power of deep learning models by testing them on our problems and comparing them with automata that can solve the problems. Then, to improve the systematic generalization abilities of deep models, we propose three frameworks. The first framework devises a new input preprocessing module, called the neural sequence-to-grid module. The module learns how to segment and align sequence inputs into grid inputs, a form that is more advantageous for learning and applying symbolic rules. We empirically show that a deep learning model taking the grid inputs can extend learned rules on symbolic reasoning tasks, including program code evaluation and bAbI tasks. The second framework trains neural networks with structurally hinted examples. We create such examples by annotating the training targets with delimiter tokens that represent the non-terminal nodes of the targets' parse trees. We show the efficacy of the annotated targets on instruction-following tasks that require compositional reasoning, achieving substantial performance gains. The last framework reformulates sequence generation tasks as classification-and-generation tasks, using template retrieval and re-ranking with neural models. The templates, which are high-level sketches of target sequences, relieve the model of the burden of hard structural modeling and let it focus on the easier task of template realization. Experimental results show that our selected templates lead to substantial performance gains for deep learning models on four different semantic parsing tasks.
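Developing human-level machines that can learn and extend rules is a long-standing challenge for the artificial intelligence community. Although current deep learning models have demonstrated remarkable performance across a wide range of applications, they still struggle to apply learned rules to novel examples that do not follow the training distribution. Deep learning models that cannot perform this rule-based out-of-distribution generalization, i.e., systematic generalization, are limited in their applications; in particular, they cannot be applied to sequence processing tasks that require logical reasoning, such as semantic parsing, or that suffer from data scarcity, such as low-resource machine translation.
Therefore, this dissertation aims to measure and improve the systematic generalization abilities of deep learning text generation models. The first part of the dissertation evaluates the systematic generalization abilities of current deep learning models. Specifically, we design number sequence prediction problems and measure the models' computational power using automata that equivalently represent the problems. The rest of the dissertation proposes several frameworks for achieving systematic generalization in deep learning models. The first framework devises a new input preprocessing module called the neural sequence-to-grid module. The module can learn how to segment and align sequence inputs into grid inputs, a form that is more advantageous for learning and applying symbolic rules. We show experimentally that a deep learning model taking the grid inputs can extend learned rules on symbolic reasoning tasks, including program code evaluation and bAbI tasks. The second framework trains neural networks with structurally hinted examples. We annotate the targets with delimiter tokens that represent the non-terminal nodes of the targets' parse trees. We demonstrate the effectiveness of this annotation method through experiments on tasks that require compositional reasoning, achieving substantial performance gains. The last framework provides selected templates to neural generation models. The templates, which are high-level sketches of target sequences, relieve the model of the burden of hard structural modeling and let it focus on the easier task of template realization. Templates are selected by retrieving candidates from a large, inexpensive template pool and re-ranking them with neural models. Experimental results show that our selected templates substantially improve the performance of deep learning models on four different semantic parsing tasks.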
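As a rough illustration of the second framework in the abstract (annotating training targets with delimiter tokens for the non-terminal nodes of their parse trees), the following is a minimal sketch and not the dissertation's implementation. The tree representation, the `<LABEL>` / `</LABEL>` token format, the function names `annotate` and `strip_annotations`, and the example target are all assumptions made here for illustration.

```python
from typing import List, Tuple, Union

# Assumed representation: a parse tree is either a terminal token (str)
# or a (label, children) pair for a non-terminal node.
Tree = Union[str, Tuple[str, list]]


def annotate(tree: Tree) -> List[str]:
    """Linearize a parse tree, wrapping each non-terminal in delimiter tokens."""
    if isinstance(tree, str):
        return [tree]  # terminal: emit the token unchanged
    label, children = tree
    tokens = [f"<{label}>"]  # hypothetical opening delimiter
    for child in children:
        tokens.extend(annotate(child))
    tokens.append(f"</{label}>")  # hypothetical closing delimiter
    return tokens


def strip_annotations(tokens: List[str]) -> List[str]:
    """Recover the plain target by dropping the delimiter tokens."""
    return [t for t in tokens if not (t.startswith("<") and t.endswith(">"))]


# Hypothetical target tree for an instruction such as "jump twice".
target_tree = ("REPEAT", [("ACT", ["JUMP"]), ("ACT", ["JUMP"])])
annotated = annotate(target_tree)
plain = strip_annotations(annotated)
print(annotated)
print(plain)
```

In this sketch a model would be trained on `annotated` instead of `plain`, and the delimiters would be stripped from its predictions before evaluation. Whether the dissertation strips or keeps the delimiters at test time is not stated in the abstract.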

Language: eng

URI: https://hdl.handle.net/10371/187731

https://dcollection.snu.ac.kr/common/orgView/000000172085

Files in This Item:

000000172085.pdf 5.09 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Ph.D. / Sc.D._전기·정보공학부)
