An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios

표정우

An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios : 고정되지 않은 시나리오에서의 온라인 적응을 위한 어텐션 기반 화자 명명 기법

DC Field	Value	Language
dc.contributor.advisor	차상균	-
dc.contributor.author	표정우	-
dc.date.accessioned	2020-05-07T03:41:31Z	-
dc.date.available	2020-05-07T03:41:31Z	-
dc.date.issued	2020	-
dc.identifier.other	000000158761	-
dc.identifier.uri	http://dcollection.snu.ac.kr/common/orgView/000000158761	ko_KR
dc.description	Thesis (Master's)--Seoul National University Graduate School: College of Engineering, Dept. of Electrical and Computer Engineering, 2020. 2. 차상균.	-
dc.description.abstract	A speaker naming task, which finds and identifies the active speaker in a movie or drama scene, is crucial for high-level video analysis applications such as automatic subtitle labeling and video summarization. Modern approaches usually exploit biometric features with gradient-based methods rather than rule-based algorithms. In some situations, however, a naive gradient-based method is inefficient. For example, when new characters are added to the target identification list, the neural network must be retrained to identify them, which delays model preparation. In this paper, we present an attention-based method that reduces model setup time by incorporating newly added data via online adaptation, without a gradient update process. We compare the attention-based method with existing gradient-based methods on three evaluation metrics (accuracy, memory usage, setup time) under various controlled speaker naming settings. We also apply existing speaker naming models and the attention-based model to real video to show that our approach achieves accuracy comparable to existing state-of-the-art models, and higher accuracy in some cases.	-
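dc.description.abstract	The speaker naming task, which finds and identifies the active speaker in a given movie or drama scene, is important for advanced video analysis applications such as automatic subtitle labeling and video summarization. Modern approaches have generally used biometric features with gradient-based methods instead of rule-based algorithms. In certain situations, however, a simple gradient-based method does not work efficiently. For example, when new characters are added to the target identification list, the neural network must be retrained frequently to identify the new people, and this delays model preparation. This thesis presents an attention-based method that reduces model preparation time by updating newly added data through online adaptation, without a gradient update process. Using three evaluation metrics (accuracy, memory usage, model preparation time), we comparatively analyzed the attention-based method and existing gradient-based methods under various controlled settings of the speaker naming task. In addition, applying existing speaker naming models and the attention-based model to real video showed that our approach achieves accuracy similar to that of existing state-of-the-art models, and in some cases higher accuracy.	-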
dc.description.tableofcontents	Chapter 1. Introduction; Chapter 2. Related Work; 2.1 Speaker Naming; 2.2 Feature Extractors for Face and Audio Cues; 2.2.1 Face; 2.2.2 Audio; 2.3 Attention Mechanism; Chapter 3. Methodology; 3.1 Problem Formulation; 3.2 Attention-Based Method for Speaker Naming; 3.2.1 Feature Extraction; 3.2.2 Attention Module with Few-Shot Learning; Chapter 4. Experiments; 4.1 Dataset Overview; 4.2 Data Preprocessing; 4.3 Comparative Analysis among Speaker Naming Methods under Various Settings; 4.3.1 Evaluation Metric; 4.3.2 Experimental Setup; 4.3.3 Results; 4.4 Speaker Naming Accuracy for Real Video; 4.4.1 Evaluation Metric; 4.4.2 Experimental Setup; 4.4.3 Results; 4.5 Ablation Studies; Chapter 5. Conclusion and Future Work; 5.1 Conclusion; 5.2 Future Work; Bibliography; Abstract (in Korean)	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject.ddc	621.3	-
dc.title	An Attention-Based Speaker Naming Method for Online Adaptation in Non-Fixed Scenarios	-
dc.title.alternative	고정되지 않은 시나리오에서의 온라인 적응을 위한 어텐션 기반 화자 명명 기법	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Jungwoo Pyo	-
dc.contributor.department	College of Engineering, Dept. of Electrical and Computer Engineering	-
dc.description.degree	Master	-
dc.date.awarded	2020-02	-
dc.identifier.uci	I804:11032-000000158761	-
dc.identifier.holdings	000000000042▲000000000044▲000000158761▲	-
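
The abstract describes adding new speakers through online adaptation rather than gradient-based retraining. The sketch below illustrates that general idea with a small attention-over-memory classifier: enrolling a speaker only appends an embedding to a memory, and prediction is a softmax-weighted vote over stored embeddings. It is not the thesis's actual architecture. The class name `AttentionSpeakerMemory`, the `temperature` parameter, the cosine-similarity scoring, and the toy 3-dimensional embeddings are all assumptions made for illustration; in practice the embeddings would come from pretrained face or audio feature extractors.

```python
import math
from collections import defaultdict


def _normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("embedding must be non-zero")
    return [x / norm for x in vec]


class AttentionSpeakerMemory:
    """Hypothetical attention-based classifier over a memory of labeled embeddings."""

    def __init__(self, temperature=0.1):
        # Assumed hyperparameter: lower values make the attention sharper.
        self.temperature = temperature
        self.keys = []    # stored unit-length embeddings
        self.labels = []  # speaker name for each stored embedding

    def add(self, name, embedding):
        """Enroll a sample. No gradient step: the embedding is simply stored."""
        self.keys.append(_normalize(embedding))
        self.labels.append(name)

    def predict(self, query):
        """Return (best_name, per-speaker attention mass) for a query embedding."""
        if not self.keys:
            raise ValueError("memory is empty")
        q = _normalize(query)
        scores = [
            sum(a * b for a, b in zip(q, key)) / self.temperature
            for key in self.keys
        ]
        # Numerically stable softmax over all stored samples.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        mass = defaultdict(float)
        for weight, label in zip(weights, self.labels):
            mass[label] += weight / total
        return max(mass, key=mass.get), dict(mass)


# Toy embeddings (made up for this example).
memory = AttentionSpeakerMemory()
memory.add("alice", [1.0, 0.0, 0.1])
memory.add("bob", [0.0, 1.0, 0.1])
first_name, first_mass = memory.predict([0.9, 0.1, 0.0])

# A new character appears: enroll it online, with no retraining.
memory.add("carol", [0.0, 0.1, 1.0])
second_name, second_mass = memory.predict([0.1, 0.0, 0.9])
```

In this sketch, setup cost for a new speaker is one list append, while memory usage grows with the number of stored samples. That is the kind of trade-off the record's three metrics (accuracy, memory usage, setup time) are meant to expose.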

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Electrical and Computer Engineering (전기·정보공학부)
  - Theses (Master's Degree_전기·정보공학부)

Files in This Item:

000000158761.pdf 0.91 MB
