Lyrics-informed Singing Voice Separation (가사 정보를 활용한 가창 음원 분리)

JEON Chang-bin (전창빈)

DC Field	Value	Language
dc.contributor.advisor	이교구	-
dc.contributor.author	전창빈	-
dc.date.accessioned	2021-11-30T04:39:58Z	-
dc.date.available	2021-11-30T04:39:58Z	-
dc.date.issued	2021-02	-
dc.identifier.other	000000164430	-
dc.identifier.uri	https://hdl.handle.net/10371/175882	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000164430	ko_KR
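dc.description	Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information, 2021. 2. 이교구.	-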
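dc.description.abstract	Singing voice separation, the task of isolating the singing voice from a mixture of vocals and accompaniment, is one of the most actively studied problems in audio signal processing, because it has high commercial value and can serve as a pre-processing step for a wide range of music information retrieval research. In research based on conventional machine learning algorithms, various methods have been proposed that improve separation performance by exploiting side information beyond the audio itself, but few such attempts have been reported for deep-learning-based source separation. Lyrics and pitch are the two most representative kinds of side information available for singing voice separation; this study proposes a method that improves separation performance using lyrics, which are the easier of the two to obtain. We assume that the lyrics are aligned to the timing of the singing and propose a source separation network that uses the aligned lyrics. To this end, we propose a new network that combines open-unmix, a network with high performance in music source separation, with a lyrics encoder network that takes the aligned lyrics as an auxiliary input. We quantitatively confirm and analyze that the performance gain comes not only from the timing information in the aligned lyrics but also from the phoneme information. In addition, we propose a way to further improve separation performance by pre-training a speech synthesis network and transferring it to the proposed lyrics encoder network (transfer learning). Finally, we show that lyrics can be aligned automatically with a publicly available audio-text alignment tool and then used with the proposed lyrics-encoder-based separation network.	-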
dc.description.abstract	Singing voice separation, the task of isolating the singing voice from a mixture of singing and accompaniment, is one of the most actively studied problems in audio signal processing, as it has high commercial value and can serve as a pre-processing step for various music information retrieval tasks. In research using conventional machine learning algorithms, various methods have been proposed to improve separation performance by using additional information beyond the audio itself, but few such attempts have been reported for singing voice separation using deep learning. This work therefore proposes a method that improves singing voice separation performance by using lyrics, which are relatively easy to obtain among the most representative kinds of additional information for this task. Specifically, we propose a singing voice separation network that uses aligned lyrics. To this end, we combine the open-unmix network, which achieves high performance in music source separation, with a lyrics encoder network that receives aligned lyrics as an auxiliary input. We quantitatively confirm and analyze that the improvement in separation performance is attributable not only to the timing information of the aligned lyrics but also to their phonetic information. Furthermore, we propose a method to further improve the performance of the proposed network by pre-training a speech synthesis network and then applying transfer learning to the proposed lyrics encoder network. Finally, we show that the lyrics can be aligned automatically with a publicly released audio-text alignment tool and then used with the proposed lyrics-encoder-based singing voice separation network.	-
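dc.description.tableofcontents	Chapter 1 Introduction; 1.1 Research Background; 1.2 Research Objectives; Chapter 2 Background Theory and Related Work; 2.1 Background Theory; 2.1.1 Source Separation Using Deep Learning; 2.1.2 Source Separation Evaluation Metrics; 2.2 Related Work; 2.2.1 The open-unmix Network; 2.2.2 Deep-Learning-Based Source Separation Using Side Information; 2.2.3 Korean Singing Voice Synthesis; 2.2.4 Speech Synthesis Based on Deep Convolutional Neural Networks; 2.2.5 Montreal Forced Aligner; Chapter 3 Proposed Method; 3.1 Highway-Network-Based Lyrics Encoder; 3.2 Combining the open-unmix Separation Network with the Lyrics Encoder; 3.2.1 open-unmix Baseline Model; 3.2.2 Lyrics Conditioning Using Lyrics Embeddings Directly; 3.2.3 Lyrics Conditioning Using the Lyrics Encoder Network; 3.3 Transfer Learning for the Lyrics Encoder via Speech Synthesis Network Pre-training; 3.3.1 Fine-tuning the Lyrics Encoder Including the Embedding Lookup Table; 3.3.2 Fine-tuning the Lyrics Encoder with the Embedding Lookup Table Frozen; Chapter 4 Experiments; 4.1 Experimental Setup; 4.1.1 Datasets; 4.1.2 Detailed Experimental Settings; 4.1.3 Speech Synthesis Model Training Settings; 4.1.4 Montreal Forced Aligner Training Settings; 4.2 Results and Discussion; 4.2.1 Performance Improvement with the Lyrics Encoder; 4.2.2 Evaluating the Encoding of Timing Information from Aligned Lyrics; 4.2.3 Analysis of How the Network Trained with Aligned Lyrics Uses Lyrics Information; 4.2.4 Performance Analysis of the Lyrics Encoder with Transfer Learning from the Speech Synthesis Network; 4.2.5 Lyrics Alignment with Montreal Forced Aligner and Its Use in Singing Voice Separation; Chapter 5 Conclusion; 5.1 Significance; 5.2 Limitations; 5.3 Future Work	-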
dc.format.extent	ii, 74	-
dc.language.iso	kor	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Singing voice separation	-
dc.subject	Lyrics-informed singing voice separation	-
dc.subject	Music source separation	-
dc.subject.ddc	006.3	-
dc.title	가사 정보를 활용한 가창 음원 분리	-
dc.title.alternative	Lyrics-informed Singing Voice Separation	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	JEON Chang-bin	-
dc.contributor.department	Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information	-
dc.description.degree	Master	-
dc.date.awarded	2021-02	-
dc.identifier.uci	I804:11032-000000164430	-
dc.identifier.holdings	000000000044▲000000000050▲000000164430▲	-
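
The abstract describes feeding aligned lyrics to the separation network as an auxiliary input. This record does not specify the thesis's actual input format, so the following is only a minimal sketch of one common way to represent such an input: turning time-aligned phoneme intervals into one phoneme label per spectrogram frame. The phoneme inventory, the 10 ms hop size, and the function name `frame_labels` are illustrative assumptions, not details taken from the thesis.

```python
from typing import List, Tuple

# Hypothetical phoneme inventory (an assumption for illustration only);
# index 0 is reserved for frames with no singing.
PHONEMES = ["<sil>", "a", "n", "o"]
PHONEME_TO_ID = {p: i for i, p in enumerate(PHONEMES)}


def frame_labels(
    intervals: List[Tuple[float, float, str]],
    num_frames: int,
    hop_seconds: float,
) -> List[int]:
    """Map (start, end, phoneme) intervals in seconds onto spectrogram frames."""
    labels = [PHONEME_TO_ID["<sil>"]] * num_frames
    for start, end, phoneme in intervals:
        # Convert times to frame indices and clip them to the valid range.
        first = max(0, round(start / hop_seconds))
        last = min(num_frames, round(end / hop_seconds))
        for t in range(first, last):
            labels[t] = PHONEME_TO_ID[phoneme]
    return labels


# Three phonemes sung between 0.02 s and 0.08 s, with an assumed 10 ms hop.
alignment = [(0.02, 0.04, "n"), (0.04, 0.07, "a"), (0.07, 0.08, "o")]
labels = frame_labels(alignment, num_frames=10, hop_seconds=0.01)
print(labels)
```

A frame-level sequence like this carries both kinds of information the abstract distinguishes: where the non-silence labels sit encodes timing, and which label appears encodes phonetic content.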

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Intelligence and Information (지능정보융합학과)
  - Theses (Master's Degree_지능정보융합학과)

Files in This Item:

000000164430.pdf 21.59 MB
