Automatic Music Lead Sheet Transcription and Melody Similarity Assessment Using Deep Neural Networks

Jonggwon Park (박종권)

Automatic Music Lead Sheet Transcription and Melody Similarity Assessment Using Deep Neural Networks : 심층 신경망 기반의 음악 리드 시트 자동 채보 및 멜로디 유사도 평가

DC Field	Value	Language
dc.contributor.advisor	이경식	-
dc.contributor.author	박종권	-
dc.date.accessioned	2023-06-29T01:52:10Z	-
dc.date.available	2023-06-29T01:52:10Z	-
dc.date.issued	2023	-
dc.identifier.other	000000174561	-
dc.identifier.uri	https://hdl.handle.net/10371/193130	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000174561	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Industrial Engineering, 2023. 2. 이경식.	-
dc.description.abstract	The digitization of the music industry has made composing, arranging, and distributing music convenient, and the number of newly released recordings keeps growing. Platforms on which anyone can become a creator have recently emerged, so user-created music such as original songs, cover songs, and remixes is now distributed through YouTube and TikTok. With such a large volume of recordings, musicians have always needed to transcribe music into sheet music. Transcription, however, requires musical knowledge and is time-consuming. This thesis studies automatic lead sheet transcription using deep neural networks. Transcription artificial intelligence (AI) can greatly reduce the time and cost that people in the music industry spend finding or transcribing sheet music. In addition, because it converts audio sources into digital sheet music, it opens up further applications such as music plagiarism detection and music composition AI. The thesis first proposes a model that recognizes chords from audio signals. Chord recognition is an important task in music information retrieval because chords are highly abstract and descriptive features of music. We use a self-attention mechanism for chord recognition to focus on specific regions of chords. Through an attention map analysis, we visualize how attention operates. The analysis shows that the model divides chord segments by exploiting the adaptive receptive field of the attention mechanism. Next, the thesis proposes a note-level singing melody transcription model based on sequence-to-sequence transformers. Overlapping decoding is introduced to address the loss of context between segments. Pitch augmentation and the addition of a noisy dataset with data cleansing prove effective in preventing overfitting and improving generalization. Ablation studies demonstrate the effects of the proposed techniques both quantitatively and qualitatively. The proposed model outperforms other models in note-level singing melody transcription on all metrics considered. Subjective human evaluation further shows that the results of the proposed model are perceived as more accurate than those of a previous study. Building on these results, we present the entire process of automatic music lead sheet transcription. By combining the various kinds of music information recognized from audio signals, we show that it is possible to transcribe lead sheets that express the core of popular music. We also compare the results with lead sheets transcribed by musicians. Finally, we propose a melody similarity assessment method based on self-supervised learning that builds on the automatic lead sheet transcription. We present convolutional neural networks that map the melodies of the lead sheet transcription results into an embedding space. To apply self-supervised learning, we introduce methods for generating training data with musical data augmentation techniques, and we present a loss function that makes use of this training data. Experimental results demonstrate that the proposed model detects similar melodies in popular music for both plagiarism and cover song cases.	-
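dc.description.abstract	The digitization of the music industry has made it convenient to compose, arrange, and distribute music, so the number of newly released recordings is increasing. Recently, a platform environment in which anyone can become a creator has been established, and user-made original songs, cover songs, and remixes are distributed through YouTube and TikTok. For this large volume of music, musicians have always had a demand for transcribing music into sheet music. However, transcription requires musical knowledge and takes considerable time and cost. This thesis studies automatic transcription of music lead sheets using deep neural networks. Developing transcription AI can greatly reduce the time and cost that music professionals and performers spend obtaining or producing sheet music. In addition, because recordings can be converted into digital sheet music, it can be used in various ways, such as automatic plagiarism detection and training composition AI. For lead sheet transcription, we first propose a model that recognizes chords from audio signals. Chords are an implicit and expressive key feature of music, so recognizing them is very important. To recognize chord segments, we present a transformer-based model that uses the attention mechanism. Through attention map analysis, we visualize how attention is actually applied and examine how the model segments and recognizes chords. We then propose a note-level singing melody transcription model using a sequence-to-sequence transformer. Overlapping decoding is introduced to solve the problem of contextual information being cut off between segments during decoding. We introduce pitch shifting as a data augmentation technique and a method of adding training data through data cleansing. Quantitative and qualitative comparisons confirm that the proposed techniques help improve performance, and the proposed model achieves the best note-level singing melody transcription performance on the MIR-ST500 dataset. In addition, subjective human evaluation confirms that the transcriptions of the proposed model are perceived as more accurate than those of the previous model. Using the results of the preceding studies, we present the entire process of automatic music lead sheet transcription. By combining the various music information recognized from audio signals, we show that lead sheets expressing the core of popular music audio can be transcribed, and we compare and analyze them against lead sheets produced by experts. Finally, applying the automatic lead sheet transcription technique, we propose a melody similarity assessment method based on self-supervised learning. We present a convolutional neural network model that represents the melodies of lead sheet transcription results in an embedding space. To apply self-supervised learning, we propose a method of generating training data by applying musical data augmentation techniques, and we design a deep metric learning loss function that uses the prepared training data. Analysis of the experimental results confirms that the proposed model can detect similar melodies of popular music in plagiarism and cover song cases.	-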
dc.description.tableofcontents	Chapter 1 Introduction 1 1.1 Background and Motivation 1 1.2 Objectives 4 1.3 Thesis Outline 6 Chapter 2 Literature Review 7 2.1 Attention Mechanism and Transformers 7 2.1.1 Attention-based Models 7 2.1.2 Transformers with Musical Event Sequence 8 2.2 Chord Recognition 11 2.3 Note-level Singing Melody Transcription 13 2.4 Musical Key Estimation 15 2.5 Beat Tracking 17 2.6 Music Plagiarism Detection and Cover Song Identification 19 2.7 Deep Metric Learning and Triplet Loss 21 Chapter 3 Problem Definition 23 3.1 Lead Sheet Transcription 23 3.1.1 Chord Recognition 24 3.1.2 Singing Melody Transcription 25 3.1.3 Post-processing for Lead Sheet Representation 26 3.2 Melody Similarity Assessment 28 Chapter 4 A Bi-directional Transformer for Musical Chord Recognition 29 4.1 Methodology 29 4.1.1 Model Architecture 29 4.1.2 Self-attention in Chord Recognition 33 4.2 Experiments 35 4.2.1 Datasets 35 4.2.2 Preprocessing 35 4.2.3 Evaluation Metrics 36 4.2.4 Training 37 4.3 Results 38 4.3.1 Quantitative Evaluation 38 4.3.2 Attention Map Analysis 41 Chapter 5 Note-level Singing Melody Transcription 44 5.1 Methodology 44 5.1.1 Monophonic Note Event Sequence 44 5.1.2 Audio Features 45 5.1.3 Model Architecture 46 5.1.4 Autoregressive Decoding and Monophonic Masking 47 5.1.5 Overlapping Decoding 47 5.1.6 Pitch Augmentation 49 5.1.7 Adding Noisy Dataset with Data Cleansing 50 5.2 Experiments 51 5.2.1 Dataset 51 5.2.2 Experiment Configurations 52 5.2.3 Evaluation Metrics 53 5.2.4 Comparison Models 54 5.2.5 Human Evaluation 55 5.3 Results 56 5.3.1 Ablation Study 56 5.3.2 Note-level Transcription Model Comparison 59 5.3.3 Transcription Performance Distribution Analysis 59 5.3.4 Fundamental Frequency (F0) Metric Evaluation 60 5.4 Qualitative Analysis 62 5.4.1 Visualization of Ablation Study 62 5.4.2 Spectrogram Analysis 65 5.4.3 Human Evaluation 67 Chapter 6 Automatic Music Lead Sheet Transcription 68 6.1 Post-processing for Lead Sheet Representation 68 6.2 Lead Sheet Transcription Results 71 Chapter 7 Melody Similarity Assessment with Self-supervised Convolutional Neural Networks 77 7.1 Methodology 77 7.1.1 Input Data Representation 77 7.1.2 Data Augmentation 78 7.1.3 Model Architecture 82 7.1.4 Loss Function 84 7.1.5 Definition of Distance between Songs 85 7.2 Experiments 87 7.2.1 Dataset 87 7.2.2 Training 88 7.2.3 Evaluation Metrics 88 7.3 Results 89 7.3.1 Quantitative Evaluation 89 7.3.2 Qualitative Evaluation 99 Chapter 8 Conclusion 107 8.1 Summary and Contributions 107 8.2 Limitations and Future Research 110 Bibliography 111 Abstract in Korean 126	-
dc.format.extent	xv, 127	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Music Information Retrieval	-
dc.subject	Automatic Music Transcription	-
dc.subject	Chord Recognition	-
dc.subject	Singing Melody Transcription	-
dc.subject	Melody Similarity Assessment	-
dc.subject	Music Plagiarism Detection	-
dc.subject	Self-supervised Learning	-
dc.subject	Deep Neural Networks	-
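dc.subject	음악 정보 검색 (Music Information Retrieval)	-
dc.subject	음악 자동 채보 (Automatic Music Transcription)	-
dc.subject	화음 인식 (Chord Recognition)	-
dc.subject	가창 멜로디 인식 (Singing Melody Recognition)	-
dc.subject	멜로디 유사도 평가 (Melody Similarity Assessment)	-
dc.subject	음악 표절 탐지 (Music Plagiarism Detection)	-
dc.subject	자기지도 학습 (Self-supervised Learning)	-
dc.subject	심층신경망 (Deep Neural Networks)	-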
dc.subject.ddc	670.42	-
dc.title	Automatic Music Lead Sheet Transcription and Melody Similarity Assessment Using Deep Neural Networks	-
dc.title.alternative	심층 신경망 기반의 음악 리드 시트 자동 채보 및 멜로디 유사도 평가	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Jonggwon Park	-
dc.contributor.department	College of Engineering, Department of Industrial Engineering	-
dc.description.degree	Doctorate	-
dc.date.awarded	2023-02	-
dc.identifier.uci	I804:11032-000000174561	-
dc.identifier.holdings	000000000049▲000000000056▲000000174561▲	-

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Industrial Engineering (산업공학과)
  - Theses (Ph.D. / Sc.D._산업공학과)

Files in This Item:

000000174561.pdf 9.99 MB
