Detailed Information

Musical Instrument Identification and Tone Detection Using Feature Learning: 특징 학습을 통한 악기 식별과 음색 분류

dc.contributor.advisor: 이교구
dc.contributor.author: 한윤창
dc.date.accessioned: 2017-07-14T01:49:16Z
dc.date.available: 2017-07-14T01:49:16Z
dc.date.issued: 2017-02
dc.identifier.other: 000000142580
dc.identifier.uri: https://hdl.handle.net/10371/122373
dc.description: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Transdisciplinary Studies, February 2017. Advisor: 이교구.
dc.description.abstract: In the field of music information retrieval (MIR), most tasks have relied heavily on hand-crafted features, which are useful for measuring and quantifying specific characteristics of a sound such as pitch, roughness, and brightness. However, a growing number of attempts apply feature learning techniques, which have shown superior performance across research fields, to MIR tasks, especially when the goal is identification of the sound. The aim of this thesis is to advance the state of the art in musical instrument identification and tone detection using feature learning approaches, which can serve various music-related applications including, but not limited to, music search, browsing, recommendation, and education. We use sparse feature learning and convolutional neural networks to learn features from input data, and propose a network architecture and data processing framework suited to music signals. We present experimental results on MIR tasks such as fingering detection of overblown flute sounds, instrument identification in monophonic sound, and predominant instrument recognition in real-world polyphonic music. In addition, we conducted extensive experiments to find optimal data processing techniques and parameters, including input frame sampling, frequency scaling, activation pooling, window/hop size, output aggregation method, and network training hyperparameters.
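The abstract describes a common identification pipeline: a time-frequency front end, features learned from data, window-level classification, and aggregation of window-level outputs into a clip-level decision. The sketch below is a minimal illustration of that pipeline, not the thesis implementation; the librosa front end, the Keras layer shapes, the one-second analysis window, the mean aggregation, and the 11-class multi-label output (matching the IRMAS instrument set used in Chapter 5) are all assumptions made for the example.

```python
# Minimal sketch of a log-mel + CNN instrument recognizer.
# All constants are illustrative assumptions, not values from the thesis.
import numpy as np
import librosa
import tensorflow as tf

SR = 22050        # sampling rate (assumed)
N_MELS = 128      # mel bands (assumed)
FRAMES = 43       # ~1 s of audio at hop_length=512 (assumed window size)
N_CLASSES = 11    # IRMAS-style instrument set (assumed)

def log_mel_windows(path):
    """Slice a recording into fixed-size log-mel spectrogram windows."""
    y, sr = librosa.load(path, sr=SR, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=512, n_mels=N_MELS)
    logmel = librosa.power_to_db(mel)
    if logmel.shape[1] < FRAMES:                      # pad short clips
        logmel = np.pad(logmel, ((0, 0), (0, FRAMES - logmel.shape[1])))
    starts = range(0, logmel.shape[1] - FRAMES + 1, FRAMES)
    wins = [logmel[:, s:s + FRAMES] for s in starts]  # non-overlapping windows
    return np.stack(wins)[..., np.newaxis]            # (n, N_MELS, FRAMES, 1)

def build_model():
    """Small CNN; depths and kernel sizes are placeholders."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(N_MELS, FRAMES, 1)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.GlobalMaxPooling2D(),          # max-pool over time/freq
        tf.keras.layers.Dense(N_CLASSES, activation="sigmoid"),  # multi-label
    ])

def predict_clip(model, path):
    """Aggregate window-level predictions into one clip-level estimate."""
    probs = model.predict(log_mel_windows(path))
    return probs.mean(axis=0)                         # mean output aggregation
```

Swapping the aggregation rule (mean vs. max), the window size, or the pooling layers corresponds to the processing variables the abstract lists as experimental factors; a sparse-coding front end with activation pooling, as in Chapters 3 and 4, would replace the convolutional layers with dictionary learning over sampled frames and per-clip pooled activations.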
dc.description.tableofcontents:
Chapter 1 Introduction 1
1.1 Music Information Retrieval 2
1.2 Musical Instrument Identification and Tone Detection 5
1.3 Related Works 7
1.4 Tasks of Interest 9
1.5 Contributions 11
Chapter 2 Overview of MIR Systems for Identification Tasks 16
2.1 Input Data Representation 17
2.1.1 Time Domain Representation 17
2.1.2 Time-frequency Representation 18
2.1.3 Cepstral Domain Representation 19
2.2 Feature Extraction 20
2.2.1 Conventional Hand-crafted Features 21
2.2.2 Feature Learning Approaches 22
2.3 Preprocessing 24
2.4 Feature Learning Algorithms 25
2.4.1 Sparse Feature Learning 26
2.4.2 Convolutional Neural Network 28
Chapter 3 Detecting Fingering of Overblown Flute Sound 30
3.1 Introduction 30
3.2 Existing Approaches 33
3.3 Spectral Characteristics of Overblown Tone 33
3.4 System Architecture 35
3.4.1 Preprocessing 36
3.4.2 Feature Learning Strategy 37
3.4.3 Max-pooling 38
3.4.4 Classification 38
3.5 Evaluation 39
3.5.1 Dataset Specification 39
3.5.2 Experiment Settings 40
3.6 Results 42
3.6.1 Comparison with MFCCs 42
3.6.2 Effect of Max-pooling 44
3.6.3 Effect of the Number of Hidden Units 44
3.6.4 Effect of Classifier 45
3.6.5 Comparison with PCA/LDA method 46
3.7 Conclusions 48
Chapter 4 Instrument Identification in Monophonic Sound 50
4.1 Introduction 50
4.2 Data Processing Pipeline 52
4.2.1 Preprocessing 52
4.2.2 Frame Sampling Methods for Dictionary Learning 53
4.2.3 Activation Pooling 55
4.2.4 Classification 57
4.3 Evaluation 58
4.3.1 Dataset Specification 58
4.3.2 Experiment Settings 59
4.4 Results 61
4.4.1 Effect of the Sampling Method 61
4.4.2 Effect of the Pooling Method 62
4.4.3 Effect of the DFT size 65
4.4.4 Effect of the Dictionary Size 67
4.4.5 Effect of Frequency Scaling 68
4.4.6 Comparison to MFCCs 68
4.5 Discussion 68
4.5.1 Sampling and Pooling Method 68
4.5.2 DFT Size and the Dictionary Size 70
4.5.3 Frequency Scaling and Comparison to MFCCs 70
4.5.4 Comparison to Human Performance and Limitation of Research 71
4.6 Conclusions 72
Chapter 5 Predominant Instrument Recognition in Polyphonic Music 75
5.1 Introduction 75
5.2 Proliferation of Deep Neural Networks in Music Information Retrieval 77
5.3 System Architecture 79
5.3.1 Audio Preprocessing 79
5.3.2 Network Architecture 80
5.3.3 Training Configuration 82
5.3.4 Activation Function 83
5.4 Evaluation 85
5.4.1 IRMAS Dataset 85
5.4.2 Testing Configuration 86
5.4.3 Performance Evaluation 89
5.5 Results 90
5.5.1 Effect of Analysis Window Size 91
5.5.2 Effect of Max-pooling and Model Ensemble 92
5.5.3 Effect of Activation Function 95
5.5.4 Comparison to Existing Algorithms 96
5.5.5 Analysis of Instrument-Wise Identification Performance 98
5.5.6 Analysis on Single Predominant Instrument Identification 100
5.5.7 Qualitative Analysis with Visualization Methods 102
5.6 Conclusions 105
Chapter 6 Conclusions and Future Work 109
초록 (Abstract in Korean) 134
dc.format: application/pdf
dc.format.extent: 6503772 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: 서울대학교 대학원 (Seoul National University Graduate School)
dc.subject: music information retrieval
dc.subject: deep learning
dc.subject: convolutional neural network
dc.subject: sparse feature learning
dc.subject: musical instrument identification
dc.subject.ddc: 620
dc.title: Musical Instrument Identification and Tone Detection Using Feature Learning
dc.title.alternative: 특징 학습을 통한 악기 식별과 음색 분류
dc.type: Thesis
dc.contributor.AlternativeAuthor: Yoonchang Han
dc.description.degree: Doctor
dc.citation.pages: 135
dc.contributor.affiliation: 융합과학기술대학원 융합과학부 (Graduate School of Convergence Science and Technology, Department of Transdisciplinary Studies)
dc.date.awarded: 2017-02