Music Auto-tagging in Multimedia Content using Robust Music Representation Learned via Domain Adversarial Training

정해선

Music Auto-tagging in Multimedia Content using Robust Music Representation Learned via Domain Adversarial Training : 도메인 적대적 학습을 통해 학습된 강건한 음악 표현을 사용한 멀티미디어 콘텐츠에서의 음악 자동 태깅

Authors: 정해선

Advisor: 이교구

Issue Date: 2023

Publisher: Graduate School, Seoul National University (서울대학교 대학원)

Keywords: Robust Music Representation ; Music Auto-tagging ; Domain Adversarial Training

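Description: Thesis (Master's) -- Graduate School, Seoul National University : Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information, 2023. 8. 이교구.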

Abstract: Music auto-tagging plays a vital role in music discovery and recommendation by assigning relevant tags or labels to music tracks. However, existing models in the field of Music Information Retrieval (MIR) often struggle to maintain high performance when faced with real-world noise, such as environmental noise and speech commonly found in multimedia content like YouTube videos.

In this research, we draw inspiration from previous studies on speech-related tasks and propose a novel approach to improve music auto-tagging on noisy sources. Our method brings Domain Adversarial Training (DAT) into the music domain, enabling the model to learn music representations that are robust to noise. Unlike previous speech-based research, which typically involves a pretraining phase for the feature extractor followed by the DAT phase, our approach includes an additional pretraining phase specifically designed for the domain classifier. Through this additional phase, the domain classifier learns to distinguish effectively between clean and noisy music sources, which in turn strengthens the feature extractor's ability to produce representations that do not distinguish between clean and noisy music.
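
As an illustration only, the three-phase schedule described above can be written in the commonly used minimax form of domain adversarial training. The symbols below are assumed notation, not taken from the thesis: \theta_f, \theta_y, and \theta_d are the parameters of the feature extractor, the tag classifier, and the domain classifier; \mathcal{L}_{\text{tag}} is the tagging loss; \mathcal{L}_{\text{dom}} is the clean-versus-noisy classification loss; and \lambda is a trade-off weight. The abstract does not state the exact losses, and holding \theta_f fixed in the second phase is an assumption.

```latex
% Assumed notation; a sketch of the training schedule, not the thesis's exact objective.
\begin{align*}
\text{Phase 1 (feature extractor pretraining):}\quad
  & \min_{\theta_f,\,\theta_y} \; \mathcal{L}_{\text{tag}}(\theta_f, \theta_y) \\
\text{Phase 2 (domain classifier pretraining):}\quad
  & \min_{\theta_d} \; \mathcal{L}_{\text{dom}}(\theta_f, \theta_d)
    \quad \text{with } \theta_f \text{ held fixed (assumption)} \\
\text{Phase 3 (DAT):}\quad
  & \min_{\theta_f,\,\theta_y} \; \max_{\theta_d} \;
    \mathcal{L}_{\text{tag}}(\theta_f, \theta_y)
    - \lambda \, \mathcal{L}_{\text{dom}}(\theta_f, \theta_d)
\end{align*}
```

In the third phase the domain classifier still minimizes \mathcal{L}_{\text{dom}}, while the feature extractor is pushed to increase it, which is what drives clean and noisy inputs toward similar representations.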

Furthermore, we introduce the concept of creating noisy music source data with varying signal-to-noise ratios. By exposing the model to different levels of noise, we promote better generalization across diverse environmental conditions. This enables the model to adapt to a wide range of real-world scenarios and perform robust music auto-tagging.
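
As a rough sketch of how noisy training data at varying signal-to-noise ratios could be produced, the following scales a noise clip so that the mixture reaches a target SNR in decibels. The function names, the default SNR range, and the use of plain Python lists of samples are assumptions made for illustration; the abstract does not specify the mixing procedure or the SNR values used.

```python
import math
import random


def mean_power(signal):
    """Return the mean of squared samples (average power) of a signal."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(music, noise, snr_db):
    """Add noise to music so the music-to-noise power ratio equals snr_db."""
    if len(music) != len(noise):
        raise ValueError("music and noise must have the same length")
    p_music = mean_power(music)
    p_noise = mean_power(noise)
    if p_music == 0 or p_noise == 0:
        raise ValueError("music and noise must both be non-silent")
    # SNR_dB = 10 * log10(p_music / (gain**2 * p_noise)), solved for gain.
    gain = math.sqrt(p_music / (p_noise * 10 ** (snr_db / 10)))
    return [m + gain * n for m, n in zip(music, noise)]


def mix_at_random_snr(music, noise, snr_range_db=(0.0, 20.0), rng=random):
    """Draw an SNR uniformly from snr_range_db (a hypothetical range) and mix.

    Returns the noisy signal together with the SNR that was drawn.
    """
    snr_db = rng.uniform(*snr_range_db)
    return mix_at_snr(music, noise, snr_db), snr_db
```

Calling `mix_at_random_snr` once per training example would expose a model to a different noise level each time, which is the kind of variation the paragraph above describes. A real pipeline would typically operate on audio arrays and would also need to crop or loop the noise clip to match the music length.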

Our proposed network architecture demonstrates exceptional performance in music auto-tagging tasks, leveraging robust music representations even on noise types that were not encountered during training. This highlights the model's ability to generalize well to unseen noise sources, further enhancing its effectiveness in real-world applications.

Through this research, we address the limitations of existing music auto-tagging models and present a novel approach that significantly improves performance in the presence of noise. The findings of this study contribute to the advancement of music processing applications, enabling more accurate and reliable music classification and organization in various industries.
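
Abstract (translated from Korean): Music auto-tagging assigns relevant tags or labels to music audio and plays an important role in music search and recommendation. However, existing models in Music Information Retrieval (MIR) suffer performance degradation when faced with real-world noise, such as environmental noise and speech, that is commonly found in multimedia content like YouTube videos.

In this study, inspired by previous work focused on learning robust speech representations, we propose a new approach for improving music auto-tagging performance on noisy sources. Our method uses domain adversarial training (DAT) to learn robust music representations that are resilient to the presence of noise. Unlike previous speech-based studies, which typically consist of a pretraining phase for the feature extractor followed by a DAT phase, our approach includes an additional pretraining phase designed specifically for the domain classifier. Through this phase, the domain classifier learns to distinguish effectively between clean and noisy music sources, which improves the feature extractor's ability not to distinguish between clean and noisy music.

In addition, we introduce the idea of generating noisy music source data at various signal-to-noise ratios (SNR). Exposing the model to different noise levels promotes better generalization across diverse environmental conditions. This allows the model to adapt to a wide range of real-world situations and sounds and to perform robust music auto-tagging.

The proposed architecture shows excellent performance on music auto-tagging tasks and extracts robust music representations even for noise types not encountered during training. This highlights the model's ability to generalize well to unseen noise, further improving its effectiveness in real-world settings.

Through this study, we address the limitations of existing music auto-tagging models and present a new approach that substantially improves performance in the presence of noise. The results contribute to the advancement of music information retrieval, enabling more accurate and reliable music classification and organization in various industries.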

Language: eng

URI: https://hdl.handle.net/10371/197064

https://dcollection.snu.ac.kr/common/orgView/000000177784

Files in This Item:

000000177784.pdf 11.61 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Intelligence and Information (지능정보융합학과)
  - Theses (Master's Degree_지능정보융합학과)
