A study of content-based feature exploration using lyrics and music signals for music recommendation

이승진

음악 추천을 위한 가사정보 및 음악신호 기반 특성 탐색 연구 : A study of content-based feature exploration using lyrics and music signals for music recommendation

Authors: 이승진

Advisor: 이교구

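Major: Graduate School of Convergence Science and Technology, Dept. of Transdisciplinary Studies (디지털정보융합전공, Digital Information Convergence major)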

Issue Date: 2019-02

Publisher: Seoul National University Graduate School (서울대학교 대학원)

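Description: Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Convergence Science and Technology, Dept. of Transdisciplinary Studies (디지털정보융합전공), 2019. 2. 이교구.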

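Abstract: Technological progress has made content consumption digital and mobile, and users can now enjoy content without constraints of time or place. This greater accessibility has made choosing content harder, raising the demand for recommender systems that select items a user is likely to enjoy in domains such as music, video, books, papers, and news. In music in particular, tens of thousands of tracks are released on an average day, so listening to all of them to find music one likes has become nearly impossible.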

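Against this background, a variety of recommendation algorithms have emerged. Among them, collaborative filtering is widely used commercially and has proven effective, but it has limitations that stem from the nature of feedback data, such as the sparsity and cold-start problems, and it cannot reflect content similarity between items. In particular, collaborative filtering cannot be used when no user feedback exists, as with newly released music. To overcome these limitations, content-based recommendation methods that apply deep learning to music signals to predict a song's ratings have been studied.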

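However, because such methods take only the audio signal as input, they cannot reflect the lyrical and semantic characteristics that affect user preference, such as musical mood. This study aims to compensate for this by adding lyrics information, which reflects lyrical and semantic characteristics of music such as mood, to the audio signal. To this end, we use a paragraph vector model and a self-attention method to extract features that reflect various kinds of musical similarity. We build the training data for music recommendation from the extracted features and apply a convolutional neural network (CNN) to train a network that learns the relationship between the input features and the rating features.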
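The self-attention step mentioned above can be illustrated with a minimal sketch. It is not the thesis's model: it assumes a single learned scoring vector (`query`, a hypothetical name) and dot-product scoring, and the token embeddings are made-up values. It only shows how attention weights turn a sequence of lyric token vectors into one fixed-length feature, and why those weights can be inspected afterwards.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(token_vectors, query):
    """Return (attention weights, attention-weighted average of the tokens)."""
    # Score each token by its dot product with the scoring vector.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in token_vectors]
    weights = softmax(scores)
    dim = len(token_vectors[0])
    # Weighted sum of token vectors, one output value per embedding dimension.
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return weights, pooled


# Hypothetical 2-dimensional embeddings for three lyric tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # hypothetical learned scoring vector
weights, pooled = attention_pool(tokens, query)
```

Tokens whose embeddings align with the scoring vector receive larger weights, which is the property that makes attention weights usable for visualization.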

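The proposed method achieved an mAP of 0.06 and a mean NDCG of 0.151 on test data consisting of the listening histories of 222,780 users. Compared with the existing method, which achieved an mAP of 0.045 and a mean NDCG of 0.139, a t-test on a sample of 1,000 users showed a statistically significant performance difference. This confirms that, in a complete cold-start situation where no feedback information exists, the proposed features can potentially be used in addition to the existing approach. Furthermore, a related-song search using the extracted features pointed to the possibility of a new recommender system that reflects various kinds of musical similarity and offers interpretability of its recommendation results.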
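As a rough illustration of the significance test described above, the sketch below computes a paired t statistic from per-user metric values. The abstract does not say whether the t-test was paired or independent. Pairing is an assumption made here because both methods are evaluated on the same sampled users. The score lists are hypothetical and far smaller than the 1,000-user sample.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t statistic, degrees of freedom) for paired samples."""
    # Per-user difference between the two methods.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Standard error of the mean difference, using the sample standard deviation.
    std_err = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / std_err, n - 1


# Hypothetical per-user NDCG values for five users (assumption, not thesis data).
proposed = [0.20, 0.15, 0.30, 0.10, 0.25]
baseline = [0.18, 0.10, 0.27, 0.11, 0.20]
t_stat, dof = paired_t_statistic(proposed, baseline)
```

The resulting statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value.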

In this study, we use lyrics, which reflect semantic characteristics of music such as atmosphere and mood, together with the audio signal. Lyrics carry a wealth of information about the content, mood, genre, and style of music, because they contain a variety of words that reflect its emotion and mood. Using a multimodal network that combines lyrics and audio signals, we predict the latent item matrix obtained by weighted matrix factorization. As network inputs, we propose two features containing lyrics information. To reflect the information in symbolic genre tags, a pre-trained genre classification model with self-attention is applied to extract features. We confirm the usefulness of lyrics in music retrieval through similar-song exploration experiments and visualization of attention weights.
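The idea of predicting latent item factors can be sketched as follows. This is an illustrative assumption about how such predictions are typically used, not the thesis's implementation: user factors come from the factorization of play counts, the item factors for new songs are assumed to be produced by the content network, and a song's score for a user is their dot product. All names and values are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_new_songs(user_factor, predicted_item_factors, top_n=2):
    """Rank songs that have no play counts by user-item latent dot product."""
    scored = [
        (dot(user_factor, factor), song_id)
        for song_id, factor in predicted_item_factors.items()
    ]
    # Highest predicted preference first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:top_n]]


# Hypothetical user factor learned from listening history.
user_factor = [0.9, 0.1, -0.3]
# Hypothetical item factors predicted from lyrics and audio for new songs.
predicted_item_factors = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.0, 1.0, 0.0],
    "song_c": [0.5, 0.5, -1.0],
}
top = rank_new_songs(user_factor, predicted_item_factors)
```

Because the item factors are predicted from content alone, a song can be ranked this way before any user has played it.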

To evaluate our method, we use the play counts of 222,780 users for 16,000 songs. We compute three evaluation metrics: mAP, NDCG, and recall@N. Compared with the existing algorithm, the proposed method reduces the matrix completion error (mean squared error), and the methods using the proposed features outperform the existing method in terms of mAP, NDCG, and recall@N. This confirms that the proposed method can replace the existing method when play count or rating information does not exist. In this study, we have identified new features that can be used effectively in music retrieval and recommendation engines.
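For readers unfamiliar with the three metrics, the following sketch computes them for a single user. It assumes binary relevance (a song is relevant if the user played it) and no rank cutoff for AP and NDCG. The thesis may use different cutoffs or graded relevance. mAP is the mean of the per-user average precision. The ranked list and relevant set are hypothetical.

```python
import math


def recall_at_n(ranked, relevant, n):
    """Fraction of relevant items that appear in the top n of the ranking."""
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant) if relevant else 0.0


def average_precision(ranked, relevant):
    """Mean of precision values taken at each rank where a relevant item occurs."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def ndcg(ranked, relevant):
    """Binary-relevance DCG divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked, start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), len(ranked))
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / ideal if ideal > 0 else 0.0


# Hypothetical ranking for one user and the songs that user actually played.
ranked = ["s1", "s2", "s3", "s4"]
relevant = {"s1", "s3"}
user_recall = recall_at_n(ranked, relevant, n=2)
user_ap = average_precision(ranked, relevant)
user_ndcg = ndcg(ranked, relevant)
```

Averaging `user_ap` and `user_ndcg` over all test users gives the mAP and mean NDCG figures of the kind reported in the abstract.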

Language: kor

URI: https://hdl.handle.net/10371/151420

Files in This Item:

000000154848.pdf 4.74 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Transdisciplinary Studies(융합과학부)
  - Theses (Master's Degree_융합과학부)
