A Study on Improving Sequential Recommendation Performance through Data Augmentation (데이터 증강을 통한 순차 추천 성능 향상 연구)

Abstract: Recommender systems, which matured alongside the information society, have recently achieved large performance gains thanks to advances in deep learning based on deep neural networks. With this improved performance, their influence and importance continue to grow. Neural recommender models are generally known to outperform traditional approaches, but they presuppose a large volume of data for estimating their many parameters. When the dataset at hand is small, obtaining additional data is expensive. This is especially true for recommender systems, which deal with actual user behavior logs and therefore have more difficulty obtaining additional data than other fields.
In computer vision, where deep learning is actively applied, model performance has been improved by greatly increasing the volume of training data through various data augmentation techniques. Recommender systems are expected to benefit significantly from such techniques as well, but research on data augmentation in recommender systems, especially in sequential recommendation, remains insufficient.
Motivated by this gap, we aim to show that various data augmentation techniques can improve the performance of sequential recommender systems, especially when the training dataset is small. To this end, we examine how data augmentation changes performance through extensive experiments with a recent neural sequential recommender model. Our suggested data augmentation techniques are 1) Noise Injection, 2) Redundancy Injection, 3) Masking, and 4) Synonym Substitution, all of which transform the original item sequences by direct corruption. Experiments on three benchmark datasets (MovieLens-1M, Gowalla, and Amazon Video Games) show an overall performance improvement when the suggested techniques are applied. Notably, the improvement tends to be larger when the dataset is relatively small. This suggests that data augmentation can be an effective way to boost performance when a sequential recommender system lacks sufficient data in the early stage of a service.
The contributions of this study can be summarized as follows: 1) it confirms through quantitative experiments that data augmentation can improve sequential recommendation performance when the training dataset is small; 2) it suggests that the performance of other current state-of-the-art (SOTA) models could be further improved, since data augmentation is applied in the pre-processing step and does not change the model architecture; 3) by applying a variety of data augmentation techniques, it describes how performance changes differ across techniques and verifies the possibility that data augmentation can serve as a universal pre-processing technique in recommender system design; 4) it can serve as a basic reference for future research on data augmentation in recommender systems, which has not yet been fully investigated.
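
The four direct-corruption techniques named above can be illustrated with a minimal Python sketch. The abstract does not specify how each technique is implemented, so everything below is an assumption made for illustration: the function names, the `MASK` token id, the `ratio` parameter, and the `similar` lookup table are hypothetical and are not taken from the thesis.

```python
import random

MASK = 0  # hypothetical mask id; assumes real item ids start at 1


def noise_injection(seq, catalog, rng):
    """Insert one random catalog item that the user has not interacted with."""
    seen = set(seq)
    candidates = [item for item in catalog if item not in seen]
    if not candidates:
        return list(seq)
    pos = rng.randint(0, len(seq))  # inclusive bounds: may append at the end
    return seq[:pos] + [rng.choice(candidates)] + seq[pos:]


def redundancy_injection(seq, rng):
    """Insert a repeat of an item that already appears in the sequence."""
    if not seq:
        return []
    pos = rng.randint(0, len(seq))
    return seq[:pos] + [rng.choice(seq)] + seq[pos:]


def masking(seq, rng, ratio=0.2):
    """Replace a random subset of positions with the MASK id."""
    if not seq:
        return []
    k = max(1, int(len(seq) * ratio))
    masked = set(rng.sample(range(len(seq)), k))
    return [MASK if i in masked else item for i, item in enumerate(seq)]


def synonym_substitution(seq, similar, rng):
    """Swap one item for a similar item; `similar` maps item -> list of items."""
    replaceable = [i for i, item in enumerate(seq) if similar.get(item)]
    if not replaceable:
        return list(seq)
    pos = rng.choice(replaceable)
    out = list(seq)
    out[pos] = rng.choice(similar[seq[pos]])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    seq = [3, 7, 12, 5, 9]               # one user's item sequence (toy data)
    similar = {7: [8], 12: [11, 13]}     # toy similarity table (assumed given)
    print(noise_injection(seq, range(1, 21), rng))
    print(redundancy_injection(seq, rng))
    print(masking(seq, rng, ratio=0.4))
    print(synonym_substitution(seq, similar, rng))
```

Each function returns a new list and leaves the original sequence untouched, so augmented copies can be appended to the training set in a pre-processing step without changing the recommender model itself. How the similar items are obtained (for example, from co-occurrence or embedding similarity) is left open here.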
정보화 시대의 성숙을 바탕으로 발달해 온 추천 시스템은 최근 심층 인공신경망(Deep Neural Network) 구조를 학습에 활용하는 딥러닝(Deep Learning) 기술의 발달과 함께 높은 성능 향상을 이루어내고 있으며, 개선된 성능에 힘입어 그 영향력 및 중요성이 더욱 증가하고 있다. 많은 경우에 심층 인공신경망 구조를 활용한 모델은 전통적인 접근법 대비 높은 성능을 얻을 수 있는 것으로 알려져 있으나, 이를 위해서는 수많은 파라미터들을 추정하기 위한 대규모의 데이터셋이 전제된다. 하지만 기 확보된 데이터셋의 규모가 작은 상황에서 데이터를 추가적으로 확보하는 것은 높은 비용이 소요되며, 특히 실제 사용자의 행동 로그를 다루는 추천 시스템은 타 분야에 비해 추가 데이터 확보가 더욱 어려운 편이다.
딥러닝 기술의 적용이 활발하게 이루어지고 있는 컴퓨터 비전(Computer Vision) 분야의 경우 다양한 데이터 증강(Data Augmentation) 기술을 통해 절대적인 학습 데이터셋의 규모를 크게 늘리는 방식으로 모델의 성능 향상을 도모해오고 있다. 추천 시스템 역시 이러한 데이터 증강 기술을 활용하여 학습 데이터를 추가적으로 확보할 수 있다면 상당한 효용이 있을 것으로 기대되나, 아직까지 추천 시스템, 특히 순차 추천(Sequential recommendation) 분야에서는 데이터 증강 관련 연구가 미흡한 실정이다.
이러한 문제의식을 바탕으로, 본 논문에서는 학습 데이터가 충분하지 않은 상황에서 다양한 데이터 증강 기법을 활용함으로써 순차 추천 시스템의 성능을 향상시킬 수 있음을 보이고자 한다. 이를 위해 인공신경망 기반의 최신 순차 추천 모델을 기반으로 원본 아이템 시퀀스를 직접 변형시키는(direct corruption) 방식의 총 4가지 데이터 증강 기법 1) 노이즈 추가(Noise Injection), 2) 중복성 추가 (Redundancy Injection), 3) 아이템 마스킹(Masking), 4) 유사 아이템 대체(Synonym Substitution) 을 적용하여 추천 성능의 변화를 확인하였다. 순차 추천 모델의 성능 평가를 위한 벤치마크로 널리 사용되는 무비렌즈(MovieLens-1M), 고왈라(Gowalla), 아마존 비디오게임(Amazon Video Games) 의 3개 데이터셋에 대해 실험을 진행한 결과, 제안하는 데이터 증강 기술을 적용할 경우 전반적으로 성능 개선이 나타나는 것으로 확인되었다. 특히 데이터셋의 크기가 작을 때에 데이터 증강에 의한 성능 향상의 폭이 큰 것으로 나타나, 순차 추천 시스템이 적용되는 서비스 초창기에 데이터를 충분히 확보하지 못했을 경우 성능 향상을 위해 데이터 증강이 효과적으로 활용될 수 있을 것으로 보인다.
본 연구의 기여 부분은 다음과 같이 정리할 수 있다. 1) 데이터 증강 기술을 활용하여 학습 데이터가 적은 상황에서 순차 추천 성능을 개선할 수 있음을 정량적인 실험을 통해 확인하였다. 2) 데이터 전처리 과정에서 이루어지는 데이터 증강은 기본적인 모델의 아키텍처를 변화시키지 않는다는 점에서 현재 제시되어 있는 다양한 SOTA(State-Of-The-Art) 모델의 성능을 추가적으로 개선시킬 수 있는 가능성을 제시하였다. 3) 다양한 데이터 증강 방식을 포괄적으로 적용해 봄으로써 데이터 증강 방식에 따른 성능 변화의 차별적 양상을 확인하였으며, 데이터 증강이 추천 시스템 디자인에 있어 보편적인 전처리 기법의 하나로 기능할 수 있을 가능성을 검증하였다. 4) 기존의 순차 추천 분야에서 아직 충분히 연구되지 않은 데이터 증강이라는 측면에 주목함으로써 향후 관련 연구들을 발전시키는 과정에서 기초 자료로 활용될 수 있다.

Language: kor

URI: https://hdl.handle.net/10371/175884

https://dcollection.snu.ac.kr/common/orgView/000000164664

Files in This Item:

000000164664.pdf 2.69 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Intelligence and Information (지능정보융합학과)
  - Theses (Master's Degree_지능정보융합학과)
