KOSPI Keyword Set Expansion using NLP Model & KOSPI Prediction using Keyword Search Index

고우진

NLP 모델을 이용한 KOSPI 키워드집합 확장 및 키워드 검색량을 활용한 KOSPI 예측 : KOSPI Keyword Set Expansion using NLP Model & KOSPI Prediction using Keyword Search Index

Authors: 고우진

Advisor: 오희석

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: KOSPI ; natural language processing ; keyword extraction ; stock price prediction ; search volume ; deep learning

Description: Thesis (Master's) -- Seoul National University Graduate School : College of Natural Sciences, Department of Statistics, 2023. 8. 오희석.

Abstract: This thesis addresses two research topics.

First, we study the expansion of a KOSPI keyword set using NLP models. Starting from 159 original seed keywords related to KOSPI, we collect text and candidate keywords. We then vectorize the seed keywords, text, and candidate keywords with the NLP models KLUE-RoBERTa/large, KPF-BERT, and KB-ALBERT. From the cosine similarity between embedding vectors we compute similarity/importance scores, and a first screening based on these scores yields 828 generated keywords. A second screening, based on the correlation between each generated keyword's search index and KOSPI, yields a keyword set highly related to KOSPI.
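The two screening steps can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the abstract does not define the similarity/importance score or the cut-offs, so the sketch assumes the score is the maximum cosine similarity between a candidate and any seed keyword, and that the second screening uses the absolute Pearson correlation. The function names, the thresholds, and the toy 2-dimensional vectors (standing in for real model embeddings) are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def first_screening(seed_vecs, candidate_vecs, min_similarity):
    """Keep candidates whose best cosine match to a seed clears the threshold.

    Assumption: score = max similarity over seeds (not stated in the abstract).
    """
    kept = []
    for name, vec in candidate_vecs.items():
        score = max(cosine_similarity(vec, s) for s in seed_vecs.values())
        if score >= min_similarity:
            kept.append(name)
    return kept


def second_screening(keywords, search_index, kospi, min_abs_corr):
    """Keep keywords whose search index correlates strongly with KOSPI.

    Assumption: absolute correlation, so strong negative links also pass.
    """
    return [k for k in keywords
            if abs(pearson(search_index[k], kospi)) >= min_abs_corr]


# Toy stand-ins for embeddings produced by a language model.
seeds = {"kospi": [1.0, 0.0], "interest rate": [0.8, 0.6]}
candidates = {
    "exchange rate": [0.9, 0.1],
    "semiconductor": [0.6, 0.8],
    "weather": [-1.0, 0.2],
}
# Toy index levels and per-keyword search indices over five periods.
kospi = [100, 102, 101, 105, 107]
search_index = {
    "exchange rate": [50, 48, 49, 45, 43],
    "semiconductor": [10, 30, 10, 30, 20],
}

stage1 = first_screening(seeds, candidates, min_similarity=0.9)
stage2 = second_screening(stage1, search_index, kospi, min_abs_corr=0.7)
print(stage1)  # ['exchange rate', 'semiconductor']
print(stage2)  # ['exchange rate']
```

In this toy run "semiconductor" survives the embedding-based screening but is dropped in the second step because its search index tracks the index level only weakly, which is the kind of filtering the second screening is meant to perform.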

Next, we study KOSPI prediction using the search index of KOSPI-related keywords, with an LSTM as the prediction model. We run experiments with training-data lengths from 1 to 7 years and search for the best hyperparameter combination for each length. The experiments show that prediction performance is better with the search index of the newly generated keywords than with that of the seed keywords, which indicates that the keyword set expansion was carried out well. In addition, an ensemble that combines the 7 models, one per training-data length, shows much improved prediction performance. This implies that giving a large weight to data close to the day being predicted is useful.
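One way to see why combining models trained on nested windows favors recent data is the following sketch. It is an illustration under stated assumptions, not the thesis's method: the abstract does not give the combination rule, so an unweighted mean of the per-window forecasts is assumed, and a trailing-window mean replaces the LSTM so that the implicit weights can be written in closed form. All names are hypothetical, and windows are counted in observations rather than years.

```python
def window_mean_forecast(history):
    """Stand-in forecaster (not an LSTM): predict the mean of the window."""
    return sum(history) / len(history)


def ensemble_forecast(series, window_lengths, forecaster=window_mean_forecast):
    """Average the forecasts of models fit on trailing windows of each length."""
    predictions = [forecaster(series[-w:]) for w in window_lengths]
    return sum(predictions) / len(predictions)


def implicit_weights(window_lengths):
    """Weight each lag receives in the ensemble of window means.

    Lag 1 is the most recent observation. An observation at a given lag is
    used by every window at least that long, and a window of length k gives
    it weight 1/k, so recent observations accumulate the most weight.
    """
    n_models = len(window_lengths)
    longest = max(window_lengths)
    return [
        sum(1.0 / k for k in window_lengths if k >= lag) / n_models
        for lag in range(1, longest + 1)
    ]


series = [10, 20, 30, 60]   # toy data, oldest first
windows = [1, 2, 3]         # toy analogue of the 1- to 7-year windows

forecast = ensemble_forecast(series, windows)
weights = implicit_weights(windows)
print(weights)  # weights for lag 1, lag 2, lag 3; they decrease and sum to 1
```

Here the most recent observation carries weight 11/18, the next 5/18, and the oldest 2/18, even though every model in the ensemble counts equally.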
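Korean abstract (translated): This thesis consists of two research topics.

First, we study the expansion of a KOSPI-related keyword set using NLP models. Starting from 159 original seed keywords related to KOSPI, we collect text and candidate keywords and vectorize them using the NLP models KLUE-RoBERTa/large, KPF-BERT, and KB-ALBERT. We compute the cosine similarity between embedding vectors to obtain similarity/importance scores and use them in a first screening to produce 828 generated keywords. We then carry out a second screening based on the correlation between the search volume of the generated keywords and KOSPI to obtain a keyword set related to KOSPI.

Next, we study KOSPI prediction using the search volume of the expanded KOSPI-related keywords. We use an LSTM as the prediction model, ran experiments with training-data periods ranging from 1 to 7 years, and sought the best hyperparameter combination for each period. The experiments confirmed that KOSPI prediction performance is better with the search volume of the newly expanded KOSPI keywords than with that of the seed keywords, which shows that the keyword set expansion task was carried out well. We also confirmed that an ensemble of the 7 per-period models, which gives a large weight to time points close to the day being predicted, shows improved prediction performance.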

Language: kor

URI: https://hdl.handle.net/10371/197342

https://dcollection.snu.ac.kr/common/orgView/000000178340

Files in This Item:

000000178340.pdf 1.11 MB

Appears in Collections:

College of Natural Sciences (자연과학대학)
- Dept. of Statistics (통계학과)
  - Theses (Master's Degree_통계학과)
