Efficient Portfolio Management using Deep Reinforcement Learning

Abstract: Given the historical prices of the stocks in a portfolio, how can we efficiently allocate weights to maximize cumulative returns? Portfolio management is widely used in financial planning tasks that aim to maximize profits while minimizing risk. Existing methods based on deep learning and reinforcement learning have substantially improved efficient asset allocation. However, they perform poorly in downward trends of the financial markets because they cope poorly with sudden downturns.
In this paper, we propose Portfolio Management with Short Position (PMSP), which uses a reinforcement learning algorithm to search for optimal allocations and adds a short-position strategy so that it can make profits even in downward trends. PMSP extracts and refines features from historical stock prices to reflect market dynamics. It then uses the Deep Deterministic Policy Gradient (DDPG) algorithm, whose replay (memory) buffers and target networks speed up the convergence of its parameters. Finally, instead of the softmax function, which maps its inputs to positive values that sum to 1 and therefore cannot express a short position, we apply the hyperbolic tangent at the end of the model. Because the hyperbolic tangent can output negative values, the model can take short positions and earn profits even in downward trends.
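The difference between the two output heads can be illustrated with a small pure-Python sketch. The scores are made-up actor outputs, and rescaling the tanh outputs so that their absolute values sum to 1 is an assumption made for illustration; the abstract does not say how PMSP normalizes its weights. `softmax` and the hypothetical `tanh_weights` helper are not taken from the thesis code.

```python
import math


def softmax(scores):
    """Softmax output head: every weight is positive and the weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def tanh_weights(scores):
    """Hypothetical tanh output head: squash each score into (-1, 1), then
    rescale so the absolute weights sum to 1 (assumed normalization).
    A negative weight is read as a short position."""
    raw = [math.tanh(s) for s in scores]
    gross = sum(abs(w) for w in raw)
    if gross == 0.0:
        return [0.0 for _ in raw]  # no signal: stay out of the market
    return [w / gross for w in raw]


scores = [1.2, -0.8, 0.1]  # made-up actor outputs for three stocks
print(softmax(scores))       # all three weights are positive (long only)
print(tanh_weights(scores))  # the second weight is negative (a short position)
```

With softmax, even a strongly negative score only shrinks a weight toward zero; with tanh, the sign of the score carries through to the weight.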
Experimental results show that PMSP achieves the highest portfolio value among the compared methods, earning a 102% profit over one year and giving state-of-the-art performance.
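A short position pays off in a falling market because a negative weight multiplies a negative price change. The sketch below compounds one-period returns into a portfolio value under simplifying assumptions (no transaction costs, borrowing fees, or interest on cash, and weights held for one period); `step_return` and `portfolio_value` are hypothetical helpers, and the price data are made up. In the same terms, a 102% profit over a year means the final portfolio value is 2.02 times the initial value.

```python
def step_return(weights, price_relatives):
    """One-period return of a portfolio with signed weights.

    price_relatives[i] is p_i(t) / p_i(t-1). Each asset contributes
    w_i * (r_i - 1), so a short weight (w_i < 0) gains when the price falls
    (r_i < 1). Any unallocated fraction is assumed to sit in cash earning 0.
    """
    return sum(w * (r - 1.0) for w, r in zip(weights, price_relatives))


def portfolio_value(initial_value, weight_seq, relative_seq):
    """Compound the one-period returns into a final portfolio value."""
    value = initial_value
    for weights, relatives in zip(weight_seq, relative_seq):
        value *= 1.0 + step_return(weights, relatives)
    return value


# A made-up falling market: both stocks lose 5% in each of three periods.
falling = [[0.95, 0.95]] * 3
long_only = [[0.5, 0.5]] * 3
short_only = [[-0.5, -0.5]] * 3

print(portfolio_value(1.0, long_only, falling))   # about 0.857 (0.95 ** 3)
print(portfolio_value(1.0, short_only, falling))  # about 1.158 (1.05 ** 3)
```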
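Given a portfolio containing the price information of multiple stocks, how can we efficiently allocate assets to each stock to maximize returns? Portfolio management aims to allocate assets among the stocks in a portfolio so as to maximize returns while minimizing investment risk. As machine learning and deep learning techniques have advanced, many prior studies have used them to allocate assets efficiently. However, these studies performed poorly when the stock market was in a downtrend.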
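This thesis therefore proposes Portfolio Management with Short Position (PMSP), an algorithm that combines reinforcement learning with an inverse (short) investment strategy so that it can generate profits even in falling markets. To reflect the constantly changing volatility of stocks, PMSP generates features that compare each stock's open, high, low, and close prices with one another. It also uses the Deep Deterministic Policy Gradient (DDPG) algorithm, whose replay buffers and target networks enable fast and stable training. Finally, instead of placing a softmax function at the end of the network, which converts every input to a positive value and therefore cannot represent inverse (negative) positions, PMSP uses the hyperbolic tangent function, which can take negative values, to enable inverse investment. As a result, PMSP can earn profits through inverse investment even in falling markets.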
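The feature construction and the DDPG components described above can be sketched in pure Python. The ratios in the hypothetical `ohlc_features` helper are only one way to compare open, high, low, and close prices with one another (the abstract does not list PMSP's exact features), and network parameters are represented as a flat list of floats purely for illustration. `ReplayBuffer` and `soft_update` follow the standard DDPG recipe of a fixed-size transition memory sampled uniformly at random and Polyak-averaged target networks; the capacity and `tau` values are illustrative assumptions, not the thesis's settings.

```python
import random
from collections import deque


def ohlc_features(open_price, high, low, close):
    """Hypothetical relative-price features for one stock and one period:
    each price is compared with the others as a ratio."""
    return [
        high / open_price,
        low / open_price,
        close / open_price,
        close / high,
        close / low,
    ]


class ReplayBuffer:
    """Fixed-size FIFO memory of (state, action, reward, next_state) tuples.

    Sampling random minibatches breaks the correlation between consecutive
    market steps, which helps stabilize DDPG training.
    """

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def soft_update(target_params, online_params, tau=0.005):
    """DDPG target-network update: target <- tau * online + (1 - tau) * target."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]


# Usage with made-up numbers.
print(ohlc_features(100.0, 110.0, 95.0, 105.0))

buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(state=step, action=0.0, reward=0.0, next_state=step + 1)
print(len(buffer))  # 3: the two oldest transitions were evicted

print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))  # [0.5, 1.0]
```

A small `tau` makes the target networks trail the online networks slowly, so the critic's regression targets change gradually between updates.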
Several experiments were conducted to evaluate PMSP, and they confirmed that PMSP achieved a higher return (102% per year) than prior methods.

Language: eng

URI: https://hdl.handle.net/10371/175393

https://dcollection.snu.ac.kr/common/orgView/000000163916

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Computer Science and Engineering
  - Theses (Master's Degree, Dept. of Computer Science and Engineering)
