Prediction Model of Soil Drought Distribution Using Machine Learning Algorithms and Geospatial Data

박해경

Seoul National University Library

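Prediction Model of Soil Drought Distribution Using Machine Learning Algorithms and Geospatial Data: Development of a Drought Prediction Model Using Machine Learning and Geospatial Data and Simulation-Based Estimation of Water Management Policy Effectiveness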

Authors: 박해경

Advisor: 이동근

Issue Date: 2020

Publisher: Graduate School, Seoul National University

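Description: Ph.D. thesis, Graduate School, Seoul National University: Interdisciplinary Program in Landscape Architecture, Graduate School of Environmental Studies, February 2020. Advisor: 이동근.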

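Abstract: To present the full process of formulating water management policy on a scientific basis, this study proceeds in three steps: development of the Severe Drought Area Prediction (SDAP) model for drought prediction, validation of the algorithm used in the model, and simulation of a price-based water demand management policy that applies the model. The core technology of the SDAP model is the fusion of machine learning and geospatial technologies (remote sensing and GIS), which allows spatial data to be used instead of tabular data so that prediction results can be visualized as maps.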
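The SDAP model was developed in light of the importance and difficulty of short-term drought prediction and the need to assign priorities at the policy level for rapid water supply. Its starting point is the problem that, despite rapid advances in science and technology, the rise in abnormal weather caused by global warming has made probabilistic forecasting of precipitation increasingly difficult, so drought forecasts and response policies based on such forecasts carry risk. In fact, Gyeonggi Province suffered heavy damage from an unexpected, severe drought in 2015–17. According to Global Warming of 1.5 °C, a special report of the Intergovernmental Panel on Climate Change, precipitation variability grows as warming intensifies, so droughts are expected to increase, which makes water management policy for drought preparedness very important. Agricultural areas in particular take a direct hit during droughts, and the damage ultimately spreads to all sectors, because crop injury caused by soil drought is irreversible and leads directly to rising prices. To minimize such damage, the US National Drought Mitigation Center recommends setting water supply priorities before a drought occurs and implementing them immediately once it begins. However, even setting aside the forecast uncertainty noted above, short-term droughts lasting several months fall on the boundary between weather and climate, which makes accurate probabilistic prediction nearly impossible.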
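The SDAP model estimates in advance the spatial distribution of soil drought that would develop during a short-term drought in which no rain falls for several months; it does not predict whether a drought will occur (yes/no) or the probability of precipitation. It therefore uses no precipitation data. Instead, it learns past droughts through machine learning from satellite imagery and topographic data acquired when actual droughts occurred, and predicts the future distribution of soil drought. Because SDAP predictions map the relative severity of soil drought, they are useful for selecting priority areas for water supply, and because they can be overlaid with water resource information such as reservoirs and groundwater, or with land use, they also make it easy to readjust priorities according to local conditions.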
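To illustrate the overlay idea, the sketch below ranks the cells of a predicted SMI map after masking it with land-use and water-source layers. It is a minimal sketch under stated assumptions, not the thesis's procedure: the tiny grids, the convention that a lower SMI means drier soil, the rule that only farmland cells without their own reservoir are candidates, and the function name `rank_priority_cells` are all hypothetical.

```python
from typing import List, Tuple


def rank_priority_cells(smi: List[List[float]],
                        farmland: List[List[bool]],
                        has_reservoir: List[List[bool]]) -> List[Tuple[int, int, float]]:
    """Return (row, col, smi) for farmland cells without a local reservoir, driest first.

    Assumes (hypothetically) that a lower SMI means drier soil.
    """
    candidates = []
    for r, row in enumerate(smi):
        for c, value in enumerate(row):
            # Overlay: keep farmland cells that have no water source of their own.
            if farmland[r][c] and not has_reservoir[r][c]:
                candidates.append((r, c, value))
    # Lowest predicted SMI (driest) first, i.e. highest supply priority.
    return sorted(candidates, key=lambda cell: cell[2])


if __name__ == "__main__":
    predicted_smi = [[0.20, 0.65, 0.40],
                     [0.10, 0.80, 0.35]]
    farmland = [[True, True, False],
                [True, False, True]]
    reservoir = [[False, False, False],
                 [True, False, False]]
    for r, c, value in rank_priority_cells(predicted_smi, farmland, reservoir):
        print(f"cell ({r}, {c}): predicted SMI {value:.2f}")
```

In practice the layers would be co-registered rasters exported from a GIS; plain nested lists keep the example self-contained.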
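The study area is the southern part of Gyeonggi Province, which is not a typical drought region but experienced a major drought due to climate change. The programming language used was Python (Python 3.6), and the policy simulation was carried out in Vensim PLE. Each chapter of the thesis is an individual study with its own subtitle; their order and content are as follows.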
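The subtitle of Chapter 1 is "Prediction of Severe Drought Areas Based on the Random Forest Algorithm." This chapter covers the detailed design of the SDAP model, considerations in selecting drought training areas, and the model's scope, advantages, and limitations. The distribution of soil drought is expressed with the Soil Moisture Index (SMI) as a real number between 0 and 1. The model grew out of the idea that machine learning could learn the mechanisms linking soil moisture and the surface environment (vegetation, topography, water, and temperature) during a drought. Fifteen input variables representing the surface environment were generated from satellite imagery (Landsat-8) and a digital elevation model. Training is supervised: the input variables extracted from the training areas are paired with the SMI after about 3 months without rainfall, which serves as the output variable. The trained model predicted the SMI distribution after about 3 months without rainfall from current satellite imagery and the digital elevation model of the study area with R² = 0.58. Although this appears somewhat lower than the training performance (R² = 0.91), the predicted spatial pattern closely resembled the SMI distribution of the study area after an actual 3-month period without rainfall. The SDAP model could therefore predict, several months ahead, the areas where soil drought would be relatively more severe if no rain fell.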
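The abstract states only that SMI is a real number between 0 and 1. One common way to derive such an index from Landsat-type data, assumed here for illustration rather than taken from the thesis, scales land surface temperature (LST) between a "dry edge" and a "wet edge" that are linear in NDVI: SMI = (LST_max - LST) / (LST_max - LST_min). The edge coefficients below are hypothetical; in practice they are fitted to the scatter of LST against NDVI for a scene.

```python
from typing import Tuple


def soil_moisture_index(lst: float, ndvi: float,
                        dry_edge: Tuple[float, float],
                        wet_edge: Tuple[float, float]) -> float:
    """SMI from LST scaled between NDVI-dependent dry and wet edges, clipped to [0, 1].

    Each edge is (intercept, slope) of LST as a linear function of NDVI.
    1 corresponds to the wet edge and 0 to the dry edge.
    """
    lst_max = dry_edge[0] + dry_edge[1] * ndvi  # hottest (driest) LST at this NDVI
    lst_min = wet_edge[0] + wet_edge[1] * ndvi  # coolest (wettest) LST at this NDVI
    if lst_max <= lst_min:
        raise ValueError("dry edge must lie above wet edge at this NDVI")
    smi = (lst_max - lst) / (lst_max - lst_min)
    return min(1.0, max(0.0, smi))


if __name__ == "__main__":
    dry, wet = (320.0, -20.0), (295.0, -5.0)  # hypothetical edge fits, LST in kelvin
    for lst in (290.0, 305.0, 315.0):
        print(lst, round(soil_moisture_index(lst, 0.4, dry, wet), 3))
```

Clipping keeps pixels that fall outside the fitted edges inside the [0, 1] range the thesis uses.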
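The subtitle of Chapter 2 is "Tree vs. Network: Which Algorithm Is the Better Choice for Regression Prediction with Remote Sensing Data?" To validate the algorithm used in the SDAP model, this chapter compares it with the multi-layer perceptron, an artificial neural network algorithm well known for regression prediction. Going beyond a simple performance comparison, the chapter looks to data characteristics to explain why random forest remains in steady use in remote sensing, whereas neural networks dominate most other fields. To this end, the 15 input variables were grouped by numeric data type, and training performance was compared across groups. The neural-network-based multi-layer perceptron proved sensitive to data range: its training performance degraded on data groups whose range was very small or very large, such as satellite-derived indices (-1 to 1) or reflectance (1 to 10,000). Random forest, by contrast, operates on decision trees and could learn regardless of data range or units. Random forest was therefore confirmed to be more suitable than the multi-layer perceptron for the SDAP model, which uses satellite imagery as its data.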
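The claim that tree splits do not depend on data range can be checked directly. The hypothetical `best_split` function below finds the threshold of a single regression-tree split by minimizing the summed squared error, then applies it to made-up reflectance-like values on a 0-1 scale and to the same values multiplied by 10,000. Because a split depends only on the ordering of the feature values, both runs send the same samples to the left branch; only the threshold is rescaled.

```python
from typing import List, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x: List[float], y: List[float]) -> Tuple[float, List[int]]:
    """Best single-feature regression split: returns (threshold, indices sent left)."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_score, best_k = float("inf"), None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # equal values cannot be separated by a threshold
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        score = sse(left) + sse(right)
        if score < best_score:
            best_score, best_k = score, k
    if best_k is None:
        raise ValueError("feature has no distinct values to split on")
    threshold = (x[order[best_k - 1]] + x[order[best_k]]) / 2
    return threshold, sorted(order[:best_k])


if __name__ == "__main__":
    reflectance = [0.05, 0.12, 0.30, 0.33, 0.41, 0.58]  # hypothetical 0-1 scale
    smi = [0.9, 0.85, 0.5, 0.45, 0.2, 0.15]              # hypothetical targets
    t_small, left_small = best_split(reflectance, smi)
    t_large, left_large = best_split([v * 10_000 for v in reflectance], smi)
    print(t_small, left_small)
    print(t_large, left_large)
```

The same holds for any strictly increasing rescaling, which is why tree ensembles usually need no feature normalization.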
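The subtitle of Chapter 3 is "Is Implementing a Price-Based Tap Water Demand Control Policy in Korea an Appropriate Method for Drought Mitigation?" By linking the SDAP model to a system dynamics model, this chapter details how the developed model can be applied to formulate a water management policy. The two models run independently: once the SDAP model predicts the areas of severe soil drought, the system dynamics model simulates, for those areas, the reduction in tap water use and the volume of water secured when water rates are raised. System dynamics resembles structural equation modeling but incorporates time and feedback loops, so it can compute dynamic human behavior and thereby estimate the effects of a policy. Three tiers of hypothetical pricing policy scenarios were simulated, using additional data on per-capita daily water use, population, water intake source locations, tap water prices, and the price elasticity of residential water demand. The results showed that visible effects began to appear only about 3 months after implementation, so water savings were negligible during the drought period when supply was actually needed, and the policy was of limited effectiveness. The chapter further suggested that where a pricing policy is ineffective for drought mitigation, it should be complemented with non-price measures such as more frequent water billing, and that extending the system dynamics model to non-price factors would help in revising, supplementing, and implementing policy.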
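The delayed response described above can be sketched as a minimal stock-and-flow model, assuming a constant-elasticity demand response and a first-order adjustment delay (a standard system dynamics structure). This is not the thesis's Vensim model: the baseline use of 280 L per person per day, the 20 % price increase, the elasticity of -0.3, the 3-month adjustment time, the population of 100,000, and the function names are hypothetical values chosen only to show how a delay postpones savings.

```python
from typing import List, Tuple


def simulate_demand(d0: float, price_ratio: float, elasticity: float,
                    adjust_months: float, months: int,
                    dt: float = 0.25) -> Tuple[float, List[float]]:
    """Euler-integrate per-capita demand after a one-off price change.

    Returns the long-run (indicated) demand and the demand at the end of each month.
    """
    indicated = d0 * price_ratio ** elasticity  # constant-elasticity target demand
    demand = d0                                   # stock: current per-capita demand
    steps_per_month = round(1 / dt)
    history = []
    for step in range(months * steps_per_month):
        # First-order delay: the flow closes a fixed share of the gap each step.
        demand += dt * (indicated - demand) / adjust_months
        if (step + 1) % steps_per_month == 0:
            history.append(demand)
    return indicated, history


def monthly_savings_m3(population: int, d0: float, history: List[float],
                       days: int = 30) -> List[float]:
    """Water saved per month versus baseline in m3, approximated from end-of-month demand."""
    return [population * (d0 - d) * days / 1000.0 for d in history]


if __name__ == "__main__":
    target, hist = simulate_demand(280.0, 1.2, -0.3, 3.0, 12)
    savings = monthly_savings_m3(100_000, 280.0, hist)
    for month, (d, s) in enumerate(zip(hist, savings), start=1):
        print(f"month {month:2d}: {d:6.1f} L/person/day, saved {s:9.0f} m3")
    print(f"long-run demand: {target:.1f} L/person/day")
```

With these assumed values, less than a third of the eventual reduction is realized in the first month; varying `adjust_months` shows how strongly such a conclusion depends on the assumed delay.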
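The conclusion emphasizes that the analytical method should be chosen to fit the purpose of the study. Traditional statistical methods such as linear regression are well suited to interpreting relationships, whereas machine learning excels at prediction. Because the SDAP model proposed in Chapter 1 aims at prediction-based policymaking rather than causal analysis of drought, the black-box problem of machine learning is not a major concern. System dynamics, in turn, is a time-efficient method that accounts for the fact that policy effects take a long time to confirm. This methodology is also debated with respect to validation, but in practice no means exists to verify the real world perfectly. Hence, even if the exact numbers are set aside, the approximate trends of the results are well worth considering. The findings of Chapter 2 are significant not only as a validation of the model's algorithm but also because they suggest that random forest can be considered first for regression prediction with satellite imagery and machine learning. The model application in Chapter 3 is meaningful because, whereas previous analytical studies merely mentioned potential policy use, this study demonstrates the entire process from model use to water management policy formulation, giving concrete form to a policymaking process grounded in scientific methods.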
This study presents the entire process of establishing a water management policy based on scientific methods, starting from drought prediction. Accordingly, this thesis comprises the development of the severe drought area prediction (SDAP) model, verification of the algorithm used in the proposed model, and an application of the model to policymaking. The core technology of the SDAP model is the convergence of machine learning and geospatial science (remote sensing and GIS), which allows geospatial data to be used instead of tabular data so that prediction results can be visualized as maps.
The SDAP model was developed in consideration of the importance and difficulty of forecasting short-term droughts and the need to assign water supply priorities at the policy level for rapid response. The model's development began from the observation that, despite advances in science and technology, the increasingly abnormal climate associated with global warming has made the probability of precipitation harder to predict, so preparing for droughts on the basis of that probability carries risk. In fact, during 2015–17, Korea suffered from unexpected, severe spring droughts. According to Global Warming of 1.5 °C, a special report by the Intergovernmental Panel on Climate Change (IPCC), precipitation variability grows as warming intensifies, so such droughts are expected to increase. Water management policy for drought preparedness has therefore become increasingly important. Rural areas in particular are directly harmed during droughts: crop damage from soil drought is irreversible, and if soil droughts are not quickly resolved, the damage can drive up prices and affect daily life. To minimize such damage, the US National Drought Mitigation Center recommends setting water supply priorities before a drought occurs and implementing them immediately when a drought begins. However, accurate drought predictions based on probabilistic precipitation forecasts are difficult because short-term droughts (i.e., those lasting from several weeks to months) fall at the boundary between weather and climate.
Rather than predicting whether a drought will occur (yes/no), the SDAP model estimates in advance the spatial distribution of soil drought under the assumption that no rain falls over the following several months. Instead of precipitation data, the model learns from actual past droughts through machine learning applied to satellite imagery and topographic data, and uses what it has learned to predict future soil droughts. Because its predictions are maps of the relative severity of soil drought, the SDAP model helps select priority areas for water supply. In addition, overlaying these maps with water resources (reservoirs and groundwater) or land use maps can help readjust priorities according to local conditions.
The study area is the southern part of Gyeonggi Province, Korea, which is not a typical drought region but has experienced severe droughts that are understood to be related to climate change. Python (3.6) was used to develop the SDAP model, and Vensim PLE was used for the policy simulation. Each chapter of this thesis is a stand-alone paper with its own subtitle, as follows.
Chapter 1: Prediction of Severe Drought Area Based on Random Forest: Using Satellite Image and Topography Data. This chapter covers the details of the SDAP model design, considerations in selecting training areas, and the model's scope, advantages, and limitations. The distribution of soil drought is expressed as the soil moisture index (SMI), a real number between 0 and 1. The model development began with the idea that machine learning might learn the mechanisms linking soil moisture and the surface environment (e.g., vegetation, topography, water, and temperature) during a drought. Fifteen input variables describing the surface environment were generated from Landsat-8 imagery and a digital elevation model (DEM). Training is supervised: the input variables extracted from the training areas are paired with the SMI observed after about 3 months without rainfall, which serves as the output variable. The trained model, which fit the training data with R² = 0.91, predicted the SMI distribution after 3 months without rain from current Landsat-8 imagery and a DEM of the study area with R² = 0.58. Although this is lower than the training performance, the predicted spatial patterns were similar to the actual SMI distribution observed after the drought. Thus, the SDAP model could predict, several months in advance, the areas likely to be more severely affected if a drought occurs.
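As a self-contained illustration of this setup (supervised regression from many surface variables to an SMI value in [0, 1], evaluated with R²), the sketch below implements a small random forest in plain Python: bootstrap resampling, a random subset of candidate features at each split, and averaging of tree predictions. It is not the thesis's implementation, which the abstract does not detail. The data are synthetic stand-ins for the 15 Landsat-8/DEM variables, and the hyperparameters (25 trees, depth 5, p/3 candidate features per split) and function names are assumptions. In practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used.

```python
import random
from typing import List, Optional, Sequence, Tuple


class Node:
    """Regression-tree node; a leaf keeps feature=None and predicts `value`."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.feature: Optional[int] = None
        self.threshold = 0.0
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def build_tree(X, y, idx, depth, max_depth, n_try, rng) -> Node:
    node = Node(sum(y[i] for i in idx) / len(idx))  # leaf value: mean target
    if depth >= max_depth or len(idx) < 4:
        return node
    best = None  # (score, feature, threshold, split position, ordered indices)
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset per split
        order = sorted(idx, key=lambda i: X[i][f])
        n = len(order)
        tot_s, tot_q = sum(y[i] for i in order), sum(y[i] ** 2 for i in order)
        ls = lq = 0.0
        for k in range(1, n):
            v = y[order[k - 1]]
            ls, lq = ls + v, lq + v * v
            if X[order[k - 1]][f] == X[order[k]][f]:
                continue  # no threshold separates equal feature values
            rs, rq = tot_s - ls, tot_q - lq
            score = (lq - ls * ls / k) + (rq - rs * rs / (n - k))  # left SSE + right SSE
            if best is None or score < best[0]:
                thr = (X[order[k - 1]][f] + X[order[k]][f]) / 2
                best = (score, f, thr, k, order)
    if best is None:
        return node
    _, f, thr, k, order = best
    node.feature, node.threshold = f, thr
    node.left = build_tree(X, y, order[:k], depth + 1, max_depth, n_try, rng)
    node.right = build_tree(X, y, order[k:], depth + 1, max_depth, n_try, rng)
    return node


def predict_tree(node: Node, x: Sequence[float]) -> float:
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def fit_forest(X, y, n_trees: int = 25, max_depth: int = 5, seed: int = 0) -> List[Node]:
    rng = random.Random(seed)
    n_try = max(1, len(X[0]) // 3)  # p/3 candidate features, a common rule of thumb for regression
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        forest.append(build_tree(X, y, boot, 0, max_depth, n_try, rng))
    return forest


def predict_forest(forest: List[Node], x: Sequence[float]) -> float:
    return sum(predict_tree(tree, x) for tree in forest) / len(forest)  # average the trees


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    m = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def demo(seed: int = 42) -> Tuple[float, float, List[float]]:
    """Synthetic stand-in: 15 variables in [0, 1] and an SMI-like target clipped to [0, 1]."""
    rng = random.Random(seed)

    def sample():
        x = [rng.random() for _ in range(15)]
        smi = 0.5 + 0.4 * (x[0] - 0.5) - 0.3 * (x[1] - 0.5) + rng.gauss(0, 0.05)
        return x, min(1.0, max(0.0, smi))

    train = [sample() for _ in range(150)]
    test = [sample() for _ in range(100)]
    X_tr, y_tr = [s[0] for s in train], [s[1] for s in train]
    forest = fit_forest(X_tr, y_tr)
    fit_pred = [predict_forest(forest, x) for x in X_tr]
    test_pred = [predict_forest(forest, x) for x, _ in test]
    return r_squared(y_tr, fit_pred), r_squared([s[1] for s in test], test_pred), test_pred


if __name__ == "__main__":
    train_r2, test_r2, _ = demo()
    print(f"training R2 = {train_r2:.2f}, held-out R2 = {test_r2:.2f}")
```

Comparing the two returned R² values mirrors the abstract's contrast between training fit and prediction on new data, although the synthetic numbers say nothing about the thesis's results.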
Chapter 2: Tree vs. Network: Which Is Better Machine Learning Algorithm for Regression Prediction When Using Remote Sensing Data? To verify the random forest algorithm used in the SDAP model, this chapter compares it with the multi-layer perceptron, an artificial neural network algorithm well known for non-linear regression prediction. Going further, the chapter attempts to explain, in terms of data characteristics, why random forest remains widely used in remote sensing even though neural networks dominate most other fields. To this end, the 15 training variables were divided into groups according to their numeric data type, and training performance was compared across groups. The multi-layer perceptron, being based on neural networks, was sensitive to data ranges and performed worse on data groups whose range was either very small (-1 to 1, e.g., indices derived from satellite images) or very large (0 to 10,000, e.g., reflectance values). In contrast, the random forest algorithm performed better because it is built on decision trees, whose splits do not depend on data range or units. The random forest algorithm was therefore confirmed to be more suitable for the SDAP model, which uses satellite imagery and machine learning.
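One commonly cited mechanism behind a multi-layer perceptron's sensitivity to large input ranges is activation saturation: when reflectance-scale inputs (thousands) meet small initial weights, a tanh unit's pre-activation becomes so large that its gradient is essentially zero, which stalls learning. The sketch below uses hypothetical weights, values, and function names to compare that gradient before and after z-score standardization. It illustrates this one mechanism for the large-range case only; the thesis reports the empirical performance gap, not this particular explanation.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple


def standardize(column: List[float]) -> List[float]:
    """Z-score a feature column (population standard deviation)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma for v in column]


def tanh_gradient(z: float) -> float:
    """Derivative of tanh at pre-activation z."""
    return 1.0 - math.tanh(z) ** 2


def compare_gradients() -> Tuple[float, float]:
    # Hypothetical values: an index in [-1, 1] and a reflectance-scale band.
    index = [0.1, 0.4, 0.7]
    reflectance = [2000.0, 5000.0, 8000.0]
    w_index, w_refl = 0.01, 0.01  # hypothetical small initial weights

    # Pre-activation of one tanh unit for the third sample, raw vs. standardized inputs.
    z_raw = w_index * index[2] + w_refl * reflectance[2]
    z_std = w_index * standardize(index)[2] + w_refl * standardize(reflectance)[2]
    return tanh_gradient(z_raw), tanh_gradient(z_std)


if __name__ == "__main__":
    raw, std = compare_gradients()
    print(f"gradient with raw inputs: {raw:.3g}, with standardized inputs: {std:.3g}")
```

Standardizing every input to a comparable scale is the usual remedy for neural networks, whereas the tree-based random forest does not need it.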
Chapter 3: Is Water Pricing Policy Adequate to Reduce Water Demand for Drought Mitigation in Korea? This chapter details the process of planning a policy by applying the SDAP model. The two models run independently: the SDAP model predicts drought severity, and a system dynamics model then simulates the water-use savings obtained by raising water prices in the severe drought area. Three policy scenarios were simulated using data that included individual daily water usage, population, water source locations, tap water prices, and the price elasticity of residential water demand. The results showed that visible effects appeared only after 3 months of implementation, so there were few water savings during the drought period when supply was most needed, and the policy was not effective. Where price-based policy is ineffective, as in Korea, the chapter suggests supplementing water demand control with non-price measures and extending the system dynamics model to include non-price factors.
In conclusion, analytical methods should be chosen to suit the purpose of a study. Traditional statistics such as linear regression are useful for interpreting relationships among variables, whereas machine learning is powerful for prediction. Because the SDAP model aims to support policymaking based on prediction rather than to explain the causes of droughts, the black-box problem of machine learning is not an important consideration. System dynamics, in turn, is a time-efficient method given that the effects of a policy take a long time to confirm in practice. Although the validation of system dynamics simulations is disputed, no means exists to verify the real world completely; thus, even if the exact resulting values are set aside, the general trends are worth considering. Chapter 2 is meaningful in suggesting that random forest can be considered first for regression prediction using satellite imagery and machine learning. Chapter 3 is meaningful because it shows the entire process from the use of the model to the establishment of a water management policy: whereas existing models and analyses have only mentioned possible policy applications, this thesis demonstrates the whole policymaking process based on scientific methods.

Language: eng

URI: https://hdl.handle.net/10371/168099

http://dcollection.snu.ac.kr/common/orgView/000000158968

Files in This Item:

000000158968.pdf 4.01 MB

Appears in Collections:

Graduate School of Environmental Studies
- Interdisciplinary Program in Landscape Architecture
  - Theses (Ph.D. / Sc.D.)
