Improving the Generalization Accuracy of ANN Modeling Using Factor Analysis and Cluster Analysis: Its Application to Streamflow and Water Quality Predictions

김성은

Korean title: 요인분석과 군집분석을 이용한 인공신경망 모델링 일반화 정확도 향상에 관한 연구: 유량 및 수질예측에의 적용

Authors: 김성은

Advisor: 서일원

Major: Department of Civil and Environmental Engineering, College of Engineering

Issue Date: 2014-08

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: artificial neural network model ; generalization ; exploratory factor analysis ; clustering method ; ensemble of models ; streamflow prediction ; water quality prediction

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Civil and Environmental Engineering, 2014. 8. 서일원.

Abstract: Interest in water resources management has increased substantially in recent years, driven by growing environmental concerns and the availability of innovative computational intelligence approaches. Artificial Neural Networks (ANNs) have emerged as an efficient tool for prediction and forecasting in a number of areas. Since the 1990s, ANNs have been used increasingly for prediction and forecasting in water resources and environmental engineering. Over the last 20 years or so, despite a significant amount of research on applying ANNs to the prediction and forecasting of water resources variables in river systems, little of it has focused on methodological issues. In many applications, the model-building process is described poorly, making it difficult to assess the optimality of the results obtained. Consequently, it has become necessary to shift the focus of ANN research from applying ANNs to various water resources case studies to the methodological issues of ANNs. The issues traditionally assessed are the proper selection and preprocessing of the inputs and outputs and the choice of the neural network architecture. All of these issues are part of the difficult-to-resolve problem of generalization.
In this study, the primary focus is on improving ANN modeling performance through generalization approaches that reduce bias and variance errors, so as to promote the application of ANNs to hydrology and water resources. To achieve this, the following generalization approaches were explored in the ANN development process:
(1) determining input variables through exploratory factor analysis (EFA);
(2) dividing the available data set into balanced training, validation, and test data sets through several clustering methods;
(3) aggregating (ensembling) the ANN models to estimate their generalized performance.
To evaluate the performance of ANN models built with the proposed generalization approaches, ANN models for predicting streamflow and water quality in the Nakdong River were developed, and models with and without the proposed approaches were compared.
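Abstract (translated from Korean): With growing interest in water resources management prompted by recent environmental concerns, and with innovative advances in machine learning fields such as artificial intelligence, the artificial neural network (ANN) model has become a new model attracting attention not only in water resources and the environment but in many other fields. Over the last 15 years or more, ANN models have produced very good results in interpreting data with nonlinear relationships in various fields, and many studies in water resources and the environment have used ANN models to predict and evaluate streamflow and water quality. However, many of the ANN models developed and applied in the water resources field explained the model-building process and its results unclearly, and as a result ANN models have not come to be regarded as general-purpose models in the way that other models widely used in water resources and the environment are. This is because little research has addressed the methodological aspects of ANN modeling, and that remains the case today. Research on the methodological aspects of ANN-based analysis is therefore becoming very important. The methodological aspects of ANN modeling that are generally considered include the selection of input and target variables, data preprocessing, and the selection of the optimal network structure, all of which belong to the hard-to-solve problem of modeling generalization.
The main purpose of this study is to improve the generalization of ANN modeling in water resources and environmental applications by proposing ways to reduce modeling error from a methodological standpoint. To this end, three approaches that can improve generalization during ANN model construction are proposed. The first is selecting appropriate input variables through exploratory factor analysis (EFA). The second is using various clustering methods to construct balanced training and test data so that every kind of pattern inherent in the data of the selected input variables is reflected. The third is applying ensemble modeling in the ANN training process to improve generalization. To evaluate the proposed approaches, ANN models for predicting streamflow and water quality were built for a site in the middle reach of the Nakdong River main stem, and models with and without the generalization approaches were compared.
The ANN models that applied the proposed generalization approaches predicted more accurately than conventionally built ANN models, and overall modeling error also decreased. The proposed approaches are expected to be useful for developing ANN models in hydrology and water resources. However, the clustering methods applied in this study to construct the training data have limited applicability to data with many variables. Future applications of the ANN model will include clustering methods that are effective for multivariate data, so that balanced training data can be constructed through clustering even for data with many variables.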
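The second approach in the abstract, clustering-based data division, can be illustrated with a minimal sketch. This record does not say which clustering methods the thesis uses, so plain k-means is an assumption here, as are the 60/20/20 split, the synthetic two-variable data, and the function names `kmeans` and `cluster_split`, which are hypothetical. The sketch only shows the general idea: each subset draws proportionally from every cluster, so a rare regime (for example, high flows) is less likely to be missing from the training data.

```python
import random
from collections import defaultdict


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_split(points, k=3, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists sampled proportionally per cluster."""
    labels = kmeans(points, k, seed=seed)
    by_cluster = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_cluster[lab].append(idx)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for lab in sorted(by_cluster):
        idxs = by_cluster[lab]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder of the cluster
    return train, val, test


# Synthetic data (hypothetical): many low-flow, fewer mid-flow, few high-flow samples.
gen = random.Random(42)
data = (
    [(gen.gauss(1, 0.1), gen.gauss(1, 0.1)) for _ in range(30)]
    + [(gen.gauss(5, 0.1), gen.gauss(5, 0.1)) for _ in range(20)]
    + [(gen.gauss(10, 0.1), gen.gauss(10, 0.1)) for _ in range(10)]
)
train, val, test = cluster_split(data, k=3)
print(len(train), len(val), len(test))
```

A purely random split of the same data could, by chance, place most of the ten high-flow samples in the test set. Splitting within clusters avoids that, although k-means with a random start can settle on a poor partition, so in practice the clustering result would need to be checked before it is used to divide the data.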

Language: English

URI: https://hdl.handle.net/10371/118699

Files in This Item:

000000020881.pdf 4.57 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Civil & Environmental Engineering (건설환경공학부)
  - Theses (Ph.D. / Sc.D._건설환경공학부)
