Graph Convolutional Networks for Predictive Healthcare using Clinical Notes

DC Field Value Language
dc.contributor.author: PIAO YINHUA
dc.description: Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2020. 8. Kim Sun (김선).
dc.description.abstract: Clinical notes in Electronic Health Record (EHR) systems are recorded as free text, with styles and abbreviations that vary by personal preference. It is therefore very difficult to extract clinically meaningful information from them. Many computational methods have been developed for tasks such as medical text normalization, medical entity extraction, and patient-level prediction. Existing methods for patient-level prediction focus on capturing contextual or sequential information from clinical texts, but they are not designed to capture global and non-consecutive information. Recently, graph convolutional networks (GCNs) have been used successfully for text classification, since a GCN can extract global and long-distance information across the whole text. However, the application of GCNs to mining clinical notes is yet to be fully explored.

In this study, we propose an end-to-end framework that analyzes clinical notes with graph neural network-based techniques to predict whether a patient is positive or negative for MRSA (Methicillin-Resistant Staphylococcus Aureus) infection. For this prediction, it is critical to capture patient-specific and global non-consecutive information from a patient's clinical notes. The clinical notes of each patient are processed to construct a patient-level graph, and each patient-level graph is fed into the GCN-based framework for graph-level supervised learning.
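
The patient-level graph construction described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, a small hypothetical stopword list, and a sliding window of three words; none of these settings come from the thesis, and the name `build_patient_graph` is illustrative only. Nodes are the words kept after filtering, and each edge weight counts the sliding windows in which two words co-occur.

```python
from collections import Counter
from itertools import combinations

# Hypothetical filter list; the thesis's actual filtering rules are not given here.
STOPWORDS = frozenset({"the", "a", "of", "and", "with"})

def build_patient_graph(notes, window=3):
    """Build one patient's word graph from that patient's clinical notes.

    Returns (nodes, edges): a sorted word list, and a dict mapping each
    alphabetically ordered word pair to the number of sliding windows
    in which both words appear.
    """
    nodes, edges = set(), Counter()
    for note in notes:
        # Assumed parsing: lowercase, split on whitespace, keep alphabetic tokens.
        tokens = [t for t in note.lower().split()
                  if t.isalpha() and t not in STOPWORDS]
        nodes.update(tokens)
        # Slide a fixed-size window over the note; short notes get one window.
        for start in range(max(1, len(tokens) - window + 1)):
            in_window = sorted(set(tokens[start:start + window]))
            for u, v in combinations(in_window, 2):
                edges[(u, v)] += 1
    return sorted(nodes), dict(edges)

notes = ["Patient with fever and wound infection", "wound culture positive"]
nodes, edges = build_patient_graph(notes)
```

Because every patient gets a separate graph, words that never appear next to each other in the text can still be linked through shared neighbors, which is one way to read the "non-consecutive information" mentioned above.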

The proposed framework consists of a graph convolutional network layer, a graph pooling layer, and a readout layer, followed by a fully connected layer. We tested various settings of the GCN-based framework, combining different graph convolution operations and graph pooling methods, and evaluated the performance of each variant. In experiments with MRSA infection data, all variants that use graph structure information outperformed several baseline methods that do not, by a margin of 2.93%∼11.81%. We also inspected the graphs at the pooling step to conduct interpretable analysis from both population-based statistical and patient-specific perspectives. With this inspection, we found long-distance word pairs that are distinctive for MRSA-positive patients, and we showed the pooled graph of a patient that contributes to the patient-specific prediction. Moreover, the AdaBoost algorithm was used to improve performance further. As a result, the proposed framework reached its highest performance of 85.70%, which exceeds the baseline methods by a margin of 3.71%∼12.59%.
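
To make the layer sequence concrete, the following is a rough forward-pass sketch in plain Python. The abstract does not say which convolution or pooling operators performed best, so the choices here are assumptions: a symmetric-normalized graph convolution, a top-k style pooling that keeps the highest-scoring nodes, a mean-plus-max readout, and a single sigmoid output unit. All function names, parameter names, and random weights are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # positive because of the self-loops
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def topk_pool(adj, feats, score_vec, ratio=0.5):
    """Keep the ceil(ratio * n) nodes with the highest projection score."""
    length = math.sqrt(sum(p * p for p in score_vec)) or 1.0
    scores = [sum(h * p for h, p in zip(row, score_vec)) / length for row in feats]
    k = max(1, math.ceil(ratio * len(feats)))
    keep = sorted(sorted(range(len(feats)), key=lambda i: -scores[i])[:k])
    # Gate the kept features by their score, then slice the adjacency matrix.
    pooled_feats = [[v * math.tanh(scores[i]) for v in feats[i]] for i in keep]
    pooled_adj = [[adj[i][j] for j in keep] for i in keep]
    return pooled_adj, pooled_feats, keep

def readout(feats):
    """Concatenate the per-feature mean and max over all remaining nodes."""
    cols = list(zip(*feats))
    return [sum(c) / len(c) for c in cols] + [max(c) for c in cols]

def predict(adj, feats, params):
    """GCN layer -> pooling -> readout -> fully connected layer with sigmoid."""
    h = gcn_layer(adj, feats, params["w_gcn"])
    _, h_pooled, keep = topk_pool(adj, h, params["p_pool"])
    g = readout(h_pooled)
    logit = sum(x * w for x, w in zip(g, params["w_fc"])) + params["b_fc"]
    return 1.0 / (1.0 + math.exp(-logit)), keep

rng = random.Random(0)
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # 4-word path graph
feats = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]  # toy word embeddings
params = {
    "w_gcn": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)],
    "p_pool": [rng.uniform(-1, 1) for _ in range(2)],
    "w_fc": [rng.uniform(-1, 1) for _ in range(4)],
    "b_fc": 0.0,
}
prob, kept = predict(adj, feats, params)
```

The `kept` indices identify which word nodes survive pooling. Inspecting such a pooled subgraph per patient is the kind of patient-specific analysis the abstract describes, although the thesis's own procedure may differ.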
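dc.description.abstract: Electronic health records are patient health information systematically collected in digital form. Because electronic health records are collections of documents made up of words that describe a patient's condition, various machine learning methods from natural language processing have been applied to them. In particular, with advances in deep learning, techniques originally used in the image and text domains are increasingly being applied to bioinformatics and medical informatics. However, unlike conventional image or text data, electronic health record data are highly patient-specific, depending on the author and on each patient's condition. In addition, correlations among health records with similar meanings need to be considered. In this study, we devised a graph-based deep learning model that takes the patient specificity of electronic health record data into account. We generated patient-specific graphs using the patient's electronic health record data and co-occurrence frequencies in medical documents. On this basis, we devised a model that predicts a patient's pathological state using graph convolutional networks. The data used in this study record whether patients have a Methicillin-Resistant Staphylococcus Aureus (MRSA) infection. When the devised graph-based deep learning model was used to predict patients' resistance, it performed 2.93%∼11.81% better than existing models that do not use graph information.
We also examined the graphs at the pooling step to carry out interpretable analysis. Through this, we found long-distance word patterns that distinguish MRSA-positive patients and showed the pooled graph of a patient that contributes to the patient-specific prediction. The AdaBoost algorithm was used to improve performance further. The framework proposed in this thesis achieved the highest performance, 85.70%, an improvement of 3.71%∼12.59% over existing models.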
dc.description.tableofcontents: Chapter 1 Introduction
1.1 Background
1.1.1 EHR Clinical Text Data
1.1.2 Current methods and limitations
1.2 Problem Statement and Contributions
Chapter 2 Related Works
2.1 Traditional Methods
2.2 Deep Learning Methods
2.3 Graph Neural Networks
2.3.1 Graph Convolutional Networks
2.3.2 Graph Pooling Methods
2.3.3 Applications of GNN
Chapter 3 Methods and Materials
3.1 Notation and Problem Definition
3.2 Patient Graph Construction Process
3.2.1 Parsing and Filtering
3.2.2 Word Co-occurrence Finding
3.2.3 Patient-level Graph Representation
3.3 Word Embedding
3.4 Model Architecture
3.4.1 Graph Convolutional Network layer
3.4.2 Graph Pooling layer
3.4.3 Readout Layer
3.5 Prediction and Loss Function
3.6 Adaboost algorithm
Chapter 4 Experiments
4.1 EHR Dataset
4.1.1 Introduction to MIMIC-III Dataset
4.1.2 MRSA Data Collection
4.2 Hyper Parameter Settings
4.2.1 Model Training
4.3 Baseline Models
Chapter 5 Results
5.1 Performance Comparisons with baseline models
5.2 Performance Comparisons with graph networks
5.3 Interpretable analysis
5.4 Adaboost Result
Chapter 6 Conclusion
Abstract in Korean
Acknowledgements
dc.publisher: Seoul National University Graduate School
dc.subject: Clinical notes
dc.subject: Graph neural network
dc.subject: Graph pooling
dc.subject: Interpretable analysis
dc.title: Graph Convolutional Networks for Predictive Healthcare using Clinical Notes
dc.contributor.department: College of Engineering, Department of Computer Science and Engineering
Appears in Collections:
College of Engineering/Engineering Practice School (공과대학/대학원) > Dept. of Computer Science and Engineering (컴퓨터공학부) > Theses (Master's Degree_컴퓨터공학부)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.