Deep learning model for glaucoma diagnosis and its stages classification based on fundus images

조현성

안저영상 기반 녹내장 진단 및 중증도 등급화를 위한 딥러닝 모델 : Deep learning model for glaucoma diagnosis and its stages classification based on fundus images

Authors: 조현성

Advisor: 김홍기

Major: College of Medicine, Department of Medicine

Issue Date: 2019-02

Publisher: Seoul National University Graduate School

Description: Thesis (Ph.D.) -- Seoul National University Graduate School: College of Medicine, Department of Medicine, 2019. 2. 김홍기.

Abstract:

Introduction: This study presents an ensemble method of convolutional neural networks (CNNs) for automated glaucoma screening and glaucoma severity classification from fundus photographs. To automate both tasks, we defined and trained 48 CNN models with different characteristics. The trained models were then combined through the ensemble method proposed in this study to produce the final readings, and the performance of those readings was evaluated.
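The abstract does not describe how the outputs of the 48 models are combined. As a point of reference only, the sketch below shows plain soft voting (averaging per-class probabilities), which is a common ensembling baseline and is an assumption here, not the thesis's method. The function names, class names, and probability values are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical class names; the abstract describes a normal class and
# early-stage and late-stage glaucoma classes.
CLASSES = ("normal", "early-stage glaucoma", "late-stage glaucoma")


def soft_vote(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-class probabilities that several models give one image."""
    if not prob_sets:
        raise ValueError("need at least one model output")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must score the same classes")
    n_models = len(prob_sets)
    return [sum(p[c] for p in prob_sets) / n_models for c in range(n_classes)]


def predict_class(prob_sets: Sequence[Sequence[float]],
                  class_names: Sequence[str]) -> str:
    """Return the class with the highest averaged probability."""
    mean_probs = soft_vote(prob_sets)
    best = max(range(len(mean_probs)), key=mean_probs.__getitem__)
    return class_names[best]


# Made-up outputs of three models for a single fundus photograph.
outputs = [
    [0.50, 0.30, 0.20],
    [0.10, 0.60, 0.30],
    [0.20, 0.50, 0.30],
]
prediction = predict_class(outputs, CLASSES)
print(prediction)
```

Averaging probabilities rather than counting votes lets a confident model outweigh several uncertain ones; whether the thesis weights or selects models differently cannot be determined from the abstract.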

Methods: A total of 4,445 fundus photographs from 2,801 patients were collected for training the CNN models. Four glaucoma specialists classified the photographs into a normal group (unaffected control class) and a glaucoma group, and the glaucoma group was further divided into an early-stage glaucoma class and a late-stage glaucoma class by reference to the mean deviation (MD) of the visual field test results. A mean deviation of -6 dB or less was classified as late-stage glaucoma. At most one fundus photograph per eye (left and right) was used for each patient. After excluding photographs with poor image quality and photographs on which the four specialists did not reach 100% agreement on the glaucoma grade, 3,460 photographs from 2,204 patients were used to train the CNN models. Model performance was evaluated using accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC). The proposed ensemble method was compared with InceptionNet-v3 as the baseline model. The performance results of the two methods were checked for normality with the Shapiro-Wilk test, and a paired t-test was used to test the statistical significance of the performance difference between the two methods.
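The labeling rule described above can be written as a short function. This is a minimal sketch: the -6 dB threshold comes from the abstract, while the function name, argument names, and the handling of a missing MD value are assumptions made for illustration.

```python
from typing import Optional

# Threshold stated in the abstract: MD of -6 dB or less is late-stage.
LATE_STAGE_MD_DB = -6.0


def severity_label(is_glaucoma: bool, md_db: Optional[float]) -> str:
    """Map a specialist reading plus visual field MD to one of three classes.

    is_glaucoma: the specialists' consensus (True = glaucoma group).
    md_db: visual field mean deviation in dB; only needed for glaucoma eyes.
    """
    if not is_glaucoma:
        return "normal"
    if md_db is None:
        # Assumption: a glaucoma eye without an MD value cannot be staged.
        raise ValueError("glaucoma eyes need a visual field MD value")
    if md_db <= LATE_STAGE_MD_DB:
        return "late-stage glaucoma"
    return "early-stage glaucoma"


print(severity_label(True, -6.0))
print(severity_label(True, -2.5))
print(severity_label(False, None))
```

Note that the boundary value of exactly -6 dB falls in the late-stage class, following the "or less" wording.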

Results: The proposed CNN ensemble method was evaluated separately for glaucoma screening and for glaucoma severity classification. For glaucoma screening, the ensemble method reached an accuracy of 96.62% (95% confidence interval [CI], 95.5 ~ 97.8%), whereas the reference model, a single InceptionNet-v3 model, reached 93.9% (95% CI, 92.6 ~ 95.2%). A paired t-test showed that this difference in accuracy was statistically significant (p = 0.000425). The AUROC was 0.994 (95% CI, 0.990 ~ 0.997) for the ensemble method and 0.977 (95% CI, 0.969 ~ 0.986) for the reference model; a paired t-test showed that this difference in AUROC was also statistically significant (p = 0.000966). The ensemble method therefore achieved higher and more stable accuracy and AUROC than the reference model in glaucoma screening.

For glaucoma severity classification, the ensemble method reached an accuracy of 87.7% (95% CI, 85.9 ~ 89.7%), and the reference model reached 82.3% (95% CI, 80.2 ~ 84.1%). A paired t-test showed that this difference in accuracy was statistically significant (p = 0.002902). The average AUROC was 0.975 (95% CI, 0.967 ~ 0.983) for the ensemble method and 0.938 (95% CI, 0.926 ~ 0.949) for the reference model; a paired t-test showed that this difference in average AUROC was statistically significant (p = 0.000093). The ensemble method therefore also achieved higher and more stable accuracy and AUROC than the reference model in glaucoma severity classification.
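For readers unfamiliar with the paired t-test used for these comparisons, the sketch below computes the test statistic from paired scores using only the standard library. The abstract does not say what the paired observations were (for example, repeated runs or cross-validation folds), so the pairing is an assumption, and the numbers are hypothetical accuracies in percent, not results from the thesis.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical paired accuracies (%), one pair per evaluation run.
ensemble = [97, 96, 98, 96, 97]
baseline = [94, 93, 95, 94, 93]
t_stat, dof = paired_t(ensemble, baseline)
print(round(t_stat, 4), dof)
```

Turning the statistic into a p-value requires the t distribution with `dof` degrees of freedom. In practice a statistics library would normally be used for that step, for instance `scipy.stats.ttest_rel` for the test itself and `scipy.stats.shapiro` for the preceding normality check on the paired differences.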

Conclusions: The proposed ensemble of multiple convolutional neural networks shows superior and more stable performance than the conventional method in automating glaucoma screening and severity classification from fundus photographs. The outcome of this study is clinical decision support system (CDSS) software based on artificial intelligence, which can be used in various settings by installing it on, or connecting it to, the fundus cameras already in wide use. If it is deployed on fundus cameras in health check-up centers or ophthalmology clinics, it can improve the efficiency and accuracy of fundus photograph reading, and the time saved can be devoted to the specialist's second reading, yielding more economical and accurate screening results. In addition, if medical services built on these results become more widely used, the likelihood of early diagnosis of potential glaucoma patients can be increased, treatment outcomes for glaucoma patients can be improved, and the related medical expenses can be reduced.
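
Abstract (translated from the Korean):

Introduction: This study concerns an ensemble method of convolutional neural networks for automatically performing glaucoma screening and classifying glaucoma severity from fundus images. To automate glaucoma screening and severity grading, 48 convolutional neural network models with different characteristics were defined and trained. The final readings were derived from all trained models through the ensemble method proposed in this study, and their performance was evaluated.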

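Methods: For training the convolutional neural network models, 4,445 fundus images taken from 2,801 patients were collected. Four glaucoma specialists classified the collected images into a normal group and a glaucoma group, and the glaucoma group was subdivided into an early glaucoma group and a severe glaucoma group by reference to the mean deviation (MD) of the visual field test results. A mean deviation of -6 dB or less was classified as the severe glaucoma group. In addition, at most one fundus image per eye, left and right, was used from each patient. After excluding images of poor quality and images on which the four glaucoma specialists' grading did not agree 100%, 3,460 images from 2,204 patients were used to train the convolutional neural network models.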

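Model performance was assessed with accuracy, sensitivity, specificity, and AUROC (Area Under the Receiver Operating Characteristic) as the evaluation metrics. InceptionNet-v3 served as the baseline model and was compared with the ensemble method proposed in this study. The performance results of the two methods were tested for normality with the Shapiro-Wilk normality test, and a paired t-test was used to test the statistical significance of the performance difference between the two methods.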
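The four evaluation metrics can be illustrated for the binary screening task with a short standard-library sketch. The function names, the 0.5 decision threshold, and the scores are hypothetical; the AUROC is computed here by its pairwise-ranking definition (the probability that a glaucoma image scores higher than a normal image), which equals the area under the ROC curve.

```python
from typing import Dict, Sequence


def screening_metrics(y_true: Sequence[int],
                      y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = glaucoma)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one glaucoma and one normal example")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of glaucoma eyes detected
        "specificity": tn / (tn + fp),  # share of normal eyes cleared
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (glaucoma, normal) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one glaucoma and one normal example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and model scores for six images.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical threshold
metrics = screening_metrics(y_true, y_pred)
area = auroc(y_true, scores)
print(metrics, area)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, while AUROC summarizes ranking quality across all thresholds, which is why the two kinds of metric are reported side by side.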

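Results: The convolutional neural network ensemble method proposed in this study was evaluated separately for glaucoma screening and for glaucoma severity classification. In terms of glaucoma screening accuracy, the ensemble method achieved 96.6% (95% confidence interval [CI], 95.5 ~ 97.8%), whereas the baseline model using a single InceptionNet-v3 model achieved 93.9% (95% CI, 92.6 ~ 95.2%). The difference in glaucoma screening accuracy between the baseline model and the ensemble method was tested for statistical significance with a paired t-test, which showed that the difference in accuracy was statistically significant (p-value 0.000425). In terms of AUROC, the ensemble method achieved 0.994 (95% CI, 0.990 ~ 0.997), and the baseline model using a single InceptionNet-v3 model achieved 0.977 (95% CI, 0.969 ~ 0.986). The difference in AUROC between the baseline model and the ensemble method in glaucoma screening was likewise tested with a paired t-test, which showed that the difference in AUROC was statistically significant (p-value 0.000966). This confirmed that the ensemble method is higher and more stable in accuracy and AUROC for glaucoma screening. In terms of glaucoma severity classification accuracy, the ensemble method achieved 87.7% (95% CI, 85.9 ~ 89.7%), and the baseline model using a single InceptionNet-v3 model achieved 82.3% (95% CI, 80.2 ~ 84.1%). The difference in accuracy between the baseline model and the ensemble method in glaucoma severity classification was tested with a paired t-test, which showed that the difference was statistically significant (p-value 0.002902). In terms of average AUROC, the ensemble method achieved 0.975 (95% CI, 0.967 ~ 0.983), and the baseline model using a single InceptionNet-v3 model achieved 0.938 (95% CI, 0.926 ~ 0.949). The difference in average AUROC between the baseline model and the ensemble method in glaucoma severity classification was also tested with a paired t-test, which showed that the difference was statistically significant (p-value 0.000093). This confirmed that the ensemble method is also higher and more stable in accuracy and AUROC for glaucoma severity classification.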

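Conclusions: The method proposed in this study, which ensembles multiple convolutional neural networks, delivers better and more stable performance than the existing method in automating glaucoma screening and severity classification from fundus images. The outcome of this study is Clinical Decision Support System (CDSS) software based on artificial intelligence, which can be used in various fields by installing it on, or linking it with, the fundus cameras that are now widely deployed. If the outcome of this study is installed on fundus cameras and used in health check-up centers or ophthalmology clinics, the efficiency and accuracy of reading fundus photographs can be increased, and by devoting the resulting time savings to the specialist's second reading, more economical and accurate screening results can be obtained. In addition, if medical services using the outcome of this study become widespread, the likelihood of early diagnosis of potential glaucoma patients can be increased, which in turn can improve treatment outcomes for glaucoma patients and reduce the related medical expenditure.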

Language: kor

URI: https://hdl.handle.net/10371/152679

Files in This Item:

000000154883.pdf 14.28 MB

Appears in Collections:

College of Medicine/School of Medicine (의과대학/대학원)
- Dept. of Medicine (의학과)
  - Theses (Ph.D. / Sc.D._의학과)
