Development of Survival Prediction Model for the Korean Disease-free Lung Cancer Survivors using Patient Reported Outcome variables: application to Cox proportional hazard regression model and diverse machine learning algorithms
Development of a Survival Prediction Model for Korean Disease-free Lung Cancer Survivors Using Patient-Reported Outcome Measures
- Jin-ah Sim
- College of Medicine, Department of Biomedical Sciences
- Issue Date
- Seoul National University Graduate School
- Thesis (Ph.D.) -- Seoul National University Graduate School : College of Medicine, Department of Biomedical Sciences, 2018. 2. 윤영호.
- Introduction: Predicting lung cancer survival is crucial for successful cancer survivorship and follow-up planning. The principal objective of this study is to construct a novel Korean prognostic model of 5-year survival among disease-free lung cancer survivors using socio-clinical and health-related quality of life (HRQOL) variables, and to compare its predictive performance with that of a prediction model based on traditionally known clinical variables. Both the Cox proportional hazards model and machine learning technologies (MLT) were applied in the modeling process.
Methods: Data from 809 survivors who underwent lung cancer surgery between 1994 and 2002 at two Korean tertiary teaching hospitals were used. The following were selected as independent variables for the prognostic model on the basis of literature review and univariate analysis: clinical and socio-demographic variables, including age, sex, stage, metastatic lymph node, and income; health-related quality of life (HRQOL) factors from the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30; the Quality of Life Questionnaire Lung Cancer Module; the Hospital Anxiety and Depression Scale; and the Post-traumatic Growth Inventory. Survivors' body mass index before surgery and physical activity were also included. Three feature sets were used for prediction modeling: 1) clinical and socio-demographic variables only, 2) HRQOL and lifestyle factors only, and 3) the variables from feature sets 1 and 2 combined. A Cox proportional hazards regression model was constructed for each feature set, and the three models were compared in terms of discrimination and calibration using the C-statistic and the Hosmer-Lemeshow chi-square statistic. In addition, four machine learning algorithms, decision tree (DT), random forest (RF), bagging, and adaptive boosting (AdaBoost), were applied to the three feature sets, and their performances were compared with one another. The MLT-based predictive models were internally validated by k-fold cross-validation.
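As a rough illustration of the discrimination measure named in the Methods, the following is a minimal sketch of Harrell's C-statistic for right-censored survival data in plain Python. The function name and the toy follow-up times, event flags, and risk scores are hypothetical and are not taken from the thesis. Tied survival times are simply skipped here, which full implementations handle more carefully.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the subject with the higher
    predicted risk also has the shorter observed survival time."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the true order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk, earlier event: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, 1 = death observed, 0 = censored.
toy_times = [2, 4, 6, 8, 10]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.7, 0.8, 0.3, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risk))  # 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a model built on a richer feature set would be expected to move this number upward only if the added variables carry real prognostic information.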
Results: Among the Cox models, Model Cox-3 (based on feature set 3: HRQOL factors added to clinical and socio-demographic variables) showed the highest area under the curve (AUC = 0.809) compared with the other two Cox regression models (Cox-1 and Cox-2). When the same modeling approach was applied to the MLT-based models, the most effective models were Model DT-3 for DT, Model RF-3 for RF, Model Bag-3 for bagging, and Model AdaBoost-3 for AdaBoost, each showing the highest accuracy within its MLT. Model RF-3, Model Bag-3, and Model AdaBoost-3 still showed the highest accuracy after k-fold cross-validation was conducted.
Conclusions: When HRQOL factors were added to clinical and socio-demographic variables, the proposed model proved useful for predicting lung cancer survival, whether built with the Cox model or with MLT algorithms. Improved accuracy of lung cancer survival prediction models has the potential to help clinicians and survivors make more meaningful decisions about future plans and support for cancer care.