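Development of Survival Prediction Model for the Korean Disease-free Lung Cancer Survivors using Patient Reported Outcome variables: application to Cox proportional hazard regression model and diverse machine learning algorithms : 환자 보고 성과 지표를 활용한 한국인 폐암 무병 생존자 생존 예측 모형 개발 (English: Development of a survival prediction model for Korean disease-free lung cancer survivors using patient-reported outcome indicators)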

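DC Field | Value | Language
dc.contributor.advisor | 윤영호 (Young Ho Yun) | -
dc.contributor.author | Jin-ah Sim | -
dc.date.accessioned | 2018-05-28T16:56:01Z | -
dc.date.available | 2018-05-28T16:56:01Z | -
dc.date.issued | 2018-02 | -
dc.identifier.other | 000000151084 | -
dc.identifier.uri | https://hdl.handle.net/10371/140990 | -
dc.description | Thesis (Ph.D.), Graduate School, Seoul National University: College of Medicine, Department of Biomedical Sciences, February 2018. Advisor: 윤영호 (Young Ho Yun). | -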
dc.description.abstract | Introduction: Predicting lung cancer survival is crucial for successful cancer survivorship and follow-up planning. The principal objective of this study is to construct a novel Korean prognostic model of 5-year survival for disease-free lung cancer survivors using socio-clinical and health-related quality of life (HRQOL) variables, and to compare its predictive performance with that of a prediction model based on traditional, known clinical variables. Both the Cox proportional hazards model and machine learning techniques (MLTs) were applied in the modeling process.
Methods: Data from 809 survivors who underwent lung cancer surgery between 1994 and 2002 at two Korean tertiary teaching hospitals were used. Based on a literature review and univariate analysis, the following independent variables were selected for the prognostic model: clinical and socio-demographic variables, including age, sex, stage, metastatic lymph nodes, and income; and HRQOL factors from the European Organization for Research and Treatment of Cancer Quality of Life Questionnaire Core 30 (EORTC QLQ-C30), the Quality of Life Questionnaire Lung Cancer Module, the Hospital Anxiety and Depression Scale, and the Post-traumatic Growth Inventory. Survivors' body mass index before surgery and their physical activity were also chosen. Three feature sets were defined for prediction modeling: 1) clinical and socio-demographic variables only, 2) HRQOL and lifestyle factors only, and 3) the variables of feature sets 1 and 2 combined. A Cox proportional hazards regression model was constructed for each feature set, and the three models were compared in terms of discrimination and calibration using the C-statistic and the Hosmer-Lemeshow chi-square statistic. In addition, four machine learning algorithms (decision tree (DT), random forest (RF), bagging, and adaptive boosting (AdaBoost)) were applied to the three feature sets, and their performances were compared. The performance of the MLT-based predictive models was internally validated by k-fold cross-validation.
Results: Among the Cox models, Model Cox-3 (based on feature set 3: HRQOL factors added to clinical and socio-demographic variables) showed the highest area under the curve (AUC = 0.809), compared with the other two Cox models (Cox-1 and Cox-2). Among the MLT-based models, the feature set 3 model showed the highest accuracy within each algorithm: Model DT-3 (DT), Model RF-3 (RF), Model Bag-3 (bagging), and Model AdaBoost-3 (AdaBoost). Models RF-3, Bag-3, and AdaBoost-3 still showed the highest accuracy after k-fold cross-validation.
Conclusions: When HRQOL variables were added to clinical and socio-demographic variables, the proposed model proved useful for predicting lung cancer survival, whether built with the Cox model or with MLT algorithms. A more accurate lung cancer survival prediction model has the potential to help clinicians and survivors make more meaningful decisions about future plans and supportive cancer care. | -
dc.description.tableofcontents | I. INTRODUCTION
A. Background
1. Lung cancer statistics
2. The importance of providing survival prediction models to cancer survivors
3. HRQOL and lifestyle measures as important predictors of lung cancer survival
4. Traditional survival analysis versus machine learning techniques (MLTs)
B. Hypothesis and objectives
1. Hypothesis
2. Objectives
II. MATERIALS AND METHODS
A. Study subjects
1. Subject selection
2. Data collection
2.1. Socio-demographic and clinical variables
2.2. Patient lifestyle characteristics
3. Study process
B. Prognostic variable selection and data preprocessing
1. Prognostic variable selection
1.1. Literature review for the selection of candidate predictors
1.2. Grading the evidence and mapping it onto the conceptual framework
1.3. Examination of prognostic variable selection through statistical analyses
2. Data preprocessing
2.1. Data cleaning and missing-data imputation
2.2. Test for multicollinearity
2.3. Determination of cut-off points
2.4. Data sampling for class balancing (SMOTE)
2.5. Data splitting (holdout strategy)
C. Model development
1. Cox model development
3. Random forest model
4. Bagging (bootstrap aggregating)
5. Adaptive boosting (AdaBoost)
D. Model validation
1. Validation of the Cox model
1.1. Discrimination of the Cox model
1.2. Calibration of the Cox model
2. Validation of the other MLT-based models
3. K-fold cross-validation of MLT-based prediction models to avoid overfitting
III. RESULTS
A. Literature review for selection of candidate predictors
1. Selection of candidate prognostic factors through literature review
2. Constructing model feature sets from the selected prognostic factors
B. Baseline characteristics
1. Participants' demographic characteristics and survival data
2. Candidate selection from statistical analyses
2.1. Univariate analysis of HRQOL mean scores between non-event and event groups
2.2. Univariate analysis of BMI, weight change, and MET of lung cancer survivors
3. Final candidate variable selection for phased modeling
4. Results of data preprocessing
4.1. Missing-data imputation
C. Model development
1. Cox model development
1.1. Prediction model based on Cox regression analysis
1.2. Final prediction model equation for Cox models
2. Decision tree model development
2.1. Assessment of relative variable importance and model development
2.2. Selecting the complexity parameter (CP) for decision tree pruning using the rpart package
3. Random forest model development
4. Bagged decision tree model development
5. AdaBoost model development
6. Models developed with MLTs
D. Model validation and performance
1. Internal validation of the Cox proportional hazards model
1.1. Discrimination
1.2. Calibration
2. Comparison of model performance between the Cox model and other MLTs
IV. DISCUSSION
A. Literature review for selection of candidate predictors
B. Model development using Cox and other MLTs
C. Validation of the Cox regression model and application of the predictive models to other MLT-based models
D. Clinical and practical implications
E. Strengths and limitations of this study
CONCLUSION
REFERENCES
Abstract in Korean
APPENDIX | -
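dc.format | application/pdf | -
dc.format.extent | 3866492 bytes | -
dc.format.medium | application/pdf | -
dc.language.iso | en | -
dc.publisher | 서울대학교 대학원 (Graduate School, Seoul National University) | -
dc.subject | Lung Cancer | -
dc.subject | Survival | -
dc.subject | Prediction | -
dc.subject | HRQOL | -
dc.subject | Machine Learning | -
dc.subject.ddc | 610.72 | -
dc.title | Development of Survival Prediction Model for the Korean Disease-free Lung Cancer Survivors using Patient Reported Outcome variables: application to Cox proportional hazard regression model and diverse machine learning algorithms | -
dc.title.alternative | 환자 보고 성과 지표를 활용한 한국인 폐암 무병 생존자 생존 예측 모형 개발 (English: Development of a survival prediction model for Korean disease-free lung cancer survivors using patient-reported outcome indicators) | -
dc.type | Thesis | -
dc.contributor.AlternativeAuthor | 심진아 (Jin-ah Sim) | -
dc.description.degree | Doctor | -
dc.contributor.affiliation | 의과대학 의과학과 (College of Medicine, Department of Biomedical Sciences) | -
dc.date.awarded | 2018-02 | -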
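The abstract compares the three Cox models on discrimination (C-statistic) and calibration (Hosmer-Lemeshow chi-square statistic). The Python sketch below is only an illustration of how these two measures are commonly computed. It is not the thesis's own code; the outline mentions the R package rpart, but it does not say how the Cox metrics were computed. The sketch makes the following assumptions:

- Harrell's C-index stands in for the C-statistic, and pairs with tied survival times are simply skipped.
- The Hosmer-Lemeshow statistic is applied to a binary 5-year death indicator, compared against each subject's predicted 5-year death probability. For a Cox model, that probability would be 1 - S(5 years).
- This presumes complete 5-year follow-up. Censoring before 5 years is not handled; a censoring-adjusted variant such as the D'Agostino-Nam test would be needed for that.
- Subjects are grouped into deciles of predicted risk.
- The p-value uses the closed-form chi-square tail, which is valid only for even degrees of freedom (df = groups - 2).

All function names and toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Hypothetical helper: Harrell's C-statistic for right-censored data.

    A pair (i, j) is usable when subject i had an observed event and a strictly
    shorter time than subject j. It is concordant when i also has the higher
    predicted risk; tied risks count as half. Tied survival times are skipped.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier failure in a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs for the C-statistic")
    return concordant / usable


def hosmer_lemeshow(pred: Sequence[float], observed: Sequence[int],
                    groups: int = 10) -> Tuple[float, int]:
    """Hypothetical helper: Hosmer-Lemeshow chi-square and its df (groups - 2).

    pred: predicted probability of death by the horizon (e.g. 5 years).
    observed: 1 if the subject died by the horizon, else 0
    (assumes no censoring before the horizon).
    """
    # Sort by predicted risk only; the stable sort keeps input order among ties.
    ranked = sorted(zip(pred, observed), key=lambda pair: pair[0])
    n = len(ranked)
    stat = 0.0
    for k in range(groups):
        chunk = ranked[k * n // groups:(k + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected deaths in this group
        obs = sum(y for _, y in chunk)       # observed deaths in this group
        # Two cells per group: deaths and survivors.
        if expected > 0:
            stat += (obs - expected) ** 2 / expected
        if size - expected > 0:
            stat += (obs - expected) ** 2 / (size - expected)
    return stat, groups - 2


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Toy data for illustration only; these are not values from the study.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risk = [0.9, 0.2, 0.4, 0.7]
    print("C-statistic:", harrell_c_index(times, events, risk))

    pred = [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5] * 2
    died = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1] * 2
    hl, df = hosmer_lemeshow(pred, died, groups=10)
    print(f"Hosmer-Lemeshow chi-square = {hl:.3f}, df = {df}, "
          f"p = {chi2_sf_even_df(hl, df):.3f}")
```

With a fitted Cox model, `risk` would typically be the linear predictor (higher means worse prognosis), and `pred` the model's predicted probability of death by 5 years. A higher C-statistic indicates better discrimination. A small Hosmer-Lemeshow statistic (large p-value) indicates no evidence of miscalibration.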
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
