Leveraging Temporal Information for Classification of Lung Cancer's Brain Metastases from Clinical Notes

안지용

Leveraging Temporal Information for Classification of Lung Cancer's Brain Metastases from Clinical Notes : 시간적 맥락을 활용한 임상 기록에서 폐암의 뇌 전이상태 분류

Authors: 안지용

Advisor: 이승근

Issue Date: 2023

Publisher: Seoul National University Graduate School (서울대학교 대학원)

Keywords: clinical note ; pseudo-labeling ; semi-supervised learning ; conditional random field ; BERT ; LSTM

Description: Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Data Science, Department of Data Science, 2023. 8. 이승근.

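Abstract: Lung cancer is one of the most common cancer types and frequently metastasizes to the brain, so accurately classifying brain metastasis status is important for optimal patient care and informed decision-making. In this study, we propose two approaches that classify the brain metastasis status of lung cancer by jointly using the temporal and contextual information in lung cancer patients' MRI reading reports. In the first, we fine-tune a BERT-based pre-trained model and combine it with a Conditional Random Field (CRF) layer. In the second, we extract sequences of sentence-level embeddings from the pre-trained model and build a Bidirectional Long Short-Term Memory (BiLSTM) model. The dataset consists of 13,684 clinical notes in total, of which only 606 are annotated. We attempt to address the shortage of annotated data with semi-supervised learning. Fine-tuning ClinicalBERT on 450 annotated notes achieves an accuracy of 73.9%. To improve performance, we integrate a CRF layer on top of the fine-tuned ClinicalBERT, which achieves 89.1% accuracy. Finally, training a BiLSTM on ClinicalBERT's sentence-level embeddings achieves 93.4% accuracy.
Our results confirm the importance of using temporal information and semi-supervised learning techniques for classifying the brain metastasis status of lung cancer from clinical notes, and suggest that a more reliable model can support clinicians' decision-making.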
Lung cancer is one of the most common types of cancer that frequently metastasizes to the brain. For optimal patient care and informed decision-making, accurate metastatic status classification is crucial.
In this study, we propose two approaches that leverage temporal information together with contextual information in clinical notes to categorize cancer status into four distinct classes. First, we combine a BERT-based model with a Conditional Random Field (CRF) layer. Second, we build a Bidirectional Long Short-Term Memory (BiLSTM) model on sequences of word embeddings from the pre-trained model. The dataset comprises 13,684 clinical notes, of which only 606 are annotated.
We first fine-tune ClinicalBERT on 450 annotated notes, achieving an accuracy of 73.9%. To improve the model's performance, a CRF layer is integrated on top of the fine-tuned ClinicalBERT, exploiting the temporal information provided by each note's date. The CRF layer is trained using 4,237 pseudo-labeled notes with a confidence threshold of 0.95, resulting in a model with 89.1% accuracy. Additionally, we employ a semi-supervised approach while training a BiLSTM model with ClinicalBERT's word embeddings, resulting in a model with 93.4% accuracy.
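The pseudo-labeling step can be illustrated with a minimal sketch. It assumes the fine-tuned classifier outputs a probability vector per unlabeled note, and that a note is kept when its highest class probability is at least 0.95. The function name `select_pseudo_labels`, the `>=` comparison, and the example probabilities are hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

CONFIDENCE_THRESHOLD = 0.95  # threshold value stated in the abstract


def select_pseudo_labels(
    probabilities: Sequence[Sequence[float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> List[Tuple[int, int]]:
    """Return (note_index, predicted_class) for confident predictions only."""
    selected = []
    for note_index, class_probs in enumerate(probabilities):
        # Index of the most probable class for this note.
        best_class = max(range(len(class_probs)), key=lambda c: class_probs[c])
        # Assumption: keep the note when the top probability reaches the threshold.
        if class_probs[best_class] >= threshold:
            selected.append((note_index, best_class))
    return selected


# Hypothetical softmax outputs for three unlabeled notes over four classes.
example_probs = [
    [0.97, 0.01, 0.01, 0.01],  # confident: kept with label 0
    [0.40, 0.35, 0.15, 0.10],  # uncertain: dropped
    [0.02, 0.02, 0.96, 0.00],  # confident: kept with label 2
]
pseudo_labeled = select_pseudo_labels(example_probs)
```

Only the retained notes would be added to the training set, which is how a high threshold trades the amount of pseudo-labeled data against label noise.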
Our findings underscore the significance of leveraging longitudinal information and semi-supervised learning techniques for cancer status classification using clinical notes, with implications for personalized medicine and clinical support systems.
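To illustrate why a patient's note history can change a per-note prediction, the following is a minimal sketch of Viterbi decoding over one patient's notes sorted by date. It is not the thesis's trained CRF: the transition scores are set by hand, only two hypothetical classes are used instead of the four in the abstract, and the names `viterbi_decode`, `emissions`, and `transitions` are assumptions made for this example.

```python
import math
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label sequence.

    emissions[t][k]: score of class k for the t-th note (notes sorted by date).
    transitions[j][k]: score of moving from class j to class k.
    """
    if not emissions:
        return []
    num_classes = len(emissions[0])
    scores = list(emissions[0])
    backpointers = []
    for step_scores in emissions[1:]:
        new_scores, pointers = [], []
        for k in range(num_classes):
            # Best previous class for ending at class k on this note.
            best_prev = max(
                range(num_classes), key=lambda j: scores[j] + transitions[j][k]
            )
            new_scores.append(
                scores[best_prev] + transitions[best_prev][k] + step_scores[k]
            )
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final class.
    path = [max(range(num_classes), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical log-probabilities for three notes and two classes.
emissions = [
    [math.log(0.90), math.log(0.10)],
    [math.log(0.45), math.log(0.55)],  # weakly favors class 1 in isolation
    [math.log(0.90), math.log(0.10)],
]
# Hand-set transition scores that favor staying in the same class.
transitions = [
    [math.log(0.8), math.log(0.2)],
    [math.log(0.2), math.log(0.8)],
]
independent = [max(range(2), key=lambda k: note[k]) for note in emissions]
decoded = viterbi_decode(emissions, transitions)
```

In this toy case the middle note is labeled class 1 when classified alone, while sequence decoding assigns class 0 to all three notes because the neighboring notes and the transition scores outweigh the weak local evidence.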

Language: kor

URI: https://hdl.handle.net/10371/196717

https://dcollection.snu.ac.kr/common/orgView/000000177911

Files in This Item:

000000177911.pdf 0.92 MB

Appears in Collections:

Graduate School of Data Science (데이터사이언스 대학원)
- Theses (Master's Degree_데이터사이언스학과)
