Classification of Phonocardiogram Recordings Using Vision Transformer Architecture

김준엽

Vision Transformer 기반 심음 분류 연구 : Classification of Phonocardiogram Recordings Using Vision Transformer Architecture

Authors: 김준엽

Advisor: 서봉원

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: Phonocardiogram ; Heart murmur ; Vision Transformer ; Spectrogram ; Attention mask

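Description: Thesis (Master's) -- Seoul National University Graduate School : Graduate School of Convergence Science and Technology, Dept. of Intelligence and Information, 2023. 2. 서봉원.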

Abstract: Diagnosis of cardiac disease begins with measuring heart sounds, which carry important medical information about the condition and function of the heart. Auscultation, that is, listening to heart sounds through a stethoscope, is less accurate than a detailed examination, but it is essential in practice because it is easy and costs almost nothing. A healthy heart produces a regular, clear beat, whereas in an unhealthy heart an additional noise may be heard along with the heartbeat. This noise is called a heart murmur, and the characteristics of the murmur and the location at which it is heard can be used to assess heart disease. Phonocardiogram (PCG) data, produced by recording heart sounds, can be used to detect whether a person has heart disease or whether the heart sounds are abnormal. Developing automated models that detect abnormal cardiac function from heart sounds is an important research topic for people suffering from heart disease in countries where experts and capital are scarce, such as developing countries.
Approaches to modeling medical data can be divided into those based on classical machine learning and those based on deep learning. Because machine learning-based modeling requires features to be extracted by hand, the researcher's prior knowledge of the data and the choice of pre-processing methods have a large influence on the result. Deep learning-based models, by contrast, learn to extract these features themselves, so the influence of prior knowledge and pre-processing is comparatively low. Machine learning models have traditionally been widely used for medical data modeling, but recent studies apply models that perform well in deep learning to improve on existing medical data modeling, and these models outperform classical machine learning methods. However, most deep learning-based models do not yield interpretable results, which makes it difficult for them to support an expert's diagnosis.
The Vision Transformer architecture used in this thesis includes self-attention, which computes attention scores that help in understanding the result and that can be visualized as an attention mask. It also has the advantage of high performance on image classification tasks in computer vision.
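The step from attention scores to an attention mask can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a single attention head, random projection weights, a class token at index 0, and a hypothetical 4 x 4 patch grid, and all function names are made up for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_scores(tokens, w_q, w_k):
    # Scaled dot-product attention scores for one head: softmax(Q K^T / sqrt(d_k)).
    q = tokens @ w_q
    k = tokens @ w_k
    d_k = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d_k), axis=-1)

def cls_attention_mask(scores, grid_shape):
    # Row 0 holds the class token's attention; drop its attention to itself
    # and reshape the remaining patch weights onto the patch grid.
    cls_to_patches = scores[0, 1:]
    mask = cls_to_patches.reshape(grid_shape)
    return mask / mask.max()  # scale so the most-attended patch is 1.0

# Hypothetical sizes: 16 spectrogram patches plus one class token, embedding dim 8.
rng = np.random.default_rng(0)
grid_shape = (4, 4)
n_tokens = 1 + grid_shape[0] * grid_shape[1]
dim = 8
tokens = rng.normal(size=(n_tokens, dim))
w_q = rng.normal(size=(dim, dim))
w_k = rng.normal(size=(dim, dim))

scores = attention_scores(tokens, w_q, w_k)
mask = cls_attention_mask(scores, grid_shape)
```

Upsampling `mask` to the spectrogram's resolution and overlaying it on the image would show which time-frequency regions received the most attention in this toy setting.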
This thesis proposes a model that detects the presence or absence of murmurs and the clinical outcome from heart sound recordings measured at multiple auscultation locations. A visual approach was used to achieve higher classification performance and to aid interpretation of the results. The proposed model converts heart sound signals into spectrograms without resampling or filtering the signal, and takes the resulting image together with the patient's demographic information as input to infer heart murmurs and clinical outcomes. On the test data, it achieves a Challenge cost score of 11943 for clinical outcome identification and a weighted accuracy of 0.69 for murmur classification. In addition, the model includes an attention mask that visualizes which parts of the input it relied on for its decision.
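The conversion of a heart sound signal into a spectrogram image can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's pipeline: the sampling rate, FFT size, hop length, Hann window, and the synthetic sine wave standing in for a recording are all hypothetical choices, and the function names are invented here. Consistent with the abstract, the sketch applies no resampling or filtering to the signal.

```python
import numpy as np

def spectrogram(signal, n_fft=256, hop=128):
    # Short-time Fourier transform: window overlapping frames, then take
    # the power of the one-sided FFT of each frame.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([
        signal[i * hop : i * hop + n_fft] * window
        for i in range(n_frames)
    ])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Log scale in dB; the small constant avoids log(0).
    return 10.0 * np.log10(power.T + 1e-10)  # shape: (freq_bins, frames)

def to_image(spec_db):
    # Min-max scale to [0, 1] so the array can be treated as a grayscale image.
    lo, hi = spec_db.min(), spec_db.max()
    return (spec_db - lo) / (hi - lo)

# Hypothetical input: 2 seconds of a 50 Hz tone sampled at 4000 Hz.
sample_rate = 4000
t = np.arange(2 * sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 50 * t)

spec = spectrogram(signal)
image = to_image(spec)
```

The resulting `image` array has one row per frequency bin and one column per time frame. A Vision Transformer would split such an image into patches, and demographic features could be combined with the image representation before classification; how the thesis does this is not specified in the abstract.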

Language: kor

URI: https://hdl.handle.net/10371/194095

https://dcollection.snu.ac.kr/common/orgView/000000174142

Files in This Item:

000000174142.pdf 1.62 MB

Appears in Collections:

Graduate School of Convergence Science and Technology (융합과학기술대학원)
- Dept. of Intelligence and Information (지능정보융합학과)
  - Theses (Master's Degree_지능정보융합학과)
