Discriminative Probabilistic Pattern Mining using Graph for Electronic Health Records

Abstract: Clinical notes in Electronic Health Records (EHR) contain a great deal of useful information about patients' medical histories. However, clinical notes are unstructured data and their volume grows continuously, so a reliable data mining technique is needed to group and categorize them. Many existing data mining techniques for classification use frequent patterns generated from keyword frequencies, but these patterns are not distinctive enough to classify complex data such as the clinical notes in EHR. These techniques also run into scalability and computational cost problems when applied to large EHR datasets. To address these issues, we introduce a discriminative probabilistic pattern mining algorithm that uses a graph (DPPMG) to generate subgraphs of frequent patterns for classifying clinical notes in electronic health records.

To increase discriminative power, we build the discriminative probabilistic frequent pattern graph for clinical note classification from co-occurrences, which are combinations of binary features, rather than from individual keywords. Each co-occurrence is weighted by a log-odds score that reflects its discriminative power. The resulting graph, which reflects the essence of the clinical notes, is then searched for discriminative probabilistic frequent subgraphs: the search starts from a hub node in the graph and uses dynamic programming to find a path. The subgraphs discovered in this way are then used to classify the clinical notes in electronic health records.
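
The pipeline described above can be illustrated with a small sketch. It is not the thesis implementation: the abstract does not give the exact formulas, so the smoothed log-odds definition, the choice of hub as the node with the largest total edge weight, the layer-by-layer dynamic programming step, and all function names and toy keywords below are assumptions made for illustration only.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_log_odds(notes, labels, smoothing=1.0):
    """Weight each keyword pair by a smoothed log-odds score (assumed form).

    notes  : list of keyword sets, one per clinical note
    labels : list of 0/1 class labels, aligned with notes
    """
    pos, neg = defaultdict(int), defaultdict(int)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for words, y in zip(notes, labels):
        # A co-occurrence is a pair of keywords present in the same note.
        for pair in combinations(sorted(words), 2):
            (pos if y == 1 else neg)[pair] += 1
    weights = {}
    for pair in set(pos) | set(neg):
        p = (pos[pair] + smoothing) / (n_pos + 2 * smoothing)
        q = (neg[pair] + smoothing) / (n_neg + 2 * smoothing)
        weights[pair] = math.log(p / (1 - p)) - math.log(q / (1 - q))
    return weights


def build_graph(weights, min_weight=0.0):
    """Keep pairs whose score exceeds min_weight as undirected weighted edges."""
    graph = defaultdict(dict)
    for (a, b), w in weights.items():
        if w > min_weight:
            graph[a][b] = w
            graph[b][a] = w
    return graph


def hub_node(graph):
    """Assumed hub definition: the node with the largest total edge weight."""
    return max(sorted(graph), key=lambda n: sum(graph[n].values()))


def best_path_from_hub(graph, hub, max_edges=3):
    """Layered dynamic programming over path length.

    Each layer keeps, per end node, the best-scoring path found with that
    many edges. This is a simplification: it does not guarantee the optimal
    simple path in general graphs.
    """
    layer = {hub: (0.0, [hub])}
    best = (0.0, [hub])
    for _ in range(max_edges):
        nxt = {}
        for u, (score, path) in layer.items():
            for v, w in graph[u].items():
                if v in path:  # do not revisit a node on the same path
                    continue
                if v not in nxt or score + w > nxt[v][0]:
                    nxt[v] = (score + w, path + [v])
        if not nxt:
            break
        layer = nxt
        top = max(layer.values(), key=lambda t: t[0])
        if top[0] > best[0]:
            best = top
    return best


# Toy data (hypothetical keywords, not taken from the thesis).
notes = [
    {"fever", "cough", "fatigue"},
    {"fever", "cough"},
    {"cough", "fatigue"},
    {"fracture", "pain"},
    {"fever", "pain"},
]
labels = [1, 1, 1, 0, 0]

weights = cooccurrence_log_odds(notes, labels)
graph = build_graph(weights)
hub = hub_node(graph)
score, path = best_path_from_hub(graph, hub)
```

In this toy example, pairs that appear mostly in class 1 notes receive positive weights and survive as edges, while pairs seen only in class 0 notes receive negative weights and are dropped. The path returned from the hub plays the role of a discriminative subgraph whose keywords could then serve as features for a classifier.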

Language: eng

URI: https://hdl.handle.net/10371/161070

http://dcollection.snu.ac.kr/common/orgView/000000156485

Files in This Item:

000000156485.pdf 2.58 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
