Detailed Information

Automated extraction of Biomarker information from pathology reports

DC Field | Value | Language
dc.contributor.author | Lee, Jeongeun | -
dc.contributor.author | Song, Hyun-Je | -
dc.contributor.author | Yoon, Eunsil | -
dc.contributor.author | Park, Seong-Bae | -
dc.contributor.author | Park, Sung-Hye | -
dc.contributor.author | Seo, Jeong-Wook | -
dc.contributor.author | Park, Peom | -
dc.contributor.author | Choi, Jinwook | -
dc.date.accessioned | 2018-07-23T07:39:39Z | -
dc.date.available | 2018-07-23T16:40:46Z | -
dc.date.issued | 2018-05-21 | -
dc.identifier.citation | BMC Medical Informatics and Decision Making, 18(1):29 | ko_KR
dc.identifier.issn | 1472-6947 | -
dc.identifier.uri | https://hdl.handle.net/10371/142747 | -
dc.description.abstract
Background
Pathology reports are written in free-text form, which precludes efficient data gathering. We aimed to overcome this limitation and design an automated system for extracting biomarker profiles from accumulated pathology reports.

Methods
We designed a new data model for representing biomarker knowledge. The automated system parses immunohistochemistry reports based on a slide paragraph unit, defined as the set of immunohistochemistry findings obtained for the same tissue slide. Immunohistochemistry reports are parsed using a context-free grammar, and surgical pathology reports are parsed using a tree-like structure. The performance of the approach was validated on manually annotated pathology reports of 100 randomly selected patients managed at Seoul National University Hospital.
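As an illustration of grammar-based parsing by slide paragraph, the following is a minimal sketch and not the grammar used in the study. The report line format (`A1: ER (positive), ...`), the toy grammar, and the names `SlideParagraph` and `parse_slide_paragraph` are all assumptions introduced here for the example.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammar (an assumption, not the paper's grammar):
#   paragraph := slide_id ":" finding ("," finding)*
#   finding   := name "(" result ")"
PUNCT = {":", ",", "(", ")"}
# A token is either one punctuation mark or a run of other characters.
TOKEN = re.compile(r"\s*([:,()]|[^:,()\s][^:,()]*)")


@dataclass
class SlideParagraph:
    slide_id: str
    findings: dict = field(default_factory=dict)  # biomarker name -> result


def tokenize(line):
    """Split one report line into punctuation and word tokens."""
    tokens, pos, line = [], 0, line.strip()
    while pos < len(line):
        match = TOKEN.match(line, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}")
        tokens.append(match.group(1).strip())
        pos = match.end()
    return tokens


def parse_slide_paragraph(line):
    """Recursive-descent parse of one slide paragraph."""
    tokens, pos = tokenize(line), 0

    def take(expected=None):
        # Consume one token: a specific symbol, or any word if expected is None.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        if expected is not None and token != expected:
            raise ValueError(f"expected {expected!r}, got {token!r}")
        if expected is None and token in PUNCT:
            raise ValueError(f"expected a word, got {token!r}")
        pos += 1
        return token

    paragraph = SlideParagraph(slide_id=take())
    take(":")
    while True:
        name = take()
        take("(")
        result = take()
        take(")")
        paragraph.findings[name] = result
        if pos == len(tokens):
            return paragraph
        take(",")
```

Grouping findings under a slide identifier mirrors the slide paragraph unit described above: all results parsed from one line stay attached to the same tissue slide, so they can later be linked to the matching surgical pathology finding.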

Results
High F-scores were obtained for parsing biomarker names and the corresponding test results (0.999 and 0.998, respectively) from the immunohistochemistry reports, compared to relatively poor performance for parsing surgical pathology findings. However, applying the proposed approach to our single-center dataset revealed information on 221 unique biomarkers, which represents a richer result than biomarker profiles obtained based on the published literature. Owing to the data representation model, the proposed approach can associate biomarker profiles extracted from an immunohistochemistry report with corresponding pathology findings listed in one or more surgical pathology reports. Term variations are resolved by normalization to corresponding preferred terms determined by expanded dictionary look-up and text similarity-based search.
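The two-stage term normalization described above (dictionary look-up first, then a similarity search) could be sketched as follows. This is an illustration under stated assumptions: the `SYNONYMS` entries, the `normalize` function, and the 0.8 cutoff are hypothetical, and Python's `difflib` stands in for whatever similarity measure the authors actually used.

```python
import difflib

# Hypothetical expanded dictionary: lower-cased variant -> preferred term.
SYNONYMS = {
    "er": "Estrogen receptor",
    "estrogen receptor": "Estrogen receptor",
    "pr": "Progesterone receptor",
    "progesterone receptor": "Progesterone receptor",
    "ki-67": "Ki-67",
    "ki67": "Ki-67",
}


def normalize(term, synonyms=SYNONYMS, cutoff=0.8):
    """Map a raw term to its preferred term, or None if nothing is close."""
    # Lower-case and collapse whitespace before any comparison.
    key = " ".join(term.lower().split())
    # Stage 1: exact dictionary look-up.
    if key in synonyms:
        return synonyms[key]
    # Stage 2: similarity search over the dictionary keys (assumed cutoff).
    close = difflib.get_close_matches(key, list(synonyms), n=1, cutoff=cutoff)
    return synonyms[close[0]] if close else None
```

Under these assumptions, `normalize("ER")` resolves through the dictionary, while a misspelling such as `"Estrogen Recepter"` falls through to the similarity stage. Returning `None` for unmatched terms leaves them available for manual review instead of forcing a guess.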

Conclusions
Our proposed approach for biomarker data extraction addresses key limitations regarding data representation and can handle reports prepared in the clinical setting, which often contain incomplete sentences, typographical errors, and inconsistent formatting.
ko_KR
dc.description.sponsorship | This study was financially supported by a grant of the Korean Health Technology R&D project, Ministry of Health & Welfare, Republic of Korea (A112005) and by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (2010–0028631) | ko_KR
dc.language.iso | en | ko_KR
dc.publisher | BioMed Central | ko_KR
dc.subject | Biomarkers | ko_KR
dc.subject | Cancer disease knowledge representation model | ko_KR
dc.subject | Pathology reports | ko_KR
dc.subject | Natural language processing | ko_KR
dc.subject | Clinical decision-making | ko_KR
dc.title | Automated extraction of Biomarker information from pathology reports | ko_KR
dc.type | Article | ko_KR
dc.contributor.AlternativeAuthor | 이정근 | -
dc.contributor.AlternativeAuthor | 송현제 | -
dc.contributor.AlternativeAuthor | 윤은실 | -
dc.contributor.AlternativeAuthor | 박성배 | -
dc.contributor.AlternativeAuthor | 박성혜 | -
dc.contributor.AlternativeAuthor | 서정욱 | -
dc.contributor.AlternativeAuthor | 박범 | -
dc.contributor.AlternativeAuthor | 최진욱 | -
dc.identifier.doi | 10.1186/s12911-018-0609-7 | -
dc.language.rfc3066 | en | -
dc.rights.holder | The Author(s). | -
dc.date.updated | 2018-05-27T03:36:50Z | -

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.