Accuracy of Cloud-based Speech Recognition Solutions for Medical Conversations of Korean

이승화

진료실 대화에 대한 음성인식 솔루션의 정확도 평가 : Accuracy of Cloud-based Speech Recognition Solutions for Medical Conversations of Korean

Authors: 이승화

Advisor: 최진욱

Issue Date: 2023

Publisher: Seoul National University Graduate School

Keywords: Speech recognition ; Artificial intelligence ; Medical records ; Electronic medical records

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : College of Medicine, Department of Medicine, 2023. 2. 최진욱.

Abstract: Background: Data on the accuracy of cloud-based speech recognition (SR) solutions for medical conversations are limited. The purpose of this research was to evaluate the applicability of current SR systems to real-world doctor–patient conversations, as a basis for future research. To this end, we evaluated how accurately cloud-based SR solutions recognize medical conversation, both medical terminology and clinicians' questions, in Korean, a non–Latin-based language, using recordings and transcriptions of real doctor–patient conversations and a dedicated dataset for AI training.
Methods: We analyzed the accuracy of currently available cloud-based SR solutions in recognizing medical terms, using recordings of real doctor–patient conversations collected from an outpatient clinic at a large tertiary medical center in Korea. After this first analysis, we analyzed the accuracy of current SR engines on doctors' speech using a dedicated dataset available at aihub.or.kr. For each pair of original and SR transcriptions, we assessed the accuracy of each cloud-based SR solution by clinicians' judgment and by automated evaluation metrics (BLEU, CIDEr, ROUGE, and METEOR scores).
Results: The results for medical-term accuracy were as follows. A total of 112 doctor–patient conversation recordings were converted with three cloud-based SR solutions (Naver Clova SR from Naver Corporation, Seongnam, Korea; Google speech-to-text from Alphabet Inc., Mountain View, CA, USA; and Amazon Transcribe from Amazon.com, Seattle, WA, USA), and the transcriptions were compared. Naver Clova SR (75.1%) showed the highest accuracy in recognizing medical terms compared with the other solutions (Google speech-to-text, 50.9%, P < 0.001; Amazon Transcribe, 57.9%, P < 0.001), and Amazon Transcribe demonstrated higher recognition accuracy than Google speech-to-text (P < 0.001). In the sub-analysis, Naver Clova SR showed the highest accuracy in all word classes, but Google speech-to-text showed the highest recognition accuracy for words longer than five syllables, without statistical significance.
For SR accuracy on medical speech, we extracted a total of 500 doctors' questions from the aihub.or.kr dataset of health care provider and patient speech for telemedicine. The extracted questions were converted with three cloud-based SR solutions (Naver Clova SR from Naver Corporation, Seongnam, Korea; Kakao API, Speech-to-Text (demo) from Kakao Corp., Jeju, Korea; and Google speech-to-text from Alphabet Inc., Mountain View, CA, USA), and accuracy was compared by clinicians' judgment and by automated methods. Naver Clova SR showed the highest accuracy as judged by clinicians (Naver, 89.7% vs. Kakao, 77.2% vs. Google, 66.0%; P < 0.001). Among the automated methods, the BLEU-1 score of Naver was the highest of the three SR engines (Naver, 0.654 vs. Kakao, 0.578 vs. Google, 0.535; P < 0.001). Moreover, the BLEU-2 (0.557 vs. 0.463 vs. 0.418; P < 0.001) and BLEU-3 scores (0.389 vs. 0.306 vs. 0.262; P < 0.001) of Naver were the highest compared with Kakao and Google. Kakao showed higher BLEU-1, BLEU-2, and BLEU-3 scores than Google, with statistical significance. CIDEr, METEOR, and ROUGE scores showed the same pattern: Naver Clova SR had the highest SR accuracy among the three SR engines.
Conclusions: Current cloud-based SR solutions have limitations in recognizing medical terminology. The SR solution developed by a domestic company showed the highest recognition accuracy among the three solutions assessed in this study. Meanwhile, SR of medical conversation from the dedicated AI-training dataset showed acceptable accuracy. There is still room for improvement in this promising technology, and the construction of datasets for low-resource languages needs consideration.
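The BLEU-1, BLEU-2, and BLEU-3 scores reported above compare each SR transcription with the original transcription by n-gram overlap. The record does not state which implementation or tokenization the thesis used, so the following is only a minimal sketch of sentence-level BLEU under stated assumptions: whitespace tokenization, a single reference, no smoothing, and the standard brevity penalty. The function names and the example sentences are hypothetical and are not taken from the study data.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(ref_tokens, hyp_tokens, n):
    """Clipped n-gram precision of the hypothesis against one reference."""
    hyp = ngram_counts(hyp_tokens, n)
    ref = ngram_counts(ref_tokens, n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return clipped / total


def bleu(reference, hypothesis, max_n):
    """Cumulative BLEU up to max_n with uniform weights (assumed setup)."""
    ref_tokens = reference.split()   # assumption: whitespace tokenization
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0
    precisions = [modified_precision(ref_tokens, hyp_tokens, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp_tokens) > len(ref_tokens):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return penalty * math.exp(log_mean)


# Hypothetical example: original transcript vs. an SR output with one wrong word.
original = "가슴 통증이 언제부터 있었나요"
recognized = "가슴 통증이 언제부터 있었어요"
bleu_1 = bleu(original, recognized, 1)
bleu_2 = bleu(original, recognized, 2)
```

In this made-up example three of the four words match, so BLEU-1 is 0.75, while BLEU-2 is lower because the wrong word also breaks one of the three word pairs. Whitespace tokenization is a coarse choice for Korean, so scores from a morpheme-level tokenizer would differ.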
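Background: Speech recognition systems that assist clinicians in writing medical records have been used in healthcare since the 1980s. However, studies of how accurately cloud-based speech recognition solutions recognize clinic conversations are lacking. This study sought to explore whether medical record documentation can be automated by recognizing clinic conversations with cloud-based speech recognition solutions. To this end, we compared the recognition rates of currently commercialized cloud-based speech recognition solutions for Korean medical conversations.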
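Methods: We recorded real consultation conversations between physicians and patients visiting the cardiology outpatient clinic of Samsung Medical Center (삼성서울병원) and compared the speech recognition accuracy of currently available cloud-based speech recognition solutions. In addition, we used a dataset for AI training provided by aihub.or.kr to further compare speech recognition accuracy for questions asked by medical staff.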
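Results: To study speech recognition accuracy for medical terms, conversations between 112 patients and physicians were recorded and processed with three cloud-based speech recognition solutions: Naver Clova SR, Google Speech-to-text, and Amazon Transcribe. The medical terms in scripts prepared from the actual conversations were then compared with the medical terms recognized by each solution. Of the three solutions, Naver Clova SR showed the highest recognition accuracy for medical terms (75.1% vs. 50.9% vs. 57.9%, P < 0.001). In addition, Amazon Transcribe showed statistically significantly higher recognition accuracy than Google Speech-to-text. In the sub-analysis, Naver Clova SR showed high recognition accuracy overall, while Google's speech recognition showed higher accuracy for medical terms of five or more characters, although this difference was not statistically significant.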
To analyze speech recognition accuracy for questions asked by medical staff, a total of 500 sentences were extracted from the aihub.or.kr data and processed with three speech recognition solutions (Naver Clova SR, Kakao API Speech-to-text, and Google Speech-to-text) for comparison. Accuracy was compared using clinicians' judgment and automated metrics, and Naver Clova SR showed significantly higher speech recognition accuracy than the other two solutions. By clinicians' judgment, accuracy was 94.7% for Naver Clova SR, 83.8% for Kakao API Speech-to-text, and 76.7% for Google Speech-to-text (p < 0.001). On the automated metrics, Naver Clova SR also showed the highest accuracy: BLEU-1 (0.654 vs. 0.578 vs. 0.535, p < 0.001), BLEU-2 (0.557 vs. 0.463 vs. 0.418, p < 0.001), and CIDEr (4.18 vs. 3.39 vs. 3.02, p < 0.001).
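Conclusions: Currently commercialized cloud-based speech recognition solutions showed limitations in recognizing patient–physician conversations and cannot be applied directly to writing medical records from speech recognition of clinic conversations in real clinical settings. Among the solutions tested, the one developed by a domestic company showed the highest recognition rate, and each solution showed strengths in different categories of words. In addition, speech recognition accuracy was higher for data produced for AI training than for recordings of real conversations.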
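The results of this study indicate that speech recognition accuracy for medical conversations can improve as speech recognition technology advances. Improving the recognition accuracy of current cloud-based speech recognition solutions requires more training datasets curated for application in the medical industry.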
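The medical-term recognition rate described above is the share of medical terms in the original script that also appear in a solution's transcription. The record does not give the matching rule used in the thesis, so the following is only a minimal sketch that assumes exact substring matching and counts each reference term once. The function name, engine labels, terms, and transcripts are hypothetical and are not study data.

```python
def term_recognition_rate(reference_terms, transcript):
    """Fraction of reference medical terms found verbatim in a transcript.

    Assumption: a term counts as recognized if it appears as an exact
    substring of the SR output; the thesis may have used a different rule.
    """
    if not reference_terms:
        raise ValueError("reference_terms must not be empty")
    recognized = [term for term in reference_terms if term in transcript]
    return len(recognized) / len(reference_terms)


# Hypothetical medical terms taken from an original script.
terms = ["고혈압", "심전도", "부정맥"]

# Hypothetical SR outputs from two unnamed engines.
transcripts = {
    "engine_a": "고혈압 약은 드시고 심전도 검사 하겠습니다",
    "engine_b": "고혈압 약은 드시고 검사 하겠습니다",
}

rates = {name: term_recognition_rate(terms, text)
         for name, text in transcripts.items()}
```

Here `engine_a` recovers two of the three terms and `engine_b` only one. Exact matching treats spacing or spelling variants of a term as misses, which is one reason a clinician's judgment may be preferred for the final count.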

Language: kor

URI: https://hdl.handle.net/10371/194206

https://dcollection.snu.ac.kr/common/orgView/000000175382

Files in This Item:

000000175382.pdf 0.65 MB

Appears in Collections:

College of Medicine/School of Medicine (의과대학/대학원)
- Dept. of Medicine (의학과)
  - Theses (Ph.D. / Sc.D._의학과)
