Building Named Entity Knowledge Graph Using Named Entity Normalization

전성환

Seoul National University Library

Building Named Entity Knowledge Graph Using Named Entity Normalization : Knowledge Graph Construction Using Named Entity Normalization Techniques (고유명사 정규화 기법을 이용한 지식 그래프 구축)

DC Field	Value	Language
dc.contributor.advisor	조성준	-
dc.contributor.author	전성환	-
dc.date.accessioned	2023-06-29T01:52:13Z	-
dc.date.available	2023-06-29T01:52:13Z	-
dc.date.issued	2023	-
dc.identifier.other	000000176653	-
dc.identifier.uri	https://hdl.handle.net/10371/193131	-
dc.identifier.uri	https://dcollection.snu.ac.kr/common/orgView/000000176653	ko_KR
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Industrial Engineering, 2023. 2. 조성준.	-
dc.description.abstract	Text mining aims to extract information from documents in order to derive valuable insights. A knowledge graph provides richer information drawn from various documents. Past literature responded to this need by building technology trees or concept networks from the bibliographic information of documents, or by relying on text mining techniques to extract keywords and/or phrases. In this paper, we propose a framework for building a knowledge graph using named entities. The framework satisfies the following conditions: (1) extracting named entities in their complete form, (2) building datasets on which named entity normalization (NEN) models can be trained and evaluated in domains such as finance and technical documents, in addition to bioinformatics, where existing NEN research has been active, (3) creating a better-performing named entity normalization model, and (4) constructing the knowledge graph by grouping named entities that have the same meaning but appear in various forms.	-
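dc.description.abstract	Text mining aims to extract information from documents in order to obtain a variety of insights. The knowledge graph, one way of representing the information in documents, provides richer information from various documents. Previous studies used text mining techniques to build technology trees or concept networks from document information, or to extract keywords and phrases. This paper proposes a framework for building a knowledge graph using named entities. The knowledge graph construction framework in this paper satisfies the following conditions: (1) it extracts named entities in a form that is easy for people to understand; (2) it builds named entity normalization datasets from named entities extracted from financial documents and semiconductor-related patent documents, in addition to bioinformatics, where existing named entity normalization research has been active; (3) it builds a better-performing named entity normalization model; (4) it builds the knowledge graph by grouping named entities that have the same meaning but appear in various forms.	-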
dc.description.tableofcontents	Chapter 1 Introduction; Chapter 2 Literature review; 2.1 Named entity normalization dataset; 2.2 Named entity normalization; 2.3 Knowledge graph construction; Chapter 3 Dictionary construction for named entity normalization; 3.1 Background; 3.2 Dictionary construction methods; 3.2.1 Finance named entity normalization dataset; 3.2.2 Patent named entity normalization dataset; 3.3 Chapter summary; Chapter 4 Named entity normalization model using edge weight updating neural network; 4.1 Background; 4.2 Proposed model; 4.2.1 Ground truth entity graph construction; 4.2.2 Similarity-based entity graph construction; 4.2.3 Edge weight updating neural network training; 4.2.4 Edge weight updating neural network inferencing; 4.3 Experiment results; 4.3.1 Datasets; 4.3.2 Experiment settings: named entity normalization in bioinformatics; 4.3.3 Experiment settings: named entity normalization in finance; 4.4 Results; 4.4.1 Quantitative analysis: bioinformatics; 4.4.2 Quantitative analysis: finance; 4.4.3 Qualitative analysis; 4.5 Chapter summary; Chapter 5 Building knowledge graph using named entity recognition and normalization models; 5.1 Background; 5.2 Proposed model; 5.2.1 Named entity normalization; 5.2.2 Construction of the semiconductor-related patent knowledge graph; 5.3 Experiment results; 5.3.1 Comparison models; 5.3.2 Parameter settings; 5.4 Results; 5.4.1 Quantitative evaluations; 5.4.2 Qualitative evaluations; 5.4.3 Knowledge graph visualization and exemplary investigation; 5.5 Chapter summary; Chapter 6 Conclusion; 6.1 Contributions; 6.2 Future work; Bibliography; Abstract in Korean; Acknowledgements	-
dc.format.extent	ix, 93	-
dc.language.iso	eng	-
dc.publisher	Seoul National University Graduate School	-
dc.subject	Named entity normalization	-
dc.subject	Edge weight updating neural network	-
dc.subject	Text mining in bioinformatics	-
dc.subject	Text mining in finance	-
dc.subject	Text mining in patent documents	-
dc.subject	Named entity graph	-
dc.subject	Knowledge graph	-
dc.subject	Patent graph	-
dc.subject	Keyword extraction	-
dc.subject.ddc	670.42	-
dc.title	Building Named Entity Knowledge Graph Using Named Entity Normalization	-
dc.title.alternative	Knowledge Graph Construction Using Named Entity Normalization Techniques (고유명사 정규화 기법을 이용한 지식 그래프 구축)	-
dc.type	Thesis	-
dc.type	Dissertation	-
dc.contributor.AlternativeAuthor	Sung Hwan Jeon	-
dc.contributor.department	College of Engineering, Department of Industrial Engineering	-
dc.description.degree	Ph.D.	-
dc.date.awarded	2023-02	-
dc.identifier.uci	I804:11032-000000176653	-
dc.identifier.holdings	000000000049▲000000000056▲000000176653▲	-
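
Condition (4) in the abstract, grouping named entities that share a meaning but appear in different forms, can be illustrated with a minimal sketch. This is not the thesis's edge weight updating neural network; it swaps in plain string similarity from Python's standard `difflib` plus union-find, and the function names, the `threshold` value, and the sample mentions are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize_surface(mention: str) -> str:
    """Lowercase and replace punctuation with spaces so trivial variants compare equal."""
    kept = (ch.lower() if ch.isalnum() else " " for ch in mention)
    return " ".join("".join(kept).split())


def group_mentions(mentions, threshold=0.85):
    """Group mentions whose normalized strings are similar.

    Builds an implicit similarity graph (an edge where the ratio >= threshold)
    and returns its connected components via union-find.
    The threshold is a hypothetical value, not one taken from the thesis.
    """
    parent = {m: m for m in mentions}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(mentions, 2):
        score = SequenceMatcher(
            None, normalize_surface(a), normalize_surface(b)
        ).ratio()
        if score >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for m in mentions:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Illustrative surface forms only; not data from the thesis.
    sample = [
        "Samsung Electronics",
        "samsung-electronics",
        "Samsung Electronics.",
        "SK Hynix",
        "SK hynix",
    ]
    for group in group_mentions(sample):
        print(group)
```

Each resulting group could then serve as a single node in a knowledge graph, with one member chosen as the canonical name. A learned normalization model, as the thesis proposes, would replace the string-similarity score here.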

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Industrial Engineering (산업공학과)
  - Theses (Ph.D. / Sc.D._산업공학과)

Files in This Item:

000000176653.pdf 5.49 MB
