A Study on the use of Etymology for Semantic Knowledge Extraction

DC Field: Value
dc.contributor.advisor: 정교민
dc.contributor.author: Pablo Estrada
dc.date.accessioned: 2017-07-19T08:42:42Z
dc.date.available: 2017-07-19T08:42:42Z
dc.date.issued: 2016-08
dc.identifier.other: 000000135984
dc.identifier.uri: https://hdl.handle.net/10371/131254
dc.description: Thesis (Master's) -- Seoul National University Graduate School: Computational Science Major, 2016. 8. 정교민.
dc.description.abstract: Etymology is the study of the composition of words through their historical roots. It is a rich area of study that dates back millennia and has contributed significantly to our understanding of human cultures and languages. Computational linguistics is a much younger field that grew from the advent of the digital era and that has advanced continuously, even now with the changes brought by Artificial Intelligence and Machine Learning. Computational linguistics has not yet leveraged the knowledge of etymology to its full potential. This work is a step toward making etymology another contributor to the field of computational linguistics.

In this work we propose a framework to capture the complex etymological relationships that exist in the vocabulary of a human language by creating a complex network that associates words with their historical roots. We then use this framework to obtain insights into the semantics of words in the Chinese and Korean languages. We run two tasks, one of supervised learning and one of unsupervised learning, and show that etymology can be effectively used to extract knowledge.

We believe that this work helps push etymology onto the main stage of computational linguistics and natural language processing.
dc.description.tableofcontents:
1 Introduction
1.1 Synonym pair classification
1.2 Word embedding

2 Literature review

3 An etymological graph-based framework
3.1 Building an Etymological Graph
3.2 Obtaining semantic knowledge from the graph

4 Two use-cases of the framework
4.1 Supervised learning: Finding synonyms through classification
4.1.1 The edge classification problem
4.1.2 The synonym-link prediction problem
4.1.3 Results of the classification schemes
4.2 Unsupervised learning: Word embedding with etymology
4.2.1 Learning word embeddings
4.2.2 Verifying the word embeddings: Synonym discovery
4.2.3 Performance of synonym discovery task
4.2.4 Computation speed of our model

5 Discussion, Conclusion, and Future Work
5.1 Discussion
5.2 Conclusion and Future Work

Bibliography

Abstract in Korean

Appendices
A Unipartite projection
B Supervised learning features
C Performance of embeddings
dc.format: application/pdf
dc.format.extent: 6255635 bytes
dc.format.medium: application/pdf
dc.language.iso: en
dc.publisher: Seoul National University Graduate School
dc.subject: graph mining
dc.subject: etymology
dc.subject: computational linguistics
dc.subject: chinese language
dc.subject.ddc: 004
dc.title: A Study on the use of Etymology for Semantic Knowledge Extraction
dc.type: Thesis
dc.description.degree: Master
dc.citation.pages: 44
dc.contributor.affiliation: College of Natural Sciences, Interdisciplinary Program in Computational Science
dc.date.awarded: 2016-08

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
