
Quantitative Lexical Analysis and the Problem of Word Spacing (어휘 계량적 분석과 띄어쓰기 문제)

Authors
한영균
Issue Date
2003-06
Publisher
Kyujanggak Institute for Korean Studies, Seoul National University (서울대학교 규장각한국학연구원)
Citation
Korean Culture (한국문화), Vol. 31, pp. 49-76
Abstract
It is well known that one of the most troublesome problems in word-frequency analysis of modern Korean corpora is irregular word spacing, especially in multi-word lexical units (MWLUs), including compounds. This stems from two facts. On the one hand, the rules that regulate spacing in modern Korean contain some contradictions and ambiguities. On the other hand, it is impossible to register every MWLU and compound word even in a full-size dictionary, which could otherwise serve as a reference for word spacing, and yet most lexicons of Korean language-processing tools depend on paper dictionaries. As a result, the compounds listed in word-frequency lists are inconsistent, and this affects the overall results of a corpus's frequency analysis. It is argued that, to overcome these problems, it is preferable to compile a list of compound words and MWLUs from the corpus to be analysed, and to reorganize the lexicon of the language-processing tools on the basis of that list. Because this list can in turn serve as a supplementary source for revising the dictionary originally used for the word-frequency analysis, the whole process of word-frequency analysis is circular.
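The abstract does not spell out a procedure, so the following is only a toy sketch of the effect it describes: inconsistent spacing splits one lexical unit across several frequency entries, and a list derived from the corpus itself can merge them and then be fed back into a tool's lexicon. The merging heuristic (join an adjacent pair "A B" when the unspaced form "AB" also occurs in the same corpus), the function names, and the sample sentences are all hypothetical assumptions for illustration, not the author's method.

```python
from collections import Counter

# Toy illustration only: tokens are whitespace-delimited units (eojeol);
# no morphological analysis is performed, so particles stay attached.


def raw_frequencies(sentences: list[str]) -> Counter:
    """Count tokens exactly as they are spaced in the corpus."""
    return Counter(tok for s in sentences for tok in s.split())


def find_spacing_variants(sentences: list[str]) -> dict[tuple[str, str], str]:
    """Hypothetical heuristic: treat an adjacent pair 'A B' as a candidate
    MWLU when the joined form 'AB' also occurs as a single token."""
    vocab = raw_frequencies(sentences)
    candidates = {}
    for s in sentences:
        toks = s.split()
        for a, b in zip(toks, toks[1:]):
            if a + b in vocab:
                candidates[(a, b)] = a + b
    return candidates


def normalized_frequencies(sentences: list[str],
                           candidates: dict[tuple[str, str], str]) -> Counter:
    """Merge spaced variants into their joined form, then recount."""
    counts = Counter()
    for s in sentences:
        toks = s.split()
        i = 0
        while i < len(toks):
            pair = tuple(toks[i:i + 2])
            if pair in candidates:
                counts[candidates[pair]] += 1  # count 'A B' as 'AB'
                i += 2
            else:
                counts[toks[i]] += 1
                i += 1
    return counts


def update_lexicon(lexicon: set[str],
                   candidates: dict[tuple[str, str], str]) -> set[str]:
    """Feed the corpus-derived MWLU list back into a (hypothetical) lexicon."""
    return lexicon | set(candidates.values())


corpus = [
    "띄어쓰기 규정은 복잡하다",
    "띄어 쓰기 오류가 많다",
    "띄어쓰기 문제를 다룬다",
]
cands = find_spacing_variants(corpus)
print(raw_frequencies(corpus)["띄어쓰기"])                 # 2
print(cands)                                              # {('띄어', '쓰기'): '띄어쓰기'}
print(normalized_frequencies(corpus, cands)["띄어쓰기"])  # 3
```

Counted as spaced, the unit 띄어쓰기 gets 2 occurrences plus stray entries for 띄어 and 쓰기; after merging it gets 3. The heuristic is deliberately naive: it can over-merge any two tokens whose concatenation happens to occur elsewhere, so a realistic pipeline would need morphological analysis and manual review of the candidate list before updating a lexicon.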
ISSN
1226-8356
Language
Korean
URI
http://hdl.handle.net/10371/66718
Appears in Collections:
Kyujanggak Institute for Korean Studies (규장각한국학연구원) > Korean Culture (한국문화) > vol.31 (2003)

Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.
