Estimating the Helpfulness of Product Reviews based on Review Information Types

김문형

Seoul National University Library

DC Field	Value	Language
dc.contributor.advisor	신효필	-
dc.contributor.author	김문형	-
dc.date.accessioned	2017-07-14T01:04:01Z	-
dc.date.available	2017-07-14T01:04:01Z	-
dc.date.issued	2016-08	-
dc.identifier.other	000000137289	-
dc.identifier.uri	https://hdl.handle.net/10371/121624	-
dc.description	Thesis (Ph.D.) -- Graduate School, Seoul National University: Department of Linguistics, August 2016. Advisor: 신효필.	-
dc.description.abstract	The sheer number of reviews available for any given product makes it impossible for potential customers to locate the reviews that will be helpful to them. Review helpfulness therefore needs to be estimated automatically, so that customers can find the most helpful reviews as quickly and easily as possible. Researchers have explored multiple ways of evaluating review helpfulness, but have mainly focused on how reviews deliver information, i.e., their length, sentiment, readability, and so on. We instead assume that what information a review delivers to customers matters more than how that information is delivered. This study therefore investigates a way of extracting what information reviews deliver and uses it to estimate their helpfulness. To extract the information that reviews contain, we assign a review information type (RIT) to each sentence. Based on the target of the information, sentences are divided into background information about the reviewer's previous experience or expertise, core information about the product itself, peripheral information about non-product matters such as shipping or packaging, and irrelevant information. Overall information covers the final purchasing decision, summaries, and recommendations. Once each sentence has been assigned an information type, it is converted into a topic vector with Latent Dirichlet Allocation (LDA). For each information type, these topic vectors are clustered to find groups of sentences that carry similar information. The clusters are then used to determine what information each sentence in the test reviews delivers. Product reviews in three domains (e-book readers, outdoor tents, and jeans) were collected from Amazon.com; for each domain, 200 reviews were selected for training and testing in the experiments.
The helpfulness score of each review and the information type of each sentence were manually annotated for this study. We first show, through a series of classification experiments, to what extent the information type of each sentence can be predicted correctly. The information type of a sentence is predicted from features such as bag-of-words, the position of the sentence in the review, and the word form and part-of-speech tag of the main subject, verb, and auxiliaries. A preliminary experiment assessed whether background information can be used to predict the helpfulness of product reviews; its results indicate that our approach, using background information alone, performs as effectively as the features used in previous studies. The final experiments compare the effect of extracting what information is delivered with that of extracting how information is delivered on estimating review helpfulness. These experiments show that features capturing what information is delivered estimate review helpfulness more accurately than features related to how information is delivered.	-
dc.description.tableofcontents	1 Introduction
  1.1 Research Summary
    1.1.1 Review Information Types for Estimating Review Helpfulness
    1.1.2 Problem Statement
    1.1.3 Investigation of Hypotheses
  1.2 Outline
2 Background
  2.1 Predicting Helpfulness of Reviews
  2.2 Factors for Review Helpfulness
    2.2.1 Basic Factors
    2.2.2 Readability
    2.2.3 Subjectivity
    2.2.4 Content
  2.3 Summary
3 Extracting Information from Reviews
  3.1 Review Information Types
    3.1.1 Motivation
    3.1.2 Introducing Review Information Types
    3.1.3 Difficulties and Ambiguities
  3.2 Finding Similar Information-bearing Sentences
    3.2.1 Sentence Representations
    3.2.2 Clustering Similar Information-bearing Sentences
  3.3 The Summary of the Extracting Procedure
4 Preparing Product Reviews
  4.1 Collecting Data
  4.2 The Review Helpfulness Vote Score
  4.3 Building Review Helpfulness Manual Score
    4.3.1 Annotating Manual Helpfulness Score
    4.3.2 Evaluation of Review Helpfulness Manual Score
  4.4 Annotation of Information Types
  4.5 Summary
5 A Preliminary Study: Introducing Background Information Type for Product Review Helpfulness
  5.1 Task Description
  5.2 Data Collection
  5.3 Extracting Background Information
    5.3.1 Pattern Matching for Background Information
    5.3.2 Seed-based Information Extraction
    5.3.3 Topic-based Information Extraction
    5.3.4 Features
  5.4 Experiments and Analysis
    5.4.1 Experiment Setting
    5.4.2 Model
    5.4.3 The Evaluation Metrics
    5.4.4 Results and Analysis
  5.5 Summary
6 Recognition of Review Information Types
  6.1 Task Description
  6.2 Models
    6.2.1 Unsupervised Clustering Methods
    6.2.2 Supervised Learning Models
  6.3 Features for Recognizing Information Types
  6.4 Recognition of Review Information Types
    6.4.1 The Results with Clustering Models
    6.4.2 The Results with SVM model
    6.4.3 The Results with CRF model
  6.5 The Summary of Recognizing Information Types
7 Estimation of Review Helpfulness
  7.1 Task Description and Restriction
  7.2 Data Collection
  7.3 Features for Estimating the Review Helpfulness
    7.3.1 Baseline (BASE)
    7.3.2 Features from Previous Studies
    7.3.3 Product Aspect Keyword-based Features (ASPECT)
    7.3.4 The Proportion of Information Types (INFO_TYPE)
    7.3.5 The Semantics of Sentence Information
  7.4 Experimental Setting
    7.4.1 Evaluating Clustering Algorithms
  7.5 Experiment Results
    7.5.1 Gold Standard Ranking Validation
    7.5.2 Sentence Representations
    7.5.3 The Best Feature Combinations
    7.5.4 Whole Document vs Separate Sentences
    7.5.5 No Distinction on Information types
    7.5.6 Review Helpfulness Evaluation with Predicted Sentence Information Types
    7.5.7 The Product Domain Adaptation
  7.6 Summary
8 Conclusions and Future Directions
  8.1 Summary of Contribution and Results
    8.1.1 Categorization of Information Types
    8.1.2 Review Helpfulness Annotation
    8.1.3 Features for Recognizing Review Information Types
    8.1.4 Computational Modeling of Information Type Recognition
    8.1.5 Features and Computational modeling for Estimating Review Helpfulness
  8.2 Future Directions and Open Problems
    8.2.1 Extraction of Sentence Information
    8.2.2 Topic based Clustering
    8.2.3 Remaining Practical Issues
    8.2.4 Expandability of Review Information Types
REFERENCES
Appendix
  Appendix I. Product lists and Ids from Amazon.com
  Appendix II. Regular patterns for finding background information of e-book reader reviews
  Appendix III. Groups of product features for each product domain
Korean Abstract	-
dc.format	application/pdf	-
dc.format.extent	1748839 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
dc.publisher	Graduate School, Seoul National University	-
dc.subject	review helpfulness estimation	-
dc.subject	review information types	-
dc.subject	latent dirichlet allocation	-
dc.subject	topic-based approach	-
dc.subject	product review evaluation	-
dc.subject.ddc	401	-
dc.title	Estimating the Helpfulness of Product Reviews based on Review Information Types	-
dc.title.alternative	리뷰 정보 유형에 기반한 상품평 유용성 평가 (Korean title: "Evaluation of Product Review Helpfulness Based on Review Information Types")	-
dc.type	Thesis	-
dc.description.degree	Doctor	-
dc.citation.pages	154	-
dc.contributor.affiliation	College of Humanities, Department of Linguistics	-
dc.date.awarded	2016-08	-
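The extraction procedure described in the abstract (each sentence turned into an LDA topic vector, the vectors clustered separately for each review information type, and the clusters then used to label test sentences) can be sketched in Python as follows. This is a minimal, dependency-free sketch under assumptions that the record does not state: LDA is fitted by collapsed Gibbs sampling, test sentences are folded in with the topic-word probabilities held fixed, and k-means with squared Euclidean distance stands in for whichever clustering algorithm the thesis adopted (its table of contents lists an evaluation of several). The helper names (`lda_gibbs`, `infer_theta`, `kmeans`, `build_clusters`, `review_profile`), the hyperparameters, the type labels, and the toy sentences are all hypothetical; this is not the author's implementation.

```python
# Minimal sketch (assumptions, not the thesis's code): LDA by collapsed Gibbs
# sampling, fold-in inference for test sentences, and k-means within each
# review information type. Type labels and toy sentences are hypothetical.
import random
from collections import Counter, defaultdict

def lda_gibbs(docs, n_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA where each document is one sentence (a token list).
    Returns per-sentence topic proportions (theta) and topic-word probabilities (phi)."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]               # sentence-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    def bump(d, w, k, delta):  # update all three count tables together
        ndk[d][k] += delta
        nkw[k][w] += delta
        nk[k] += delta
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]  # random initial topics
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            bump(d, w, k, 1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                bump(d, w, k, -1)  # withdraw the token's current topic
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                bump(d, w, k, 1)
    theta = [[(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
             for row, doc in zip(ndk, docs)]
    phi = [{w: (nkw[t][w] + beta) / (nk[t] + V * beta) for w in vocab}
           for t in range(n_topics)]
    return theta, phi

def infer_theta(doc, phi, iters=50, alpha=0.1, seed=0):
    """Fold-in: sample topics for an unseen sentence with phi held fixed."""
    K = len(phi)
    doc = [w for w in doc if w in phi[0]]  # ignore words unseen in training
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in doc]
    ndk = [z.count(t) for t in range(K)]
    for _ in range(iters):
        for i, w in enumerate(doc):
            ndk[z[i]] -= 1
            weights = [(ndk[t] + alpha) * phi[t][w] for t in range(K)]
            z[i] = rng.choices(range(K), weights=weights)[0]
            ndk[z[i]] += 1
    return [(c + alpha) / (len(doc) + K * alpha) for c in ndk]

def nearest(vec, centroids):
    """Index of the closest centroid by squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(vec, centroids[j])))

def kmeans(vectors, k, iters=20, seed=0):
    """Lloyd's k-means; an empty cluster keeps its previous centroid."""
    centroids = [list(v) for v in random.Random(seed).sample(vectors, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        centroids = [[sum(xs) / len(g) for xs in zip(*g)] if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def build_clusters(train, n_topics=3, k=2):
    """train: list of (tokens, info_type). Topic vectors are clustered per type."""
    theta, phi = lda_gibbs([toks for toks, _ in train], n_topics)
    by_type = defaultdict(list)
    for vec, (_, info_type) in zip(theta, train):
        by_type[info_type].append(vec)
    return {t: kmeans(vs, min(k, len(vs))) for t, vs in by_type.items()}, phi

def review_profile(sentences, clusters, phi):
    """Hypothetical review feature: share of sentences in each (type, cluster) pair."""
    counts = Counter((t, nearest(infer_theta(toks, phi), clusters[t]))
                     for toks, t in sentences)
    return {key: c / len(sentences) for key, c in counts.items()}

train = [(s.split(), t) for s, t in [
    ("i have owned three kindles before", "background"),
    ("as a teacher i read many books every week", "background"),
    ("the screen is sharp and easy to read", "core"),
    ("the battery lasts weeks on one charge", "core"),
    ("page turns are fast and the screen has no glare", "core"),
    ("shipping was fast and the box arrived intact", "peripheral"),
    ("the package came a day late", "peripheral"),
]]
clusters, phi = build_clusters(train)
review = [("i read many books".split(), "background"),
          ("the screen is easy to read".split(), "core")]
print(review_profile(review, clusters, phi))
```

Clustering within each type means that, in this sketch, a background sentence about previously owned devices can never share a cluster with a core sentence about the screen; the table of contents above also lists an experiment without this distinction (7.5.5). A profile such as the one printed here is only one hypothetical way to turn sentence-level cluster labels into review-level features for a helpfulness model.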

Appears in Collections:

College of Humanities (인문대학)
- Linguistics (언어학과)
  - Theses (Ph.D. / Sc.D., Linguistics)

Files in This Item:

000000137289.pdf 1.67 MB
