Multi-source, unstructured and external data analytics for manufacturing process : 제조 프로세스를 위한 다중 소스, 비정형 및 외부 데이터 애널리틱스

DC Field: Value
dc.contributor.advisor: 조성준
dc.contributor.author: 고태훈
dc.date.accessioned: 2017-07-13T06:07:55Z
dc.date.available: 2017-07-13T06:07:55Z
dc.date.issued: 2017-02
dc.identifier.other: 000000142668
dc.identifier.uri: https://hdl.handle.net/10371/118295
dc.description: Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Industrial Engineering and Naval Architecture (산업·조선공학부), February 2017. Advisor: 조성준.
dc.description.abstract: Data integration is the task of combining data of various types residing in different sources and providing the user with a unified view of these data. In this thesis, we regard data integration, from the viewpoint of data analysts and miners, as the process of creating data marts that serve as input to machine learning and data mining models. Three types of problems arise in this process: how to integrate (1) data from various sources, (2) data of different types, and (3) external data with internal data. To integrate such data, an enterprise must address both technical and managerial issues. To demonstrate the concept, three real-world applications are presented. Knowledge can be regarded as the most valuable asset of a manufacturing enterprise; a manufacturer should therefore collect data describing its processes and environments and analyze them to build a sustainable knowledge model.
The first application generates user scenarios from online social media in the early stages of the new product development (NPD) process. Through strategic keyword searches, several novel user contexts are discovered in online social media. Based on these contexts, domain experts can generate user scenarios for new features and functions of the target product.
The second application constructs early engine fault detection models by integrating manufacturing, inspection, and after-sales service data. In most cases, production data and after-sales service data are managed by independent departments, or even by different companies. To detect engine faults, which reflect customer-perceived quality, data integration is the key to building an integrated data mart. One-class classification algorithms are used because of the class-imbalance problem. To address the multi-dimensionality of the time series data, the symbolic aggregate approximation (SAX) algorithm is used for data segmentation. A binary genetic algorithm-based wrapper approach (BGA-wrapper) is then applied to the segmented data to find the optimal feature subset. As a result, an anomaly score is calculated for each engine. Experimental results show that the proposed method can detect defective engines with high probability before they are shipped.
The final application discovers knowledge from textual data in various sources. Although many enterprises recognize the importance of managing key performance indicators (KPIs), most quality activities are carried out by analyzing attribute or quantitative values. This limits their ability to understand customers' perspectives and the exact nature of defects. In this application, a novel active learning framework for dictionary expansion is introduced, in which unsupervised natural language processing methods suited to Korean are applied to the data. As a result, the proposed framework can construct a domain-specific dictionary starting from an almost empty one.
dc.description.tableofcontents:
1. Introduction 1
1.1 Unstructured and external data analytics for new product development 3
1.2 Multi-source and external data analytics for quality management 5
1.3 Multi-source and unstructured data analytics for after-sales service 7
1.4 Multi-source, unstructured and external data 9
2. Literature Review 13
2.1 Data integration 13
2.2 Idea generation, market research and social media 16
2.3 Machine learning-based anomaly detection algorithms 18
2.3.1 Gaussian mixture model and Parzen window density estimation 18
2.3.2 Local outlier factor 18
2.3.3 k-means clustering-based anomaly detection 19
2.3.4 Principal component analysis and kernel principal component analysis-based anomaly detection 19
2.3.5 Support vector data description 19
3. Unstructured and external data analytics for new product development 21
3.1 Background 21
3.2 Proposed framework and method 24
3.3 Case study: Smart oven 27
3.4 Creating and evaluating user scenarios based on contexts from online social media 29
3.5 Summary 30
4. Multi-source and external data analytics for quality management 33
4.1 Background 33
4.2 Methods for an early-stage engine fault detection system 37
4.2.1 Machine learning-based anomaly detection model 37
4.2.2 Binary genetic algorithm-based wrapper feature subset selection 38
4.3 Application to engine manufacturing process 40
4.3.1 Manufacturing and selling process for heavy machinery engines 41
4.3.2 Data description 41
4.3.3 Data integration 43
4.3.4 Segmentation of multi-dimensional time series data 43
4.3.5 Extracting features from multi-dimensional time series data 44
4.4 Machine learning-based anomaly detection models and their performance 47
4.5 Discussion 50
4.5.1 Performance of anomaly detection models and costs 50
4.5.2 Durability of anomaly detection models 51
4.5.3 Relationship between anomaly detection models and conventional SPC methods 52
4.6 Scalable local outlier factor algorithm and its application to engine fault detection 52
4.6.1 Approximate nearest neighbor search methods 53
4.6.2 Datasets 55
4.7 Summary 56
5. Multi-source and unstructured data analytics for after-sales service 59
5.1 Background 59
5.2 Proposed framework and method 60
5.2.1 Unsupervised word segmentation in Korean / English mixed documents 61
5.2.2 Active learning module for word segmentation 65
5.2.3 Post processing 66
5.2.4 Tagging 66
6. Conclusion 67
6.1 Contributions 67
6.2 Future works 69
Bibliography 70
Abstract (in Korean) 80
dc.format: application/pdf
dc.format.extent: 5986469 bytes
dc.format.medium: application/pdf
dc.language.iso: en
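dc.publisher: Seoul National University Graduate School (서울대학교 대학원)
dc.subject: Multi-source data
dc.subject: Unstructured data
dc.subject: External data
dc.subject: Machine learning
dc.subject: Data integration
dc.subject: Manufacturing process
dc.subject: New product development
dc.subject: Quality management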
dc.subject.ddc: 623
dc.title: Multi-source, unstructured and external data analytics for manufacturing process
dc.title.alternative: 제조 프로세스를 위한 다중 소스, 비정형 및 외부 데이터 애널리틱스
dc.type: Thesis
dc.contributor.AlternativeAuthor: Taehoon Ko
dc.description.degree: Doctor
dc.citation.pages: 81
dc.contributor.affiliation: College of Engineering, Department of Industrial Engineering and Naval Architecture (공과대학 산업·조선공학부)
dc.date.awarded: 2017-02
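The abstract mentions symbolic aggregate approximation (SAX) for segmenting the engine time-series data. As a rough illustration only, the sketch below implements the standard single-channel SAX transform: z-normalization, piecewise aggregate approximation (PAA), then mapping each frame mean to a letter using equiprobable breakpoints of the standard normal distribution. The function names and the parameters `n_segments` and `alphabet_size` are illustrative assumptions; how the thesis applies SAX to its multi-dimensional sensor data, and with which settings, is not described in this record.

```python
# Illustrative single-channel SAX sketch. Function names and parameters are
# hypothetical and are not taken from the thesis.
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def z_normalize(series, eps=1e-8):
    """Rescale a series to zero mean and unit variance; a flat series maps to all zeros."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma < eps:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of n_segments equal-length frames.

    To keep the sketch short, the series length must be a multiple of n_segments.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax_breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into alphabet_size equiprobable regions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, n_segments, alphabet_size):
    """Map a numeric series to a SAX word with one letter per PAA frame."""
    if not 2 <= alphabet_size <= len(ALPHABET):
        raise ValueError("alphabet_size must be between 2 and 26")
    cuts = sax_breakpoints(alphabet_size)
    frames = paa(z_normalize(series), n_segments)
    # bisect_right counts the breakpoints at or below each frame mean,
    # which is the index of the letter for that region.
    return "".join(ALPHABET[bisect_right(cuts, v)] for v in frames)
```

For example, `sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4)` returns `"abcd"`: the ramp is z-normalized, averaged into four frames, and each frame mean falls into a different equiprobable region. Because every letter covers the same probability mass under a normal assumption, the resulting symbols are roughly balanced, which is one reason SAX words are convenient as discrete segment labels for later feature extraction.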
Items in S-Space are protected by copyright, with all rights reserved, unless otherwise indicated.