Multi-source, unstructured and external data analytics for manufacturing process

Taehoon Ko

Multi-source, unstructured and external data analytics for manufacturing process : 제조 프로세스를 위한 다중 소스, 비정형 및 외부 데이터 애널리틱스

DC Field	Value	Language
dc.contributor.advisor	Sungzoon Cho	-
dc.contributor.author	Taehoon Ko	-
dc.date.accessioned	2017-07-13T06:07:55Z	-
dc.date.available	2017-07-13T06:07:55Z	-
dc.date.issued	2017-02	-
dc.identifier.other	000000142668	-
dc.identifier.uri	https://hdl.handle.net/10371/118295	-
dc.description	Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Industrial Engineering and Naval Architecture, 2017. 2. Sungzoon Cho.	-
dc.description.abstract	Data integration is the task of combining data of various types residing in different sources and providing the user with a unified view of these data. In this thesis, we treat data integration as the process of creating data marts to be used as input to machine learning and data mining models, from the viewpoint of data analysts and miners. Three types of problems are encountered in the data integration process: how to integrate (1) data from various sources, (2) different types of data, and (3) external data with internal data. To integrate these data, the enterprise must consider and solve several technical and managerial issues. To prove our concept, three real-world applications are introduced. Knowledge can be regarded as the most valuable asset of a manufacturing enterprise. Therefore, a manufacturing enterprise should collect data representing its processes and environments and analyze the data to build a sustainable knowledge model. The first application generates user scenarios from online social media in the early steps of the new product development (NPD) process. Through strategic keyword searching, several novel user contexts are discovered from online social media. Based on these contexts, domain experts can generate user scenarios for new features and functions of the target product. The second application constructs early engine fault detection models by integrating manufacturing, inspection, and after-sales service data. In most cases, production data and after-sales service data are managed by independent departments, or even by different companies. To detect engine faults that represent customer-perceived quality, data integration is the key to generating an integrated data mart. In this application, one-class classification algorithms are used because of the class-imbalance problem. To address the multi-dimensionality of the time series data, the symbolic aggregate approximation (SAX) algorithm is used for data segmentation.
Then, a binary genetic algorithm-based wrapper approach (BGA-wrapper) is applied to the segmented data to find the optimal feature subset. As a result, an anomaly score is calculated for each engine. Experimental results show that the proposed method can detect defective engines with high probability before they are shipped. The final application discovers knowledge from textual data in various sources. Although many enterprises know the importance of managing key performance indicators (KPIs), most quality activities are carried out by analyzing attribute or quantitative values. This approach is limited in its ability to capture customers' perspectives and exact defects. In this application, a novel active learning framework for dictionary expansion is introduced. In this framework, unsupervised natural language processing methods suitable for Korean are applied to the data. As a result, the proposed framework can construct a domain-specific dictionary starting from an almost empty one.	-
dc.description.tableofcontents	1. Introduction 1 1.1 Unstructured and external data analytics for new product development 3 1.2 Multi-source and external data analytics for quality management 5 1.3 Multi-source and unstructured data analytics for after-sales service 7 1.4 Multi-source, unstructured and external data 9 2. Literature Review 13 2.1 Data integration 13 2.2 Idea generation, market research and social media 16 2.3 Machine learning-based anomaly detection algorithms 18 2.3.1 Gaussian mixture model and Parzen window density estimation 18 2.3.2 Local outlier factor 18 2.3.3 k-means clustering-based anomaly detection 19 2.3.4 Principal component analysis and kernel principal component analysis-based anomaly detection 19 2.3.5 Support vector data description 19 3. Unstructured and external data analytics for new product development 21 3.1 Background 21 3.2 Proposed framework and method 24 3.3 Case study: Smart oven 27 3.4 Creating and evaluating user scenarios based on contexts from online social media 29 3.5 Summary 30 4.
Multi-source and external data analytics for quality management 33 4.1 Background 33 4.2 Methods for an early-stage engine fault detection system 37 4.2.1 Machine learning-based anomaly detection model 37 4.2.2 Binary genetic algorithm-based wrapper feature subset selection 38 4.3 Application to engine manufacturing process 40 4.3.1 Manufacturing and selling process for heavy machinery engines 41 4.3.2 Data description 41 4.3.3 Data integration 43 4.3.4 Segmentation of multi-dimensional time series data 43 4.3.5 Extracting features from multi-dimensional time series data 44 4.4 Machine learning-based anomaly detection models and their performance 47 4.5 Discussion 50 4.5.1 Performance of anomaly detection models and costs 50 4.5.2 Durability of anomaly detection models 51 4.5.3 Relationship between anomaly detection models and conventional SPC methods 52 4.6 Scalable local outlier factor algorithm and its application to engine fault detection 52 4.6.1 Approximate nearest neighbor search methods 53 4.6.2 Datasets 55 4.7 Summary 56 5. Multi-source and unstructured data analytics for after-sales service 59 5.1 Background 59 5.2 Proposed framework and method 60 5.2.1 Unsupervised word segmentation in Korean / English mixed documents 61 5.2.2 Active learning module for word segmentation 65 5.2.3 Post processing 66 5.2.4 Tagging 66 6. Conclusion 67 6.1 Contributions 67 6.2 Future works 69 Bibliography 70 Abstract in Korean 80	-
dc.format	application/pdf	-
dc.format.extent	5986469 bytes	-
dc.format.medium	application/pdf	-
dc.language.iso	en	-
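dc.publisher	Seoul National University Graduate School	-
dc.subject	Multi-source data	-
dc.subject	Unstructured data	-
dc.subject	External data	-
dc.subject	Machine learning	-
dc.subject	Data integration	-
dc.subject	Manufacturing process	-
dc.subject	New product development	-
dc.subject	Quality management	-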
dc.subject.ddc	623	-
dc.title	Multi-source, unstructured and external data analytics for manufacturing process	-
dc.title.alternative	제조 프로세스를 위한 다중 소스, 비정형 및 외부 데이터 애널리틱스	-
dc.type	Thesis	-
dc.contributor.AlternativeAuthor	Taehoon Ko	-
dc.description.degree	Doctor	-
dc.citation.pages	81	-
dc.contributor.affiliation	College of Engineering, Department of Industrial Engineering and Naval Architecture	-
dc.date.awarded	2017-02	-
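
The abstract mentions that the symbolic aggregate approximation (SAX) algorithm is used to segment time series data. The following is a minimal sketch of textbook univariate SAX, not the thesis implementation: the function name `sax`, its parameters, and the equal-length segment requirement are assumptions made for illustration, and the record does not say how the thesis handles multiple dimensions.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet_size):
    """Return a SAX word for a numeric series (hypothetical helper).

    Steps: z-normalize, average over equal-length segments (PAA),
    then map each segment mean to a letter using breakpoints that
    split the standard normal distribution into equiprobable bins.
    """
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be between 2 and 26")
    if n_segments < 1 or len(series) % n_segments != 0:
        raise ValueError("series length must be a multiple of n_segments")

    # z-normalize; a constant series is mapped to all zeros
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # piecewise aggregate approximation: one mean per segment
    seg_len = len(series) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # breakpoints at the k/alphabet_size quantiles of N(0, 1)
    std_normal = NormalDist()
    breakpoints = [std_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]

    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(letters[bisect_right(breakpoints, v)] for v in paa)


if __name__ == "__main__":
    # a step signal: low half, then high half
    print(sax([1, 1, 1, 1, 5, 5, 5, 5], n_segments=2, alphabet_size=3))
```

Each letter summarizes one segment of the normalized series, so downstream steps such as feature extraction can work on a short symbolic word instead of the raw signal.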

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Industrial Engineering
  - Theses (Ph.D. / Sc.D., Industrial Engineering)

Files in This Item:

170925 원문수정 고태훈.pdf 5.22 MB
