Multi-source, unstructured and external data analytics for manufacturing process : 제조 프로세스를 위한 다중 소스, 비정형 및 외부 데이터 애널리틱스
- College of Engineering, School of Industrial Engineering and Naval Architecture (공과대학 산업·조선공학부)
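- Graduate School, Seoul National University
- Thesis (Ph.D.), Graduate School, Seoul National University: School of Industrial Engineering and Naval Architecture, February 2017. Advisor: 조성준 (Sungzoon Cho).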
- Data integration is the task of combining data of various types residing in different sources and providing the user with a unified view of these data. In this thesis, we view data integration from the perspective of data analysts and miners, as the process of creating data marts that serve as input to machine learning and data mining models. In practice, three types of problems arise in the data integration process: how to integrate (1) data from various sources, (2) different types of data, and (3) external data with internal data. To integrate these data, an enterprise must address several technical and managerial issues. As a proof of concept, three real-world applications are presented. Knowledge can be regarded as the most valuable asset of a manufacturing enterprise. Therefore, a manufacturing enterprise should collect data representing its processes and environments and analyze these data to build a sustainable knowledge model.
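The data-mart view of integration described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each source is a list of records sharing a common key (the hypothetical field `engine_id`), that each source holds at most one record per entity (aggregate beforehand otherwise), and it uses hypothetical source names and values. Fields are prefixed with their source name so that columns from different systems do not overwrite each other.

```python
from collections import defaultdict


def build_data_mart(sources, key):
    """Merge records from several sources into one row per entity.

    sources: mapping of source name -> list of dict records (hypothetical data).
    key: field shared by every source that identifies an entity.
    Assumes at most one record per entity per source; later records
    for the same entity would overwrite earlier ones.
    """
    mart = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            entity = record[key]
            row = mart[entity]
            row[key] = entity
            for field, value in record.items():
                if field != key:
                    # Prefix with the source name to avoid column collisions.
                    row[f"{source_name}.{field}"] = value
    return dict(mart)


# Hypothetical production, inspection and after-sales records.
sources = {
    "production": [{"engine_id": "E1", "torque": 41.2},
                   {"engine_id": "E2", "torque": 39.8}],
    "inspection": [{"engine_id": "E1", "passed": True}],
    "warranty": [{"engine_id": "E2", "claims": 1}],
}
mart = build_data_mart(sources, "engine_id")
print(mart["E1"])  # {'engine_id': 'E1', 'production.torque': 41.2, 'inspection.passed': True}
```

Entities missing from a source simply lack that source's columns, which mirrors an outer join; a real pipeline would also have to reconcile keys, units, and timestamps across systems.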
The first application generates user scenarios from online social media in the early stages of the new product development (NPD) process. Through strategic keyword searches, several novel user contexts are discovered in online social media. Based on these contexts, domain experts can generate user scenarios for new features and functions of the target product.
The second application constructs early engine fault detection models by integrating manufacturing, inspection, and after-sales service data. In most cases, production data and after-sales service data are managed by independent departments, or even by different companies. To detect engine faults that reflect customer-perceived quality, data integration is key to generating an integrated data mart. In this application, one-class classification algorithms are used because of the class-imbalance problem. To address the high dimensionality of the time-series data, the symbolic aggregate approximation (SAX) algorithm is used for data segmentation. Then, a binary genetic algorithm-based wrapper approach (BGA-wrapper) is applied to the segmented data to find the optimal feature subset. Finally, an anomaly score is calculated for each engine. Experimental results show that the proposed method can detect defective engines with high probability before they are shipped.
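The SAX segmentation step and the idea of a per-engine anomaly score can be sketched as below. This is a hedged illustration under stated assumptions: the series length is assumed to be a multiple of the segment count, the alphabet uses equiprobable breakpoints of the standard normal distribution, and the anomaly score is a simple nearest-neighbour distance to SAX words of known-good engines, used here as a stand-in one-class method rather than the specific algorithms used in the thesis. The BGA-wrapper feature selection step is omitted, and the sensor traces are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # limits alphabet_size to 26


def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    if len(series) % n_segments:
        raise ValueError("series length must be a multiple of n_segments")
    size = len(series) // n_segments
    return [mean(series[i * size:(i + 1) * size]) for i in range(n_segments)]


def sax(series, n_segments, alphabet_size=4):
    """Convert a numeric series into a SAX word of n_segments symbols."""
    # Breakpoints cut N(0, 1) into alphabet_size equiprobable regions.
    dist = NormalDist()
    breakpoints = [dist.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    word = []
    for value in paa(z_normalize(series), n_segments):
        region = sum(value > b for b in breakpoints)  # breakpoints below value
        word.append(ALPHABET[region])
    return "".join(word)


def anomaly_score(word, normal_words):
    """Distance to the nearest SAX word of a known-good engine (stand-in one-class score)."""
    def symbol_distance(a, b):
        return sum(abs(ord(x) - ord(y)) for x, y in zip(a, b))
    return min(symbol_distance(word, ref) for ref in normal_words)


# Hypothetical sensor traces: one known-good engine and one test engine.
normal_words = [sax([0, 1, 2, 3, 4, 5, 6, 7], 4)]   # "abcd"
test_word = sax([0, 0, 0, 0, 0, 0, 0, 8], 4)        # "bbbd"
print(test_word, anomaly_score(test_word, normal_words))  # bbbd 2
```

Summing ordinal symbol differences gives only a coarse distance; the original SAX work defines a distance measure (MINDIST) that lower-bounds the Euclidean distance and could replace it.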
The final application discovers knowledge from textual data in various sources. Although many enterprises recognize the importance of managing key performance indicators (KPIs), most quality activities are carried out by analyzing attribute or quantitative values. This approach is limited in capturing customers' perspectives and the exact nature of defects. In this application, a novel active learning framework for dictionary expansion is introduced. In this framework, unsupervised natural language processing methods suitable for Korean are applied to the data. As a result, the proposed framework can construct a domain-specific dictionary starting from an almost empty one.
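The active learning loop for dictionary expansion can be sketched as follows. This is a minimal, hypothetical illustration and not the thesis's framework: documents are assumed to be already tokenized (the unsupervised NLP methods for Korean are not reproduced here), candidates are scored by simple co-occurrence with current dictionary terms, and `oracle` stands in for the domain expert who accepts or rejects each queried term. The function names, the scoring rule, and the example data are all assumptions.

```python
from collections import Counter


def score_candidates(docs, dictionary):
    """Score each non-dictionary token by co-occurrence with dictionary terms."""
    scores = Counter()
    for tokens in docs:
        token_set = set(tokens)
        hits = len(token_set & dictionary)
        if hits == 0:
            continue
        for token in token_set - dictionary:
            scores[token] += hits
    return scores


def expand_dictionary(docs, seed, oracle, rounds=3, batch_size=2):
    """Grow a term dictionary by repeatedly asking an oracle about top candidates."""
    dictionary = set(seed)
    rejected = set()
    for _ in range(rounds):
        scores = score_candidates(docs, dictionary)
        # Rank not-yet-labelled candidates by score (ties broken alphabetically).
        ranked = sorted((t for t in scores if t not in rejected),
                        key=lambda t: (-scores[t], t))
        queries = ranked[:batch_size]
        if not queries:
            break
        for term in queries:
            # The oracle plays the domain expert who labels each queried term.
            if oracle(term):
                dictionary.add(term)
            else:
                rejected.add(term)
    return dictionary


# Hypothetical pre-tokenized complaint texts and expert judgement.
docs = [
    ["engine", "noise", "leak"],
    ["oil", "leak", "smell"],
    ["noise", "vibration"],
    ["delivery", "fast", "friendly"],
]
defect_terms = {"leak", "noise", "smell", "vibration"}
result = expand_dictionary(docs, {"leak"}, oracle=lambda t: t in defect_terms)
print(sorted(result))  # ['leak', 'noise', 'smell', 'vibration']
```

This sketch queries the highest-scoring candidates, which tend to be likely positives; uncertainty-based query strategies are a common alternative when expert labelling effort should go to ambiguous terms instead.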