A Study on Image Based Deep Convolutional Neural Network for Indoor Positioning

김광중


A Study on Image Based Deep Convolutional Neural Network for Indoor Positioning : Method of Data Refinement with Omnidirectional Image and Comparison of Transfer Learning Methodologies


Authors: 김광중

Advisor: 김용일

Issue Date: 2019-08

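Publisher: Seoul National University Graduate School

Keywords: indoor positioning ; deep convolutional neural network ; omnidirectional image ; data refinement

Description: Thesis (Master's)--Seoul National University Graduate School : College of Engineering, Dept. of Civil & Environmental Engineering, 2019. 8. 김용일.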

Abstract: The location-based services industry is growing in scale and importance with the introduction of the Internet of Things (IoT) in various industries, the spread of high-speed communication networks, real-time location tracking and traffic information, and location-based search, advertising, and marketing. In location-based services, location estimation technology plays an essential role by supplying the basic data needed to provide appropriate services. Accurate positioning technology is therefore needed to improve the quality of location-based services. However, global navigation satellite systems, the most commonly used positioning technique, are relatively inaccurate indoors, so other technologies are required. Among these, indoor positioning with the sensors built into a smartphone stands out for accessibility because smartphones are so widespread. In particular, classifying images acquired by the smartphone camera to estimate position is economical because it requires no additional equipment installation, and it can operate stably because it does not depend on network communication. Image classification with convolutional neural networks has recently advanced greatly, and indoor positioning systems based on convolutional neural networks have received attention.
However, fully training a well-performing convolutional neural network from scratch requires a large amount of training data, computing resources, and computing time. Therefore, in most cases, transfer learning, which uses pre-trained convolutional neural networks, is used. The following factors should be considered when training a convolutional neural network with transfer learning.
First, the type of deep convolutional neural network to use for transfer learning must be determined. Since AlexNet was developed, many deep convolutional neural networks with improved efficiency and accuracy have been developed using a variety of techniques. It is therefore necessary to compare the indoor positioning performance of the convolutional neural networks used for transfer learning. Next, the transfer learning type must be selected. Transfer learning can be classified into four types according to the similarity between the target data and the pre-training data and the amount of target data. It is therefore necessary to set the transfer learning types and compare performance between them when using a deep convolutional neural network. Third, a way to obtain a large amount of data must be considered. Even with transfer learning, previous studies used more than 1,000 training images per zone on average. It is therefore necessary to devise a method for efficiently acquiring a large number of images. Finally, a method of removing data that lowers accuracy should be studied. When images are acquired in large quantities, images that adversely affect neural network training may be included. Any method of obtaining a large number of images should be backed by research on removing them.
This study compared the performance of different types of neural networks and different types of transfer learning in indoor positioning by image classification with deep convolutional neural networks. I also proposed a method of using omnidirectional images to efficiently acquire a large number of images. In addition, I proposed a method to remove images that adversely affect training when omnidirectional images are used.
In this study, I used AlexNet, MobileNet V2, and Inception-ResNet V2 among the existing neural networks. These networks were all designed for image classification on ImageNet, the image database used in ILSVRC (ImageNet Large Scale Visual Recognition Challenge), and their ImageNet classification accuracy is known to be highest for Inception-ResNet V2, followed by MobileNet V2 and then AlexNet.
Since the experiments in this study were performed in a virtual space, I first explained how the virtual space was constructed. Assuming a real-world situation, common features and distinguishing characteristics were assigned to each place, and the orientation and path of the omnidirectional camera were set.
I proposed a method that generates multiple pinhole camera models from an omnidirectional image whose color information is mapped onto a unit sphere, and then splits and converts the image into perspective projection images. With the proposed method, 30 perspective images were generated from one omnidirectional image.
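The sphere-to-pinhole step can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the omnidirectional image is stored in equirectangular form, uses nearest-neighbor sampling, and picks a hypothetical layout of 10 headings by 3 tilts to reach 30 views, since the abstract does not state how the 30 views are arranged. All function names and default parameters are illustrative.

```python
import math

def view_directions(n_yaw=10, pitches_deg=(-30.0, 0.0, 30.0)):
    """Hypothetical layout: 10 headings x 3 tilts = 30 virtual pinhole cameras."""
    return [(360.0 * i / n_yaw, p) for p in pitches_deg for i in range(n_yaw)]

def pixel_ray(u, v, width, height, fov_deg):
    """Unit ray through pixel (u, v) of a pinhole camera looking along +z."""
    f = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)  # focal length in pixels
    x, y, z = u - (width - 1) / 2.0, v - (height - 1) / 2.0, f
    n = math.sqrt(x * x + y * y + z * z)
    return x / n, y / n, z / n

def rotate(ray, yaw_deg, pitch_deg):
    """Tilt the ray by pitch (about x), then turn it by yaw (about y).
    Image y points down, so a positive pitch looks upward."""
    x, y, z = ray
    p, w = math.radians(pitch_deg), math.radians(yaw_deg)
    y, z = y * math.cos(p) - z * math.sin(p), y * math.sin(p) + z * math.cos(p)
    x, z = x * math.cos(w) + z * math.sin(w), -x * math.sin(w) + z * math.cos(w)
    return x, y, z

def sphere_to_equirect(ray, pano_w, pano_h):
    """Map a unit-sphere direction to (column, row) in an equirectangular image."""
    x, y, z = ray
    lon = math.atan2(x, z)                    # longitude, 0 along +z
    lat = math.asin(max(-1.0, min(1.0, y)))   # latitude, positive downward
    col = (lon / (2.0 * math.pi) + 0.5) * pano_w
    row = (lat / math.pi + 0.5) * pano_h
    return int(col) % pano_w, min(int(row), pano_h - 1)

def render_view(pano, yaw_deg, pitch_deg, width=8, height=8, fov_deg=90.0):
    """Sample one perspective view from the panorama (nearest neighbor)."""
    pano_h, pano_w = len(pano), len(pano[0])
    view = []
    for v in range(height):
        row = []
        for u in range(width):
            ray = rotate(pixel_ray(u, v, width, height, fov_deg), yaw_deg, pitch_deg)
            c, r = sphere_to_equirect(ray, pano_w, pano_h)
            row.append(pano[r][c])
        view.append(row)
    return view
```

Calling `render_view(pano, yaw, pitch)` once for each pair returned by `view_directions()` would yield 30 perspective images from a single panorama under these assumptions.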
Using the generated perspective projection images, AlexNet, MobileNet V2, and Inception-ResNet V2, each pre-trained on ImageNet images, were trained by transfer learning. To analyze each type of transfer learning, every network was trained in two ways, type 1, which trains only the fully connected layer, and type 3, which also trains part of the convolutional layers, for a total of six cases.
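The difference between the two transfer learning types comes down to which layers stay frozen. The framework-free Python sketch below illustrates that choice on a toy layer list. The `Layer` class, `toy_network`, and the decision to unfreeze the last two convolutional layers for type 3 are all assumptions made for illustration; the abstract only says that type 3 trains "part of" the convolutional layers, without saying which ones.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Layer:
    name: str
    kind: str              # "conv" (feature extraction) or "fc" (classifier head)
    trainable: bool = False

def toy_network(n_conv=5):
    """Hypothetical stand-in for a pre-trained network: conv layers plus one fc layer."""
    return [Layer(f"conv{i + 1}", "conv") for i in range(n_conv)] + [Layer("fc", "fc")]

def configure(layers, transfer_type, n_conv_to_train=2):
    """Type 1: train only the fully connected layer.
    Type 3: also train the last n_conv_to_train conv layers (an assumed choice)."""
    if transfer_type not in (1, 3):
        raise ValueError("only types 1 and 3 are sketched here")
    conv_idx = [i for i, layer in enumerate(layers) if layer.kind == "conv"]
    unfrozen = set()
    if transfer_type == 3 and n_conv_to_train > 0:
        unfrozen = set(conv_idx[-n_conv_to_train:])
    for i, layer in enumerate(layers):
        layer.trainable = layer.kind == "fc" or i in unfrozen
    return layers

# Three networks x two transfer learning types = the six training cases.
NETWORKS = ("AlexNet", "MobileNet V2", "Inception-ResNet V2")
CASES = list(product(NETWORKS, (1, 3)))

def trainable_names(transfer_type):
    """Names of the layers that would receive gradient updates."""
    return [layer.name for layer in configure(toy_network(), transfer_type) if layer.trainable]
```

In a real deep learning framework the `trainable` flag would correspond to whatever mechanism that framework uses to freeze parameters.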
Data removal used entropy, which measures the amount of information in an image, together with the neural network trained on the unrefined images. Images were sorted in ascending order of entropy, and the entropy criterion for removal was raised until the frequency of correctly classified images became larger than the frequency of misclassified images.
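One possible reading of this entropy criterion is sketched below in Python. It assumes Shannon entropy over the gray-level histogram and a fixed-width entropy window with a hypothetical step of 0.5 bits; the abstract specifies neither the entropy definition nor how the criterion is incremented, so the function names, the window scheme, and the step size are illustrative only.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Shannon entropy in bits of the gray-level histogram of a flat pixel list."""
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def removal_threshold(samples, step=0.5):
    """samples: (entropy, correctly_classified) pairs obtained by classifying each
    image with the network trained on the unrefined data.
    The threshold is raised by `step` until the window [t, t + step) holds more
    correctly classified images than misclassified ones."""
    if not samples:
        return 0.0
    top = max(e for e, _ in samples)
    t = 0.0
    while t <= top:
        window = [ok for e, ok in samples if t <= e < t + step]
        right = sum(window)
        wrong = len(window) - right
        if right > wrong:
            return t
        t += step
    return t  # no window qualified; every image falls below this value

def refine(samples, threshold):
    """Keep only the samples whose entropy reaches the threshold."""
    return [(e, ok) for e, ok in samples if e >= threshold]
```

Low-entropy views, such as a perspective image that shows only a blank wall or ceiling, are the ones this kind of filter would discard first.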
The experiments showed that indoor positioning accuracy followed the same order as ImageNet classification accuracy, confirming that a neural network with high ImageNet classification accuracy is advantageous for indoor positioning in terms of accuracy. In addition, across all networks, accuracy was on average 6.12% higher with transfer learning type 3 than with type 1, so type 3 is more advantageous for indoor positioning than type 1. Finally, the neural networks trained after data removal improved accuracy by 1.99% on average compared with the networks trained on the unrefined data.
The significance of this study is that it confirmed experimentally that, for indoor positioning, using an advanced neural network rather than an early one, and partially training the feature extraction layers rather than only the fully connected layer, can be expected to give more accurate results. In addition, the proposed method of generating a large number of perspective projection images from omnidirectional images is valuable because it can relieve the shortage of data in training convolutional neural networks. The method of removing images that adversely affect training can be applied generally to methods of obtaining a large number of images. The results of this study can be used in indoor positioning studies based on convolutional neural networks, and an image-based convolutional neural network trained with transfer learning is judged to be applicable as one means of indoor positioning.
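The location-based services industry is growing in scale and importance with IoT (Internet of Things) in various industries, the spread of high-speed communication networks, real-time location tracking and traffic information, and location-based search, advertising, and marketing. In location-based services, location estimation technology plays an essential role as the basic data for providing suitable services. The need for accurate location estimation technology to improve the quality of location-based services is therefore growing, but satellite navigation systems, the general-purpose location estimation technology, are relatively inaccurate indoors, so other kinds of technology are required. Among these, with the spread of smartphones, indoor location estimation using the sensors built into a smartphone excels in general applicability. In particular, estimating location by classifying images taken with the built-in camera is economical because it needs no additional equipment installation, and it has the strength of operating stably because it is independent of communication. In addition, image classification using convolutional neural networks has recently advanced greatly, and indoor location estimation systems that use it are attracting attention.
However, fully training a well-performing convolutional neural network from scratch requires a vast amount of training data, computing resources, and computing time. In most cases, therefore, transfer learning with a pre-trained convolutional neural network is used. The following factors must be considered when training a convolutional neural network with transfer learning.
First, the kind of deep convolutional neural network to use for transfer learning must be decided. Since AlexNet appeared in 2012, many deep convolutional neural networks improved in efficiency and accuracy through various techniques have been developed, so research comparing the indoor location estimation performance of the networks to be used for transfer learning is needed. Next, the transfer learning type must be chosen. Transfer learning can be divided into four types according to the similarity between the pre-training data and the target data and the amount of target data. When applying transfer learning to a deep convolutional neural network, the transfer learning types must therefore be set and their performance compared. A way of obtaining a large amount of data must also be considered. Even with transfer learning, previous studies used on average more than 1,000 training images per zone. A method for efficiently acquiring a large number of images therefore needs to be devised. Finally, a method of removing data that degrades accuracy must be studied. When images are acquired in bulk, images that harm neural network training may be included. Data refinement that removes them must back up the method of obtaining a large number of images.
This study compares performance by kind of deep convolutional neural network and by transfer learning type when estimating indoor location through image classification with deep convolutional neural networks, and proposes using omnidirectional images to acquire a large number of images efficiently. It also proposes data refinement that removes images harmful to training when omnidirectional images are used.
Among the neural networks developed to date, AlexNet, MobileNet V2, and Inception-ResNet V2 were used in the experiments. All of these are networks for classifying images in ImageNet, the image database used in the contest called ILSVRC (ImageNet Large Scale Visual Recognition Challenge), and their ImageNet classification accuracy is known to be highest for Inception-ResNet V2, followed by MobileNet V2 and then AlexNet (Mathwork, 2019).
Because the experiments were run in a virtual space, the method of building the virtual space for the experiments is described. Assuming real-world conditions, each place was given shared features and distinguishing characteristics, and the pose and path of the omnidirectional camera were set. A method was proposed that creates multiple pinhole camera models from an omnidirectional image whose color information is mapped onto a unit sphere, and splits and converts the image into perspective projection images. With the proposed method, 30 perspective projection images were generated from a single omnidirectional image.
Using the generated perspective projection images, AlexNet, MobileNet V2, and Inception-ResNet V2 pre-trained on ImageNet images were trained. To analyze each transfer learning type, each network was trained under type 1, which trains only the fully connected layer, and type 3, which also partially trains the convolutional layers, giving six training cases in total.
Data refinement used entropy, which computes the amount of information in an image, and the neural network trained on the unrefined images. The images were sorted in ascending order of entropy, and the entropy criterion for removal was raised until the frequency of correctly classified images exceeded the frequency of misclassified images.
In the experiments, indoor location estimation accuracy was higher in the same order as ImageNet classification accuracy, confirming that a neural network with high ImageNet classification accuracy is advantageous for indoor location estimation in terms of accuracy. In addition, for all networks, accuracy was on average 6.12% higher when training with transfer learning type 3 than with type 1, confirming that type 3 is more advantageous for indoor location estimation than type 1. Finally, the neural networks trained after data refinement improved accuracy by 1.99% on average over the networks trained on unrefined data.
This study is significant in that it confirmed experimentally that, in indoor location estimation, more accurate results can be expected from using an advanced neural network rather than an early one, and from partially training the feature extraction layers rather than training only the fully connected layer. The proposed method of generating a large number of perspective projection images from omnidirectional images is also valuable in that it can relieve the shortage of data in training convolutional neural networks. Data refinement that removes images inefficient for training can be used generally with methods of obtaining a large number of images. The results of this study can be used in indoor location estimation studies based on convolutional neural networks, and an image-based convolutional neural network using transfer learning is judged to be applicable as one means of indoor location estimation.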

Language: kor

URI: https://hdl.handle.net/10371/160981

http://dcollection.snu.ac.kr/common/orgView/000000156686

Files in This Item:

000000156686.pdf 22.42 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Civil & Environmental Engineering
  - Theses (Master's Degree_Dept. of Civil & Environmental Engineering)
