Automatic Multi-Label Image Classification Model for Construction Site Images

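Abstract: With recent advances in image analysis technology, there have been attempts to manage construction projects by making use, in a variety of ways, of photographs collected on construction sites. In particular, as imaging equipment has improved, the number of photographs produced on construction sites has surged, and the potential usefulness of site photographs keeps growing. However, most of these photographs are stored without being properly classified, so extracting the necessary project information from them is very difficult. Under current practice, a user reviews each photograph individually before classifying it, which takes considerable time and effort. Existing image analysis techniques that extract classification features directly are also limited in their ability to learn the features of complex construction site photographs in a general way.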
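To cope with the highly varied and dynamically changing appearance of construction site photographs, this study therefore aims to develop a model that automatically assigns suitable keywords to individual site photographs by applying a deep convolutional neural network (CNN) algorithm, which performs strongly in image classification. Because a CNN can effectively learn high-level invariant features as its architecture becomes deeper, it is well suited to the complex problem of classifying construction site photographs.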
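Accordingly, this study developed a CNN-based model that automatically assigns suitable keywords to each photograph so that the photographs needed on site can be found quickly and accurately. In particular, a multi-label classification model was applied, based on the observation that most construction site photographs are associated with one or more labels. Through this, the study aims first to improve the usability of construction site photographs by extracting a variety of project-related information from them, and further to support efficient construction management using photograph data.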
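As a minimal sketch of the multi-label idea described above, assume the network emits one raw score (logit) per label. Each score is passed through an independent sigmoid, and every label whose probability reaches a threshold is assigned, so a single photograph can receive several keywords. The label names, the 0.5 threshold, and the logit values are hypothetical and are not taken from the thesis.

```python
import math

# Hypothetical label names for illustration only; the thesis data set has ten labels.
LABELS = ["formwork", "rebar", "concrete", "excavation"]


def sigmoid(x: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def assign_labels(logits, labels=LABELS, threshold=0.5):
    """Return every label whose sigmoid probability reaches the threshold.

    Each label is decided independently, which is what distinguishes
    multi-label classification from single-label (softmax) classification.
    """
    if len(logits) != len(labels):
        raise ValueError("one logit per label is required")
    return [name for name, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical network output for one photograph.
example_logits = [2.1, 0.4, -1.3, -3.0]
print(assign_labels(example_logits))  # ['formwork', 'rebar']
```

A photograph whose scores all fall below the threshold receives no keyword in this sketch; whether the thesis model handles that case differently is not stated in the abstract.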
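The study proceeded as follows. First, to train the model, photographs covering six trades were collected from actual construction sites and from open-source search engines, and a data set of ten labels, including sub-categories, was assembled for training. To choose a specific model, representative CNN models were then compared, and ResNet 18, which showed the best performance, was selected as the final model. The experiments showed an average accuracy of 91%, confirming the feasibility of automatically classifying construction site photographs.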
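This study is significant in two respects: it confirmed that construction site photographs can be classified automatically using CNNs, which have recently shown good results in image analysis in other fields, and it is the first study to apply multi-label classification to the problem of classifying construction site photographs. In practice, automatic classification is expected to reduce the cumbersome manual classification work done so far and to raise the usability of construction site photographs.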
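However, this study does not consider correlations or dependencies among labels. Future work therefore aims to improve the model by additionally training it on the hierarchical relationships among photographs to raise accuracy, and by extending the training labels to lower-level keywords so that more varied information can be obtained from site photographs.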
Activity recognition in construction is a prerequisite step for various tasks and is thus critical for successful project management. In the last several years, the computer vision community has blossomed, taking advantage of the exploding number of construction images and deploying visual analytics technology for cumbersome construction tasks. However, the current annotation practice, itself a critical preliminary step for prompt image retrieval and image understanding, remains both time-consuming and labor-intensive. Because previous attempts to make the process more efficient were ill-suited to the dynamic nature of construction images and showed limited performance in classifying construction activities, this research aims to develop a model that is robust not only to a wide range of appearances but also to the multiple composition of construction activity images. The proposed model adopts a deep convolutional neural network to learn high-dimensional features with less human engineering and to annotate images with multiple labels of semantic information. The results showed that the model was capable of distinguishing different trades of activities at different stages of the activity. The average accuracy of 83% and maximum accuracy of 91% hold promise for an actual implementation of automated activity recognition for construction operations. Ultimately, the research demonstrated a potential method for providing an automated and reliable procedure to monitor construction activity.
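The abstract does not state how the average and maximum accuracy were computed. One common reading, assumed here, is per-label accuracy: for each label, the fraction of photographs on which the model's yes/no decision matches the ground truth, then the mean and the maximum over labels. The sketch below illustrates that reading on a tiny hypothetical data set of four photographs and three labels; the numbers are made up and are unrelated to the thesis results.

```python
def per_label_accuracy(y_true, y_pred):
    """Fraction of photographs on which each label was predicted correctly.

    y_true and y_pred are lists of equal-length 0/1 rows, one row per photograph.
    """
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    return [
        sum(t[j] == p[j] for t, p in zip(y_true, y_pred)) / n_samples
        for j in range(n_labels)
    ]


# Hypothetical ground truth and predictions (rows: photographs, columns: labels).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

acc = per_label_accuracy(y_true, y_pred)
mean_acc = sum(acc) / len(acc)  # average over labels
max_acc = max(acc)              # best single label
print(acc, round(mean_acc, 3), max_acc)  # [1.0, 0.75, 0.75] 0.833 1.0
```

Other definitions, such as exact-match accuracy over whole label sets, would give different numbers for the same predictions, so the metric should be confirmed against the full thesis.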

Language: eng

URI: https://hdl.handle.net/10371/160985

http://dcollection.snu.ac.kr/common/orgView/000000157045

Files in This Item:

000000157045.pdf 5.24 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Architecture and Architectural Engineering (건축학과)
  - Theses (Master's Degree_건축학과)
