A Continual and Active Robot Learning Method Using Self-Organizing Neural Networks

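Abstract: This thesis proposes a method by which an AI robot, while adapting to a real environment, continually and actively learns the concepts of the objects it encounters around it. With the recent rapid progress of deep learning, AI home appliances and smart speakers are being developed, but most of these products rely on functions such as speech or face recognition that were trained in one batch, so their performance can degrade sharply when the individual operating environment differs from the training environment. In addition, the deep learning models used for these functions must be trained on large amounts of data for a long time, and they can exhibit catastrophic forgetting depending on the order of the inputs. An AI robot needs to keep learning from the small amounts of data it newly senses, and to that end this work focuses on imitating the way humans learn. Existing machine learning techniques that imitate human learning, including self-organizing neural networks and online semi-supervised active learning, were analyzed in terms of model structure and learning mechanism, and a new model, CARLSON, was developed to combine their strengths.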
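CARLSON takes object images observed by the robot as input and learns object concepts; it is structured as a self-organizing neural network that extends its knowledge by comparing new data against existing concepts. Because object images are high-dimensional and noisy, the model first extracts the essential representations from the images so that learning is efficient and stable. Representation extraction is performed by the encoder part of the model, which is trained together with a decoder that reconstructs images from representations. The representations extracted by the encoder are divided into concepts according to their mutual similarity, and each concept is clustered into a single node that holds a representative representation. Nodes are added and adjusted during clustering through top-down prediction and bottom-up activation, as in Adaptive Resonance Theory (Grossberg 1987). The whole model, including the encoder and decoder, learns every time a data point arrives; it passes information between similar nodes through label propagation and fills in information about uncertain concepts through active queries, so it can learn effectively even when data are scarce and ground-truth labels are rare.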
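To verify the model's performance on a real robot, this work collected sequences of object images with the humanoid robot NAO and carried out visual object recognition experiments. CARLSON achieved clearly higher classification accuracy than a convolutional neural network (CNN), a standard deep learning model, and was shown to operate stably even under the constraints that data and labels are few and each data point can be learned only once. In addition, online semi-supervised learning scenarios were set up on the well-known digit and object recognition datasets MNIST, SVHN, Fashion-MNIST, and CIFAR-10, and the model was tested on them; CARLSON again outperformed the CNN.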
In this thesis, a continual and active machine learning method is proposed that lets artificial intelligence (AI) robots adapt to real environments and form concepts of nearby objects. Recent advances in AI have led to the development of smart home appliances and AI speakers, but most of these products may suffer performance degradation in actual use, because they use functions such as voice or face recognition without adjusting them to the individual operating environment. The deep learning techniques behind these functions need to be trained repeatedly on large amounts of data for a long time, and they risk catastrophic forgetting when encountering increasingly diverse objects. AI robots, by contrast, need to continuously learn skills and concepts from a relatively small amount of newly acquired data. Since humans are the best-known agents that learn this way, imitating human learning should be one of the most effective ways to achieve the desired robot learning. The proposed model, CARLSON, integrates the strengths of previous human-like machine learning methods.
CARLSON is a self-organizing neural network that expands its knowledge by comparing each incoming object image to the concepts it has already learned. To increase the efficiency and stability of learning, the model first reduces the size and noise of high-dimensional input images by extracting informative features, or representations, from them. Feature extraction is carried out by an encoder that is jointly trained with a decoder that reconstructs images from representations. CARLSON divides the representations into groups such that each group represents a single kind of object, or an individual concept. The groups are implemented as nodes with means and variances that are created or adjusted by considering both top-down prediction and bottom-up activation, as in Adaptive Resonance Theory (Grossberg 1987). The whole model, including the encoder and decoder, is trained end to end and updated upon every new input. Using a label propagation method, CARLSON lets similar nodes share information so that it can infer object categories even when labels are rarely provided. It can also actively ask a human operator about uncertain concepts to make up for insufficient information.
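The idea of nodes with means and variances that are either adjusted or newly created can be illustrated with a small sketch. This is not the thesis's implementation: the class `Node`, the function `learn_one`, the variance floor, and the `max_distance` threshold are all hypothetical names and values chosen for illustration, and a simple variance-normalised distance test stands in for the match step that Adaptive Resonance Theory performs with its vigilance parameter. The encoder, decoder, label propagation, and active queries are omitted; the inputs are assumed to be already-extracted representations.

```python
import math


class Node:
    """One concept: running mean and variance of the representations assigned to it."""

    def __init__(self, x):
        self.n = 1
        self.mean = list(x)
        self.m2 = [0.0] * len(x)  # running sum of squared deviations (Welford's method)

    def variance(self, floor=1e-2):
        # The floor (an assumed value) keeps a freshly created node from having zero variance.
        return [max(m2 / self.n, floor) for m2 in self.m2]

    def distance(self, x):
        # Variance-normalised Euclidean distance, averaged over dimensions.
        var = self.variance()
        total = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, self.mean, var))
        return math.sqrt(total / len(x))

    def update(self, x):
        # Incrementally adjust mean and variance with one new representation.
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])


def learn_one(nodes, x, max_distance=3.0):
    """Adjust the best-matching node if x is close enough, otherwise create a new node.

    Returns the index of the node that x was assigned to.
    """
    if nodes:
        best = min(range(len(nodes)), key=lambda i: nodes[i].distance(x))
        if nodes[best].distance(x) <= max_distance:
            nodes[best].update(x)
            return best
    nodes.append(Node(x))
    return len(nodes) - 1


nodes = []
stream = [[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]]
assignments = [learn_one(nodes, x) for x in stream]
```

In this toy stream the first two points fall into one node and the last two into a second one, so the number of nodes grows only when an input does not match any existing concept. Each input is processed exactly once, which is the property that makes this style of clustering suitable for continual learning.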
To evaluate the model, a visual object dataset was constructed by collecting images with the humanoid robot NAO and was used for object recognition experiments. CARLSON clearly outperformed a convolutional neural network (CNN) model and performed stably even when labels were rarely given and each sample could be accessed only once during training. It also performed better than the CNN in online semi-supervised recognition tasks on well-known digit and object classification datasets: MNIST, SVHN, Fashion-MNIST, and CIFAR-10.

Language: kor

URI: https://hdl.handle.net/10371/175444

https://dcollection.snu.ac.kr/common/orgView/000000165454

Files in This Item:

000000165454.pdf 0.94 MB

Appears in Collections:

College of Engineering/Engineering Practice School (공과대학/대학원)
- Dept. of Computer Science and Engineering (컴퓨터공학부)
  - Theses (Master's Degree_컴퓨터공학부)
