Machine Vision for Human Activity Recognition: Features & Algorithms

Tushar Sandhan

Machine Vision for Human Activity Recognition: Features & Algorithms : 행동인식을 위한 머신비젼기술: 특징 및 알고리듬 연구

Authors: Tushar Sandhan

Advisor: Jin Young Choi

Major: College of Engineering, Department of Electrical and Computer Engineering

Issue Date: 2014-08

Publisher: Seoul National University Graduate School

Keywords: Machine learning ; pattern recognition ; computer vision ; human activity recognition from video ; activity recognition (gestures, abnormal events) ; features ; hierarchical graph analysis ; proximity clustering ; imbalanced dataset handling

Description: Thesis (Master's) -- Seoul National University Graduate School : Department of Electrical and Computer Engineering, 2014. 8. Jin Young Choi.

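Abstract: Human Activity Recognition (HAR) is a multifaceted topic in computer vision and machine learning that covers crowd behavior recognition, analysis of interactions between people, and recognition of human gestures and actions. Demand for it is surging across a wide range of applications, including video surveillance, security, entertainment, healthcare systems, video indexing, human-computer interaction, and video retrieval, and a variety of approaches to activity recognition have been developed over the past decade or so. This thesis aims to resolve the problems of existing work by proposing new robust features and algorithms that form a unified activity recognition framework.
Features play an important role in activity recognition. Global features are generated by collecting holistic, oriented patterns over the entire video rather than its temporal information. This thesis builds on the observation that recognition performance can be improved by combining global representations with additional temporal information. We analyze motion histograms in the frequency and spatio-temporal domains to propose new mid-level features, frequencygrams and spatiograms, and analyze the patterns of action silhouettes to propose a new high-level feature, abstracted radon profiles. These features are robust to camera motion and small occlusions, provide a representation that can discriminate repetitive motions, and are used for supervised activity recognition through the graph pyramid, the hierarchical graph analysis algorithm proposed in this thesis. The graph pyramid classifier represents each motion as a node and builds a graph for each action class. Training encodes class information into the graph edges via a modified quadratic-Chi distance. By considering the interactions among the nodes neighboring a query, the algorithm brings out the hidden details of the action family.
Optical flow is the basic way to describe motion, but in its raw form it is sensitive to background noise, camera motion, and scale changes. Features constructed from the raw data are nevertheless important because they capture the underlying dynamics of the activity. To obtain pixel-level motion information, this thesis uses optical flow to propose low-level statistical motion features: circulation, motion homogeneity, motion orientation, and stationarity. These features are used for unsupervised abnormal activity recognition through the proposed proximity clustering algorithm. The key idea of the algorithm is that normal activities occur more often than abnormal ones. It clusters normal activities in the proposed feature space and designates everything else as abnormal. The algorithm operates on the proximity principle and does not require the number of normal-activity clusters to be specified.
In activity recognition, some action classes have very little training data, so unless the dataset is rebalanced, the learning algorithm produces results biased toward the classes with many training examples. To handle imbalanced datasets, this thesis proposes the G-SMOTE algorithm, which bootstraps while oversampling the minority class and undersampling the majority class. G-SMOTE improves on prior work in synthetic minority oversampling. This was confirmed by a range of evaluations on datasets constructed to be highly imbalanced, in which the proposed method produced the highest recognition results.
For traffic scenes, this thesis is the first to implement time series embedding to address the data scarcity problem. All activity classifiers are learned within a multi-task learning framework that exploits the correlations among different motion patterns. Experimental results on four public datasets show that the proposed approach outperforms state-of-the-art methods in traffic pattern recognition.
Machine vision degrades severely under dark lighting, under occlusion, or when activity falls outside the camera's field of view. Using audio information together with video can therefore improve the performance of an activity recognition system. For audio activity recognition, this thesis proposes the audio bank, a high-level representation of audio information. It consists of discriminative audio detectors, each representing an audio class in frequency-temporal space. It accumulates the responses of all bank detectors into a single vector, producing features superior to low-level features. The stability of the features with respect to bank size and the high recognition performance demonstrate the effectiveness of the proposed approach.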
Human Activity Recognition (HAR) is a multifaceted area of computer vision
and machine learning that encompasses group activity pattern discovery,
interpersonal interaction analysis, and human gesture and action recognition.
Demand for it is growing rapidly across a wide range of applications, such as
visual surveillance and security, entertainment, healthcare systems, video
indexing, human-computer interaction, and video retrieval. Consequently, over
the last decade a diversity of approaches to HAR has been developed. We address
their limitations by proposing new robust features and algorithms that together
form a unified HAR framework.

Features play a vital role in HAR. Global features are generated from the
entire video sequence; they ignore explicit temporal information but capture
the oriented, holistic underlying patterns. We found that HAR can be improved
by fusing extra temporal information with the global representation. In this
vein, we propose new mid-level features (frequencygrams and spatiograms),
obtained by analyzing the dynamics of motion histograms in the frequency and
spatio-temporal domains, and new high-level features (abstracted radon
profiles), obtained by considering the whole oriented information of the action
silhouettes. These features are robust to camera motion and small occlusions
and provide a discriminative representation for reciprocating motions. They are
used for supervised HAR through the proposed graph pyramid, a hierarchical
graph analysis algorithm. In the graph pyramid classification algorithm, we
construct a graph of an entire action class by representing each motion
sequence as a node. Training embeds the class-specific information in the graph
edges via our modified quadratic-Chi distance. By considering interactions
among the nodes neighboring the query, the algorithm uncovers the hidden
subtleties of the action family.
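
As a rough illustration of the kind of distance that could weight such graph
edges, the sketch below implements the standard Quadratic-Chi histogram
distance of Pele and Werman. It is not the modified variant used in the thesis,
whose details the abstract does not give. The function name `quadratic_chi`,
the bin-similarity matrix `A`, and the normalization exponent `m` are
illustrative assumptions; `A` is assumed non-negative with a positive diagonal,
and both histograms are assumed non-negative.

```python
import numpy as np

def quadratic_chi(p, q, A, m=0.5):
    """Standard Quadratic-Chi distance between two histograms (a sketch).

    A[i, j] is an assumed bin-similarity weight; m in [0, 1) controls how
    strongly differences in heavily populated bins are discounted.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    A = np.asarray(A, dtype=float)
    # z[i] = sum_c (p[c] + q[c]) * A[c, i]
    z = (p + q) @ A
    # Where z is zero, p and q are both zero in that bin (given the
    # assumptions on A), so the convention 0/0 = 0 is applied.
    z[z == 0] = 1.0
    d = (p - q) / z**m
    # Clamp tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(d @ A @ d, 0.0)))
```

With `A` set to the identity and `m = 0.5`, this reduces to the square root of
the sum of `(p - q)**2 / (p + q)` over bins, a chi-square-like distance with no
cross-bin terms.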

Optical flow is the basis for describing a motion sequence; however, in its raw
form it may be of little use because of its susceptibility to background noise,
camera motion, and scale changes. Features constructed from the raw data
nevertheless encapsulate the underlying dynamics of the activity, so they play
an important role here. Using raw optical flow, we also propose low-level
statistical motion features (viz., circulation, motion homogeneity, motion
orientation, and stationarity) that readily capture pixel-level motion
information. We then use these features for unsupervised abnormal activity
recognition with the proposed proximity clustering algorithm. The key idea
behind it is that normal events occur more frequently than abnormal ones. It
clusters the normal events in the proposed feature space and designates
outliers as abnormal events. It works on the proximity principle and does not
require the number of normal-event clusters to be specified.
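
The idea of grouping frequent events by proximity and treating the leftovers as
abnormal can be sketched as follows. This is a simple stand-in, not the
proximity clustering algorithm of the thesis: it links feature vectors that lie
within a distance `radius` of each other and flags members of groups smaller
than `min_size`. The names `proximity_groups` and `flag_abnormal` and both
thresholds are hypothetical. As in the description above, the number of
clusters is never specified.

```python
import numpy as np

def proximity_groups(X, radius):
    """Label points by connected components of the 'within radius' graph."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all feature vectors.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    labels = np.full(n, -1)
    current = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            # Unlabeled points close to i join the current group.
            for j in np.flatnonzero((dist[i] <= radius) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

def flag_abnormal(X, radius, min_size):
    """Mark points whose group has fewer than min_size members."""
    labels = proximity_groups(X, radius)
    sizes = np.bincount(labels)
    return sizes[labels] < min_size
```

For example, three nearby points and one distant point with `radius=0.5` and
`min_size=2` yield one group of three normal events and one flagged outlier.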

In the HAR domain, some action classes have very few training examples. Without
dataset rebalancing, the learning algorithm encounters extremely few
minority-class samples and therefore becomes biased towards the majority class.
Hence, properly handling the imbalanced dataset is a crucial issue. To address
it, we propose the G-SMOTE algorithm, which employs bootstrapping with
simultaneous oversampling of the minority class and undersampling of the
majority class to build an ensemble of classifiers. G-SMOTE is an improvement
on the existing synthetic minority oversampling technique (SMOTE). In extensive
evaluation on several highly imbalanced datasets, it produced the highest
recognition results.
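
To make the rebalancing idea concrete, the sketch below builds one balanced
training bag by bootstrap-sampling the majority class down and adding
SMOTE-style synthetic minority samples (interpolations between a minority point
and one of its nearest minority neighbors). It illustrates generic SMOTE with
undersampling, not G-SMOTE itself, whose specifics the abstract does not give.
The names `smote_like` and `balanced_bag`, the neighbor count `k`, and the
choice of target size are assumptions. The minority class is assumed to have at
least two samples and the majority class at least as many as the minority.

```python
import numpy as np

def smote_like(X_min, n_new, k, rng):
    """Create n_new synthetic points by interpolating minority neighbors."""
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    dist = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :min(k, n - 1)]
    base = rng.integers(0, n, size=n_new)     # random minority seeds
    pick = rng.integers(0, neighbors.shape[1], size=n_new)
    nb = neighbors[base, pick]                # one near neighbor per seed
    gap = rng.random((n_new, 1))              # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

def balanced_bag(X_maj, X_min, k=3, seed=0):
    """Return one class-balanced bag (label 0 = majority, 1 = minority)."""
    rng = np.random.default_rng(seed)
    X_maj = np.asarray(X_maj, dtype=float)
    X_min = np.asarray(X_min, dtype=float)
    target = (len(X_maj) + len(X_min)) // 2   # assumed per-class size
    # Bootstrap (sampling with replacement) shrinks the majority class.
    maj_idx = rng.choice(len(X_maj), size=target, replace=True)
    synth = smote_like(X_min, target - len(X_min), k, rng)
    X = np.vstack([X_maj[maj_idx], X_min, synth])
    y = np.concatenate([np.zeros(target, dtype=int), np.ones(target, dtype=int)])
    return X, y
```

Calling `balanced_bag` with different `seed` values gives different bags; one
classifier could be trained per bag and their votes combined to form an
ensemble.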

In the traffic scenario, we are the first to implement a time series embedding
framework to solve the data scarcity problem for traffic activity recognition.
Using a multi-task learning framework, we learn all activity classifiers
simultaneously by exploiting correlations among different motion patterns. We
improve traffic pattern recognition performance substantially on all four
public-domain datasets compared to the state-of-the-art approaches.

Machine vision becomes blind in dark illumination conditions, under occlusion,
or in areas outside the camera view. Using audio information along with the
video can enhance the performance of the HAR system and give a better
understanding of the underlying scene. We therefore propose the audio bank, a
new high-level representation of audio, for audio activity recognition. It
comprises distinctive audio detectors, each representing an audio class in the
frequency-temporal space. By accumulating the responses of all bank detectors
into one vector, it produces features that discriminate audio events better
than low-level features. Feature stability over the bank size and high
recognition performance with several classifiers show the effectiveness of the
proposed method.

Language: English

URI: https://hdl.handle.net/10371/123081

Files in This Item:

000000020717.pdf 11.50 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Electrical and Computer Engineering
  - Theses (Master's Degree, Dept. of Electrical and Computer Engineering)
