Congestion-scale-aware Design of Network Structure and Training Strategy for Crowd Density Estimation

정지엽

Congestion-scale-aware Design of Network Structure and Training Strategy for Crowd Density Estimation : 군중 밀도 예측을 위한 네트워크 구조와 훈련방법의 혼잡도 및 크기 인식 설계

Authors: 정지엽

Advisor: 최진영

Issue Date: 2022

Publisher: Seoul National University Graduate School

Keywords: crowd density estimation ; crowd counting ; scene understanding ; visual surveillance

Description: Thesis (Ph.D.) -- Seoul National University Graduate School : College of Engineering, Department of Electrical and Computer Engineering, 2022.2. 최진영.

Abstract: This dissertation presents novel deep learning-based crowd density estimation methods that account for crowd congestion and the scale of people. Crowd density estimation is an important task for intelligent surveillance systems. A crowd density estimate makes it easy to indicate regions of interest for public security and safety. It can also support computationally expensive computer vision algorithms such as pedestrian detection and tracking.
Since deep learning was introduced to crowd density estimation, most research has followed the conventional scheme of training a convolutional neural network on training images to estimate a crowd density map. Deep learning-based crowd density estimation research can be divided into two perspectives: network structure and training strategy. In general, work from the network structure perspective proposes a novel network structure that extracts features representing the crowd well. Work from the training strategy perspective, on the other hand, proposes a novel training methodology or loss function to improve counting performance.
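As a minimal illustration of the density-map formulation mentioned above, the sketch below builds a target density map from point annotations and recovers the count by summing it. It assumes the common convention of placing one normalized Gaussian per annotated person; the function names, the fixed `sigma`, and the toy annotations are hypothetical and are not taken from the dissertation.

```python
import math

def density_map(points, height, width, sigma=1.5):
    """Sum of per-person Gaussians, each normalized to sum to 1 over the image."""
    dmap = [[0.0] * width for _ in range(height)]
    for (py, px) in points:
        # Unnormalized Gaussian centered on the annotated point (assumed kernel).
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / total
    return dmap

def count(dmap):
    """The crowd count is the sum of the density map over all pixels."""
    return sum(map(sum, dmap))

# Hypothetical (row, column) head annotations on an 8x8 image.
heads = [(2, 3), (5, 5), (6, 1)]
dmap = density_map(heads, height=8, width=8)
print(count(dmap))
```

Because every kernel is normalized over the image grid, the summed map equals the number of annotated people up to floating-point error, which is the property a counting network trained on such maps relies on.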
In this dissertation, I propose several works from both perspectives of deep learning-based crowd density estimation. In particular, I design the network models to have rich crowd representation characteristics according to the crowd congestion and the scale of people. I propose two novel network structures: a selective ensemble network and a cascade residual dilated network. I also propose a novel loss function for crowd density estimation: a congestion-aware Bayesian loss.
First, I propose a selective ensemble deep network architecture for crowd density estimation. In contrast to existing deep network-based methods, the proposed method incorporates two sub-networks for local density estimation: one that learns sparse regions and one that learns dense regions. The density maps estimated locally by the two sub-networks are selectively combined in an ensemble fashion by a gating network to produce an initial crowd density map. Another sub-network, which draws on contextual information in the image, then refines the initial map into a high-resolution map. In training, a novel adaptive loss scheme is applied to resolve ambiguity in crowded regions. The scheme improves both density map accuracy and counting accuracy by adjusting the weight between the density loss and the counting loss according to the degree of crowdedness and the training epoch.
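The gated combination and the adaptive loss described above can be sketched roughly as follows. This is only a toy under stated assumptions: the gate is given directly instead of being predicted by a gating network, and the weighting schedule in `adaptive_loss` (including the `crowd_scale` constant) is a hypothetical choice, since the abstract does not specify the actual formula.

```python
def gated_combine(sparse_map, dense_map, gate):
    """Per-pixel convex combination: gate=1 trusts the dense branch, gate=0 the sparse one."""
    return [[g * d + (1.0 - g) * s for s, d, g in zip(srow, drow, grow)]
            for srow, drow, grow in zip(sparse_map, dense_map, gate)]

def adaptive_loss(pred, target, epoch, total_epochs, crowd_scale=50.0):
    """Blend a density loss and a counting loss with a weight that depends on
    crowdedness and training progress (hypothetical schedule, not the thesis formula)."""
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    density_loss = sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)
    count_loss = abs(sum(flat_p) - sum(flat_t))
    crowdedness = min(sum(flat_t) / crowd_scale, 1.0)  # 0 = empty, 1 = very crowded
    progress = epoch / total_epochs                    # 0 = start, 1 = end of training
    w_count = 0.5 * (crowdedness + progress)           # stays in [0, 1]
    return (1.0 - w_count) * density_loss + w_count * count_loss

# Toy 2x2 outputs of the two branches and a hard gate that picks one branch per pixel.
sparse = [[0.1, 0.0], [0.0, 0.2]]
dense = [[0.9, 0.8], [0.7, 1.0]]
gate = [[0.0, 0.0], [1.0, 1.0]]
combined = gated_combine(sparse, dense, gate)
print(combined)
print(adaptive_loss(combined, dense, epoch=5, total_epochs=10))
```

In this sketch the counting term gains weight as the scene gets more crowded and as training proceeds, which is one plausible reading of adjusting the weight "according to the degree of crowdedness and the training epoch".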
Second, I propose a novel crowd density estimation architecture composed of multiple dilated convolutional neural network (CNN) blocks with different scales. The architecture is motivated by the empirical observation that small-scale dilated convolution estimates the density at the center of each person well, whereas large-scale dilated convolution estimates the density at the periphery of a person well. To estimate the crowd density map gradually from the center to the periphery of each person in a crowd, the dilated CNN blocks are trained in a cascade, from the block with the smallest dilation to the block with the largest.
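The center-to-periphery idea can be illustrated with a one-dimensional toy, assuming fixed hand-picked kernels in place of trained 2D CNN blocks. Each stage applies a dilated convolution with a larger dilation and adds its output as a residual to the running estimate; the function names, kernels, and dilations below are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Same-size 1D dilated convolution (cross-correlation) with zero padding."""
    n, half = len(signal), len(kernel) // 2
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < n:
                acc += w * signal[idx]
        out.append(acc)
    return out

def cascade(signal, kernels, dilations):
    """Accumulate residuals from small to large dilation; return every stage's estimate."""
    estimate = [0.0] * len(signal)
    stages = []
    for kernel, d in zip(kernels, dilations):
        residual = dilated_conv1d(signal, kernel, d)
        estimate = [e + r for e, r in zip(estimate, residual)]
        stages.append(estimate)
    return stages

# A single person at position 4, represented as a unit impulse.
person = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
kernels = [[0.1, 0.4, 0.1], [0.125, 0.0, 0.125], [0.075, 0.0, 0.075]]
stages = cascade(person, kernels, dilations=[1, 2, 3])
for stage in stages:
    print([round(v, 3) for v in stage])
```

The first stage only fills positions next to the center, and each later stage extends the estimate outward while leaving the center untouched, so the final profile still sums to one person.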
Third, I propose a novel congestion-aware Bayesian loss that considers the person-scale and the crowd-sparsity. Deep learning-based crowd density estimation can greatly improve the accuracy of crowd counting. Although a Bayesian loss method resolves two problems, the need for a hand-crafted ground truth (GT) density map and noisy annotations, counting accurately in highly congested scenes remains challenging. In a crowd scene, people's appearance changes with the scale of each individual (i.e., the person-scale). Also, the lower the sparsity of a local region (i.e., the crowd-sparsity), the more difficult it is to estimate the crowd density. I estimate the person-scale from scene geometry and then estimate the crowd-sparsity using the estimated person-scale. Both estimates are used in the congestion-aware Bayesian loss to improve the supervisory representation of the point annotations.
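For orientation, the sketch below shows a point-supervised Bayesian-style loss, assuming the formulation in which each pixel's predicted density is softly assigned to annotated points through Gaussian posteriors and each point is expected to collect a count of one. Giving every point its own `sigma` is a hypothetical stand-in for the person-scale; it is not the congestion-aware formulation of the dissertation, whose details the abstract does not give.

```python
import math

def bayesian_loss(density, points, sigmas):
    """Return (loss, expected counts); loss = sum over points of |1 - expected count|."""
    height, width = len(density), len(density[0])
    expected = [0.0] * len(points)
    for y in range(height):
        for x in range(width):
            # Gaussian likelihood of this pixel under each annotated point.
            lik = [math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2 * s ** 2))
                   / (2 * math.pi * s ** 2)
                   for (py, px), s in zip(points, sigmas)]
            total = sum(lik)
            if total == 0.0:
                continue  # pixel numerically unreachable from every point
            for n, value in enumerate(lik):
                # Posterior share of this pixel's density assigned to point n.
                expected[n] += (value / total) * density[y][x]
    return sum(abs(1.0 - e) for e in expected), expected

# Two annotated people; the second is assumed larger (hypothetical per-point scale).
points = [(1, 1), (4, 4)]
sigmas = [1.0, 2.0]
density = [[0.0] * 6 for _ in range(6)]
density[1][1] = 1.0
density[4][4] = 1.0
loss, expected = bayesian_loss(density, points, sigmas)
print(loss, expected)
```

No hand-crafted target map is needed here: the supervision comes only from the point annotations, and the per-pixel posteriors sum to one, so the expected counts always add up to the total predicted density.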
The effectiveness of the proposed density estimators is validated through comparative experiments with state-of-the-art methods on widely used crowd counting benchmark datasets. The proposed methods achieve performance superior to that of state-of-the-art density estimators in diverse surveillance environments. In addition, for all proposed crowd density estimation methods, the effectiveness of each component is verified through several ablation experiments.
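This dissertation presents new deep learning-based crowd density estimation methods that take crowd congestion and person size into account. Crowd density estimation is one of the important tasks of an intelligent surveillance system. With crowd density estimation, regions of interest for public security and safety can be marked easily. It can also help computationally demanding computer vision algorithms, such as pedestrian detection and tracking, to be applied effectively in intelligent surveillance systems.
Since deep learning was introduced to crowd density estimation, most studies have followed the conventional approach of using a convolutional neural network that is trained on training images to estimate a crowd density map. Deep learning-based crowd density estimation research can be divided into two perspectives: network structure and training strategy. In general, studies from the network structure perspective propose a new network structure that extracts features representing the crowd well. Studies from the training strategy perspective propose a new training methodology or loss function to improve counting performance.
This dissertation proposes several works from both perspectives of deep learning-based crowd density estimation. In particular, the proposed models are designed to have rich crowd representation characteristics according to the crowd congestion and the scale of each person. Two new network structures are proposed: a selective ensemble network and a cascade residual dilated network. A new loss function for crowd density estimation, the congestion-aware Bayesian loss, is also proposed.
First, a selective ensemble deep network structure is proposed for accurate crowd density estimation and people counting. Unlike existing deep network-based methods, the proposed method incorporates two sub-networks for local density estimation, one for learning sparse regions and one for learning dense regions. The density maps estimated locally by the two sub-networks are selectively combined in an ensemble fashion by a gating network to estimate an initial crowd density map. The initial density map is refined into a high-resolution map by another sub-network that relies on contextual information in the image. In training, a new adaptive loss scheme is applied to resolve ambiguity in crowded regions. The scheme improves both density map accuracy and counting accuracy by adjusting the weight between the density loss and the counting loss according to the degree of crowdedness and the progress of training.
Second, a new crowd density estimation network structure is proposed that consists of multiple dilated convolution blocks with different scales. The structure originates from the empirical analysis that small-scale dilated convolution accurately estimates the density at the center of each person, whereas large-scale dilated convolution estimates the density at the periphery of a person well. To estimate the crowd density map gradually from the center to the periphery of each person in the crowd, the dilated convolution blocks are trained in a cascade, from the block with the smallest dilation to the block with the largest.
Finally, a new congestion-aware Bayesian loss method is proposed that considers the person-scale and the crowd-sparsity. Deep learning-based crowd density estimation can greatly improve the accuracy of crowd counting. Although the Bayesian loss method resolves two problems, the need for a hand-crafted ground truth density and noisy annotations, counting accurately in congested scenes remains difficult. In a crowd scene, a person's appearance changes with the size of each person (the person-scale). In addition, the lower the sparsity of a local region (the crowd-sparsity), the harder it is to estimate the crowd density. The person-scale is estimated from scene geometry, and the crowd-sparsity is then estimated using the estimated person-scale. The estimated person-scale and crowd-sparsity are used in the congestion-aware Bayesian loss method to improve the supervisory representation of the point annotations.
The effectiveness of the proposed density estimators is verified through comparative experiments with state-of-the-art methods on widely used crowd counting benchmark datasets. The proposed methods achieve better performance than state-of-the-art density estimators in diverse surveillance environments. In addition, for all proposed crowd density estimation methods, the effectiveness of each component is verified through several ablation experiments.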

Language: eng

URI: https://hdl.handle.net/10371/181103

https://dcollection.snu.ac.kr/common/orgView/000000170463

Files in This Item:

000000170463.pdf 12.24 MB

Appears in Collections:

College of Engineering/Engineering Practice School
- Dept. of Electrical and Computer Engineering
  - Theses (Ph.D. / Sc.D., Electrical and Computer Engineering)
