Crowd Scene Analysis by Output Encoding
- URL: http://arxiv.org/abs/2001.09556v1
- Date: Mon, 27 Jan 2020 01:34:08 GMT
- Title: Crowd Scene Analysis by Output Encoding
- Authors: Yao Xue, Siming Liu, Yonghui Li, Xueming Qian
- Abstract summary: We propose a Compressed Sensing based Output Encoding (CSOE) scheme, which casts detecting the coordinates of small objects as a signal regression task in an encoded signal space.
CSOE helps to boost localization performance in circumstances where targets are highly crowded without huge scale variation.
We also develop an Adaptive Receptive Field Weighting (ARFW) module, which addresses the scale variation issue.
- Score: 38.69524011345539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd scene analysis receives growing attention due to its wide applications.
Grasping the accurate crowd location (rather than merely crowd count) is
important for spatially identifying high-risk regions in congested scenes. In
this paper, we propose a Compressed Sensing based Output Encoding (CSOE)
scheme, which casts detecting pixel coordinates of small objects into a task of
signal regression in encoding signal space. CSOE helps to boost localization
performance in circumstances where targets are highly crowded without huge
scale variation. In addition, proper receptive field sizes are crucial for
crowd analysis due to human size variations. We create Multiple Dilated
Convolution Branches (MDCB) that offer a set of different receptive field
sizes, to improve localization accuracy when object sizes change drastically
in an image. Also, we develop an Adaptive Receptive Field Weighting (ARFW)
module, which further deals with the scale variation issue by adaptively
emphasizing informative channels that have a proper receptive field size.
Experiments demonstrate the effectiveness of the proposed method, which
achieves state-of-the-art performance across four mainstream datasets, with
especially strong results in highly crowded scenes. More
importantly, experiments support our insights that it is crucial to tackle
the target size variation issue in crowd analysis, and that casting crowd
localization as regression in encoding signal space is quite effective for
crowd analysis.
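The two ideas in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It makes several assumptions: the encoding is the standard compressed-sensing form y = A x with a random Gaussian sensing matrix A, x is a flattened binary location map, and the names `make_sensing_matrix`, `encode_locations`, and `dilated_kernel_extent` are hypothetical. The decoding step (recovering sparse locations from a regressed y, for example by L1 minimization) is omitted. The last function uses the standard formula for the spatial extent of a dilated kernel, which shows why branches with different dilation rates see different receptive field sizes.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Random Gaussian sensing matrix with m rows and n columns (m < n).

    Assumption: a Gaussian matrix is a common compressed-sensing choice;
    the paper may use a different one.
    """
    rng = random.Random(seed)
    scale = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(m)]


def encode_locations(points, height, width, A):
    """Encode target pixel coordinates as y = A x.

    x is the flattened binary location map: 1.0 at each target pixel,
    0.0 elsewhere. A network would then regress y instead of x.
    """
    x = [0.0] * (height * width)
    for row, col in points:
        x[row * width + col] = 1.0
    return [sum(a * v for a, v in zip(a_row, x)) for a_row in A]


def dilated_kernel_extent(kernel_size, dilation):
    """Spatial extent covered by a single dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


height, width = 8, 8
A = make_sensing_matrix(m=16, n=height * width)
y = encode_locations([(1, 2), (5, 6)], height, width, A)

print(len(y))  # 16 regression targets instead of 64 pixels
print([dilated_kernel_extent(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```

Because the encoding is linear, the code for two targets is simply the sum of the two corresponding columns of A, so the regression target stays short even when the location map is large and sparse.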
Related papers
- Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM [55.93697196726016]
We propose a simple yet effective crowd counting method by utilizing the Segment-Everything-Everywhere Model (SEEM).
We show that SEEM's performance in dense crowd scenes is limited, primarily due to the omission of many persons in high-density areas.
Our proposed method achieves the best unsupervised performance in crowd counting, while also being comparable to some supervised methods.
arXiv Detail & Related papers (2024-02-27T13:55:17Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose the Adaptive Focus Framework (AF$_2$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$_2$ significantly improves accuracy on three widely used aerial benchmarks while running as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z) - PANet: Perspective-Aware Network with Dynamic Receptive Fields and
Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z) - Congested Crowd Instance Localization with Dilated Convolutional Swin
Transformer [119.72951028190586]
Crowd localization is a new computer vision task, evolved from crowd counting.
In this paper, we focus on how to achieve precise instance localization in high-density crowd scenes.
We propose a Dilated Convolutional Swin Transformer (DCST) for congested crowd scenes.
arXiv Detail & Related papers (2021-07-17T06:22:20Z) - RAMS-Trans: Recurrent Attention Multi-scale Transformer for Fine-grained
Image Recognition [26.090419694326823]
Localization and amplification of region attention is an important factor, which has been explored extensively by convolutional neural network (CNN) based approaches.
We propose the recurrent attention multi-scale transformer (RAMS-Trans) which uses the transformer's self-attention to learn discriminative region attention.
arXiv Detail & Related papers (2021-07-17T06:22:20Z) - AdaZoom: Adaptive Zoom Network for Multi-Scale Object Detection in Large
Scenes [57.969186815591186]
Detection in large-scale scenes is a challenging problem due to small objects and extreme scale variation.
We propose a novel Adaptive Zoom (AdaZoom) network as a selective magnifier with flexible shape and focal length to adaptively zoom the focus regions for object detection.
arXiv Detail & Related papers (2021-06-19T03:30:22Z) - DASGIL: Domain Adaptation for Semantic and Geometric-aware Image-based
Localization [27.294822556484345]
Long-term visual localization under changing environments is a challenging problem in autonomous driving and mobile robotics.
We propose a novel multi-task architecture to fuse the geometric and semantic information into the multi-scale latent embedding representation for visual place recognition.
arXiv Detail & Related papers (2020-10-01T17:44:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.