Multi-modal Crowd Counting via Modal Emulation
- URL: http://arxiv.org/abs/2407.19491v1
- Date: Sun, 28 Jul 2024 13:14:57 GMT
- Title: Multi-modal Crowd Counting via Modal Emulation
- Authors: Chenhao Wang, Xiaopeng Hong, Zhiheng Ma, Yupeng Wei, Yabin Wang, Xiaopeng Fan
- Abstract summary: We propose a modal emulation-based two-pass multi-modal crowd-counting framework.
The framework consists of two key components: a multi-modal inference pass and a cross-modal emulation pass.
Experiments on both RGB-Thermal and RGB-Depth counting datasets demonstrate its superior performance compared to previous methods.
- Score: 41.959740205234446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal crowd counting is a crucial task that uses multi-modal cues to estimate the number of people in crowded scenes. To overcome the gap between different modalities, we propose a modal emulation-based two-pass multi-modal crowd-counting framework that enables efficient modal emulation, alignment, and fusion. The framework consists of two key components: a multi-modal inference pass and a cross-modal emulation pass. The former utilizes a hybrid cross-modal attention module to extract global and local information and achieve efficient multi-modal fusion. The latter uses attention prompting to coordinate different modalities and enhance multi-modal alignment. We also introduce a modality alignment module that uses an efficient modal consistency loss to align the outputs of the two passes and bridge the semantic gap between modalities. Extensive experiments on both RGB-Thermal and RGB-Depth counting datasets demonstrate its superior performance compared to previous methods. Code available at https://github.com/Mr-Monday/Multi-modal-Crowd-Counting-via-Modal-Emulation.
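The abstract mentions a consistency loss that aligns the outputs of the two passes but does not give its form. As a rough illustration only, the sketch below assumes each pass outputs a density map whose values sum to the estimated count, and assumes the consistency term is a mean squared difference between the two maps. The function names, the example maps, and the choice of mean squared error are all hypothetical and are not taken from the paper or its code.

```python
# Hypothetical sketch: NOT the paper's implementation.
# Assumption: each pass yields a 2-D density map (list of rows of floats).

def count_from_density(density):
    """Estimated crowd count, taken as the sum of all density-map values."""
    return sum(sum(row) for row in density)


def consistency_loss(pass_a, pass_b):
    """Mean squared difference between two equally shaped density maps.

    The mean-squared-error form is an assumption made for illustration.
    """
    same_rows = len(pass_a) == len(pass_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(pass_a, pass_b)):
        raise ValueError("density maps must have the same shape")
    n_cells = sum(len(row) for row in pass_a)
    squared_error = sum(
        (a - b) ** 2
        for row_a, row_b in zip(pass_a, pass_b)
        for a, b in zip(row_a, row_b)
    )
    return squared_error / n_cells


# Made-up example maps standing in for the outputs of the two passes.
inference_map = [[0.5, 0.25], [0.0, 0.25]]
emulation_map = [[0.5, 0.0], [0.0, 0.5]]

print(count_from_density(inference_map))
print(consistency_loss(inference_map, emulation_map))
```

In this toy case both maps give the same total count but place the density differently, so a per-cell term like this one still penalizes the disagreement; a loss defined on counts alone would not.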