Improving Multispectral Pedestrian Detection by Addressing Modality
Imbalance Problems
- URL: http://arxiv.org/abs/2008.03043v2
- Date: Mon, 17 Aug 2020 02:21:34 GMT
- Title: Improving Multispectral Pedestrian Detection by Addressing Modality
Imbalance Problems
- Authors: Kailai Zhou, Linsen Chen, Xun Cao
- Abstract summary: Multispectral pedestrian detection can adapt to insufficient illumination conditions by leveraging color-thermal modalities.
Compared with traditional pedestrian detection, we find multispectral pedestrian detection suffers from modality imbalance problems.
We propose Modality Balance Network (MBNet) which facilitates the optimization process in a much more flexible and balanced manner.
- Score: 12.806496583571858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multispectral pedestrian detection is capable of adapting to insufficient
illumination conditions by leveraging color-thermal modalities. However,
in-depth insight into how to fuse the two modalities effectively is still
lacking. Compared with traditional pedestrian detection, we find
multispectral pedestrian detection suffers from modality imbalance problems
which hinder the optimization of the dual-modality network and degrade
the performance of the detector. Inspired by this observation, we propose Modality
Balance Network (MBNet) which facilitates the optimization process in a much
more flexible and balanced manner. Firstly, we design a novel Differential
Modality Aware Fusion (DMAF) module to make the two modalities complement each
other. Secondly, an illumination aware feature alignment module selects
complementary features according to the illumination conditions and aligns the
two modality features adaptively. Extensive experimental results demonstrate that
MBNet outperforms state-of-the-art methods on both the challenging KAIST and
CVC-14 multispectral pedestrian datasets in terms of both accuracy and
computational efficiency. Code is available at
https://github.com/CalayZhou/MBNet.
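
The abstract only states that DMAF makes the two modalities complement each other. As a rough illustration of what a differential, cross-modal fusion step could look like, here is a minimal pure-Python sketch. It is not the authors' implementation (see the repository above for that). The function names, the use of global average pooling, the tanh gating, and the sign convention are all assumptions made for this example.

```python
import math


def global_avg_pool(feat):
    """Mean of each channel; feat is a list of channels, each a flat list of floats."""
    return [sum(channel) / len(channel) for channel in feat]


def differential_fusion(rgb, thermal):
    """Hypothetical differential fusion of two same-shaped feature maps.

    Assumption: each modality receives a residual made of the other
    modality's features, scaled per channel by a weight derived from the
    pooled difference between the two modalities.
    """
    # Element-wise difference between the modalities (thermal minus color).
    diff = [[t - r for r, t in zip(rc, tc)] for rc, tc in zip(rgb, thermal)]
    # One weight per channel in (-1, 1); zero when the pooled difference is zero.
    weights = [math.tanh(g) for g in global_avg_pool(diff)]
    # Color branch: add thermal features, weighted by (thermal - color).
    rgb_out = [
        [r + w * t for r, t in zip(rc, tc)]
        for rc, tc, w in zip(rgb, thermal, weights)
    ]
    # Thermal branch: add color features, weighted by (color - thermal).
    # tanh is odd, so that weight is simply -w.
    thermal_out = [
        [t + (-w) * r for r, t in zip(rc, tc)]
        for rc, tc, w in zip(rgb, thermal, weights)
    ]
    return rgb_out, thermal_out


if __name__ == "__main__":
    # Two channels with two spatial positions each (toy values).
    rgb = [[1.0, 1.0], [0.0, 0.0]]
    thermal = [[1.0, 1.0], [2.0, 2.0]]
    fused_rgb, fused_thermal = differential_fusion(rgb, thermal)
    print(fused_rgb)
    print(fused_thermal)
```

In this toy setup, a channel on which the two modalities agree on average is left unchanged, while a channel on which they differ receives a contribution from the other modality. Whether the actual DMAF module behaves this way should be checked against the paper and the released code.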
Related papers
- AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection [23.91870504363899]
Double-stream networks in multispectral detection employ two separate feature extraction branches for multi-modal data.
This has hindered the widespread employment of multispectral pedestrian detection in embedded devices for autonomous systems.
We introduce the Adaptive Modal Fusion Distillation (AMFD) framework, which can fully utilize the original modal features of the teacher network.
arXiv Detail & Related papers (2024-05-21T17:17:17Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Flexible-modal Deception Detection with Audio-Visual Adapter [20.6514221670249]
We propose a novel framework to fuse temporal features across two modalities efficiently.
Experiments conducted on two benchmark datasets demonstrate that the proposed method can achieve superior performance.
arXiv Detail & Related papers (2023-02-11T15:47:20Z)
- MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization [43.04788370184486]
Misalignment and modality imbalance are the most significant issues in multispectral pedestrian detection.
MS-DETR consists of two modality-specific backbones and Transformer encoders, followed by a multi-modal Transformer decoder.
Our end-to-end MS-DETR shows superior performance on the challenging KAIST, CVC-14 and LLVIP benchmark datasets.
arXiv Detail & Related papers (2023-02-01T07:45:10Z)
- PSNet: Parallel Symmetric Network for Video Salient Object Detection [85.94443548452729]
We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
arXiv Detail & Related papers (2022-10-12T04:11:48Z)
- Bi-Bimodal Modality Fusion for Correlation-Controlled Multimodal Sentiment Analysis [96.46952672172021]
Bi-Bimodal Fusion Network (BBFN) is a novel end-to-end network that performs fusion on pairwise modality representations.
The model takes two bimodal pairs as input because of the known information imbalance among modalities.
arXiv Detail & Related papers (2021-07-28T23:33:42Z)
- Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape wireless channels by controlling the phase shifts of individual scattering elements.
Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)