PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection
- URL: http://arxiv.org/abs/2502.14063v1
- Date: Wed, 19 Feb 2025 19:31:51 GMT
- Title: PedDet: Adaptive Spectral Optimization for Multimodal Pedestrian Detection
- Authors: Rui Zhao, Zeyu Zhang, Yi Xu, Yi Yao, Yan Huang, Wenxin Zhang, Zirui Song, Xiuying Chen, Yang Zhao
- Abstract summary: PedDet is an adaptive spectral optimization framework for multispectral pedestrian detection.
PedDet achieves state-of-the-art performance, improving the mAP by 6.6% with superior detection accuracy even in low-light conditions.
- Score: 28.06976064484559
- Abstract: Pedestrian detection in intelligent transportation systems has made significant progress but faces two critical challenges: (1) insufficient fusion of complementary information between visible and infrared spectra, particularly in complex scenarios, and (2) sensitivity to illumination changes, such as low-light or overexposed conditions, leading to degraded performance. To address these issues, we propose PedDet, an adaptive spectral optimization framework tailored to multispectral pedestrian detection. PedDet introduces the Multi-scale Spectral Feature Perception Module (MSFPM) to adaptively fuse visible and infrared features, enhancing robustness and flexibility in feature extraction. Additionally, the Illumination Robustness Feature Decoupling Module (IRFDM) improves detection stability under varying lighting by decoupling pedestrian and background features. We further design a contrastive alignment loss to enhance inter-modal feature discrimination. Experiments on the LLVIP and MSDS datasets demonstrate that PedDet achieves state-of-the-art performance, improving mAP by 6.6% with superior detection accuracy even in low-light conditions, marking a significant step forward for road safety. Code will be available at https://github.com/AIGeeksGroup/PedDet.
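The abstract names MSFPM and a contrastive alignment loss but gives no implementation details, so the following is only a minimal sketch of what sigmoid-gated visible/infrared fusion and an InfoNCE-style alignment loss might look like. The class name `GatedSpectralFusion`, the gating design, the loss formulation, and all shapes are illustrative assumptions, not the released PedDet code.

```python
# Minimal sketch of gated multispectral fusion + contrastive alignment.
# All names and design choices are assumptions, not PedDet's released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSpectralFusion(nn.Module):
    """Toy stand-in for an MSFPM-style block: predicts per-pixel, per-channel
    gates from the concatenated visible/infrared features and mixes the two
    streams adaptively."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),  # g in [0, 1]: mixing weight per location/channel
        )

    def forward(self, feat_vis: torch.Tensor, feat_ir: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([feat_vis, feat_ir], dim=1))
        return g * feat_vis + (1.0 - g) * feat_ir  # convex blend of modalities

def contrastive_alignment(z_vis: torch.Tensor, z_ir: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss pulling matched visible/IR embeddings together and
    pushing mismatched pairs apart; one plausible reading of the paper's
    'contrastive alignment', not its actual formulation."""
    z_vis = F.normalize(z_vis, dim=1)
    z_ir = F.normalize(z_ir, dim=1)
    logits = z_vis @ z_ir.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_vis.size(0), device=z_vis.device)
    return F.cross_entropy(logits, targets)          # diagonal pairs are positives

if __name__ == "__main__":
    fuse = GatedSpectralFusion(channels=64)
    v, r = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(fuse(v, r).shape)                          # torch.Size([2, 64, 32, 32])
    print(contrastive_alignment(torch.randn(8, 128), torch.randn(8, 128)))
```

The convex blend keeps the fused feature in the span of the two inputs, which is one common way to make fusion degrade gracefully when one spectrum is uninformative (e.g., RGB at night).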
Related papers
- Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection [67.02804741856512]
We propose a novel Hierarchical Multi-Modal Enhancement Network (HMMEN) that integrates RGB and IR data for robust and accurate TL detection.
Our method introduces two key components: (1) a Mutual Multi-Modal Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that corrects misalignments between decoder outputs and IR feature maps by leveraging deformable convolutions.
arXiv Detail & Related papers (2025-01-25T06:21:06Z)
- Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images [49.75771095302775]
We propose an Adaptive Multi-scale Fusion network (AMFusion) with infrared and visible images.
First, we separately fuse spatial and semantic features from infrared and visible images, where the former are used for the adjustment of light distribution.
Second, we utilize detection features extracted by a pre-trained backbone that guide the fusion of semantic features.
Third, we propose a new illumination loss to constrain fusion image with normal light intensity.
arXiv Detail & Related papers (2024-03-02T03:52:07Z)
- Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification [79.9402521412239]
We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phrase-Preserving Normalization (PPNorm).
arXiv Detail & Related papers (2024-01-03T17:11:27Z)
- TFDet: Target-Aware Fusion for RGB-T Pedestrian Detection [21.04812985569116]
We propose a novel target-aware fusion strategy for multispectral pedestrian detection, named TFDet.
TFDet achieves state-of-the-art performance on two multispectral pedestrian benchmarks, KAIST and LLVIP.
arXiv Detail & Related papers (2023-05-26T02:09:48Z)
- Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle Re-identification [29.48387524901101]
In harsh environments, the discriminative cues in the RGB and NIR modalities are often lost due to strong flares from vehicle lamps or sunlight.
We propose a Flare-Aware Cross-modal Enhancement Network that adaptively restores flare-corrupted RGB and NIR features with guidance from the flare-immunized thermal infrared spectrum.
arXiv Detail & Related papers (2023-05-23T04:04:24Z)
- Low-Light Hyperspectral Image Enhancement [90.84144276935464]
This work focuses on the low-light HSI enhancement task, which aims to reveal the spatial-spectral information hidden in darkened areas.
Based on Laplacian pyramid decomposition and reconstruction, we developed an end-to-end data-driven low-light HSI enhancement (HSIE) approach.
The effectiveness and efficiency of HSIE are demonstrated in both quantitative assessment measures and visual effects.
arXiv Detail & Related papers (2022-08-05T08:45:52Z)
- Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery [0.6853165736531939]
Fusing the complementary information of multispectral remote sensing image pairs across modalities can improve the perception ability of detection algorithms.
We propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions.
Our proposed approach achieves state-of-the-art performance at low cost.
arXiv Detail & Related papers (2021-12-06T13:12:36Z)
- BAANet: Learning Bi-directional Adaptive Attention Gates for Multispectral Pedestrian Detection [14.672188805059744]
This work proposes an effective and efficient cross-modality fusion module called the Bi-directional Adaptive Gate (BAA-Gate); a generic sketch of this gating pattern appears after this list.
Based on the attention mechanism, the BAA-Gate is devised to distill the informative features and recalibrate the representations asymptotically.
Considerable experiments on the challenging KAIST dataset demonstrate the superior performance of our method with satisfactory speed.
arXiv Detail & Related papers (2021-12-04T08:30:54Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation for many applications, such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- Improving Multispectral Pedestrian Detection by Addressing Modality Imbalance Problems [12.806496583571858]
Multispectral pedestrian detection can adapt to insufficient illumination conditions by leveraging color-thermal modalities.
Compared with traditional pedestrian detection, we find that multispectral pedestrian detection suffers from modality imbalance problems.
We propose Modality Balance Network (MBNet) which facilitates the optimization process in a much more flexible and balanced manner.
arXiv Detail & Related papers (2020-08-07T08:58:46Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
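Several entries above (the BAA-Gate in BAANet, the MMEB in HMMEN, and the cross-modality attentive fusion work) revolve around a common pattern: each modality gates the other's features before fusion. The sketch below is a generic, hypothetical illustration of that pattern under that shared reading, not any of these papers' released code; the class name, the 1x1-convolution gates, and the additive fusion are assumptions.

```python
# Generic bi-directional cross-modality gating, in the spirit of the
# BAA-Gate / MMEB entries above. Illustrative only; all names are assumptions.
import torch
import torch.nn as nn

class BiDirectionalGate(nn.Module):
    """Each modality produces a gate from the *other* modality's features,
    so thermal evidence recalibrates RGB channels and vice versa."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_ir = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        rgb_out = rgb * self.gate_rgb(ir)  # thermal evidence gates RGB features
        ir_out = ir * self.gate_ir(rgb)    # RGB evidence gates thermal features
        return rgb_out + ir_out            # fused map for a downstream detector

if __name__ == "__main__":
    gate = BiDirectionalGate(channels=64)
    rgb, ir = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(gate(rgb, ir).shape)             # torch.Size([2, 64, 32, 32])
```

Gating each stream with the other modality (rather than with itself) is what lets the reliable spectrum suppress noise in the degraded one, e.g., thermal features down-weighting washed-out RGB channels at night.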