Cascaded information enhancement and cross-modal attention feature
fusion for multispectral pedestrian detection
- URL: http://arxiv.org/abs/2302.08670v1
- Date: Fri, 17 Feb 2023 03:30:00 GMT
- Title: Cascaded information enhancement and cross-modal attention feature
fusion for multispectral pedestrian detection
- Authors: Yang Yang, Kaixiong Xu, Kaizheng Wang
- Abstract summary: We propose a multispectral pedestrian detection algorithm, which mainly consists of a cascaded information enhancement module and a cross-modal attention feature fusion module.
Our method achieves a lower pedestrian miss rate and more accurate pedestrian detection boxes than the compared methods.
- Score: 6.167053377021009
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multispectral pedestrian detection is a technology designed to detect and
locate pedestrians in Color and Thermal images, and it has been widely used in
autonomous driving, video surveillance, etc. So far, most available multispectral
pedestrian detection algorithms have achieved only limited success because they
fail to take into account the confusion between pedestrian information and
background noise in Color and Thermal images. Here we propose a
multispectral pedestrian detection algorithm, which mainly consists of a
cascaded information enhancement module and a cross-modal attention feature
fusion module. On the one hand, the cascaded information enhancement module
adopts channel and spatial attention mechanisms to perform attention weighting
on the features fused by the cascaded feature fusion block. It then multiplies
the single-modal features with the attention weights element by element to
enhance the pedestrian features in each single modality and thus suppress
interference from the background. On the other hand, the cross-modal attention
feature fusion module mines the features of the Color and Thermal modalities so
that they complement each other; the global features are then constructed by
adding the cross-modal complemented features element by element and are
attentionally weighted to achieve an effective fusion of the two modalities.
Finally, the fused features are input into the detection head
to detect and locate pedestrians. Extensive experiments have been performed on
two improved versions of annotations (sanitized annotations and paired
annotations) of the public dataset KAIST. The experimental results show that
our method achieves a lower pedestrian miss rate and more accurate
pedestrian detection boxes than the compared methods. Additionally, the
ablation experiments confirm the effectiveness of each module designed in
this paper.
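The abstract names two attention-based modules but gives no implementation details; the following is a minimal PyTorch sketch of how such modules could be structured. The squeeze-and-excitation-style channel attention, the 7x7 spatial-attention kernel, the 1x1 cross-modal projections, the reduction ratio, and all tensor shapes are assumptions for illustration, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class CascadedInformationEnhancement(nn.Module):
    """Sketch: channel + spatial attention computed on fused features,
    then multiplied element-wise onto each single-modal feature map."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        # Channel attention (squeeze-and-excitation style; assumed)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention from pooled channel statistics (assumed)
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, f_color, f_thermal):
        # "Cascaded feature fusion block": element-wise fusion of the two
        # modalities (a simple sum here; the actual block is not specified).
        fused = f_color + f_thermal
        ca = self.channel_att(fused)                       # B x C x 1 x 1
        pooled = torch.cat([fused.mean(1, keepdim=True),
                            fused.amax(1, keepdim=True)], dim=1)
        sa = self.spatial_att(pooled)                      # B x 1 x H x W
        weight = ca * sa
        # Element-wise multiplication enhances pedestrian features and
        # suppresses background in each single modality.
        return f_color * weight, f_thermal * weight


class CrossModalAttentionFusion(nn.Module):
    """Sketch: each modality is complemented with cross-modal information,
    the complemented features are added element-wise into a global feature,
    and the result is attention-weighted before the detection head."""
    def __init__(self, channels: int):
        super().__init__()
        self.cross_c2t = nn.Conv2d(channels, channels, 1)  # assumed 1x1 projections
        self.cross_t2c = nn.Conv2d(channels, channels, 1)
        self.global_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, f_color, f_thermal):
        # Complement each modality with features mined from the other.
        f_color_c = f_color + self.cross_t2c(f_thermal)
        f_thermal_c = f_thermal + self.cross_c2t(f_color)
        # Global features: element-wise addition of complemented features.
        global_feat = f_color_c + f_thermal_c
        return global_feat * self.global_att(global_feat)


# Usage: enhanced features from both modalities are fused, then passed
# to a detection head (not shown).
cie = CascadedInformationEnhancement(channels=256)
fusion = CrossModalAttentionFusion(channels=256)
color = torch.randn(1, 256, 80, 64)
thermal = torch.randn(1, 256, 80, 64)
fused = fusion(*cie(color, thermal))   # B x 256 x 80 x 64
```

In this sketch the same attention weight re-weights both single-modal feature maps, matching the abstract's description of element-wise multiplication for background suppression; the detection head that consumes the fused features is left out.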
Related papers
- Transferring Modality-Aware Pedestrian Attentive Learning for
Visible-Infrared Person Re-identification [43.05147831905626]
We propose a novel Transferring Modality-Aware Pedestrian Attentive Learning (TMPA) model.
TMPA focuses on the pedestrian regions to efficiently compensate for missing modality-specific features.
Experiments conducted on the benchmark SYSU-MM01 and RegDB datasets demonstrate the effectiveness of the proposed TMPA model.
arXiv Detail & Related papers (2023-12-12T07:15:17Z)
- ReDFeat: Recoupling Detection and Description for Multimodal Feature Learning [51.07496081296863]
We recouple the independent constraints of detection and description in multimodal feature learning with a mutual weighting strategy.
We propose a detector that possesses a large receptive field and is equipped with learnable non-maximum suppression layers.
We build a benchmark that contains cross visible, infrared, near-infrared and synthetic aperture radar image pairs for evaluating the performance of features in feature matching and image registration tasks.
arXiv Detail & Related papers (2022-05-16T04:24:22Z)
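The summary above only names the mutual weighting strategy; one plausible reading, sketched below under stated assumptions, is that each branch's per-location loss is weighted by the other branch's detached confidence. The tensor names and the exponential weighting form are illustrative, not taken from the paper.

```python
import torch

def mutually_weighted_loss(det_score, det_loss_map, desc_loss_map):
    """Sketch of a mutual weighting strategy: description is emphasized where
    the detector is confident, and detection is emphasized where description
    is easy (low descriptor loss). Detaching the weights keeps either branch
    from gaming the other."""
    # det_score, det_loss_map, desc_loss_map: B x 1 x H x W per-location maps
    w_desc = det_score.detach()                    # detector confidence gates description
    w_det = torch.exp(-desc_loss_map.detach())     # easy-to-describe locations gate detection
    desc_loss = (w_desc * desc_loss_map).sum() / w_desc.sum().clamp(min=1e-6)
    det_loss = (w_det * det_loss_map).sum() / w_det.sum().clamp(min=1e-6)
    return det_loss + desc_loss
```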
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse in the common space either by iterative optimization or deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, and then unrolls it into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
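Schematically, a bilevel formulation of joint fusion and detection as described above might be written as follows, where the detector D_psi is the upper-level problem and the fusion network F_omega the lower-level one; the notation is assumed for illustration and is not taken from the paper.

```latex
\min_{\psi}\ \mathcal{L}_{\mathrm{det}}\big(D_{\psi}(I_f^{*})\big)
\quad\text{s.t.}\quad
I_f^{*} = F_{\omega^{*}}(I_{\mathrm{ir}}, I_{\mathrm{vis}}),\qquad
\omega^{*} \in \arg\min_{\omega}\ \mathcal{L}_{\mathrm{fus}}\big(F_{\omega}(I_{\mathrm{ir}}, I_{\mathrm{vis}})\big)
```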
- Cross-Modality Attentive Feature Fusion for Object Detection in Multispectral Remote Sensing Imagery [0.6853165736531939]
Cross-modality fusing complementary information of multispectral remote sensing image pairs can improve the perception ability of detection algorithms.
We propose a novel and lightweight multispectral feature fusion approach with joint common-modality and differential-modality attentions.
Our proposed approach achieves state-of-the-art performance at low cost.
arXiv Detail & Related papers (2021-12-06T13:12:36Z)
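A minimal sketch of joint common-modality and differential-modality attention, assuming the common part is the element-wise sum and the differential part the element-wise difference of the paired features; the channel-attention design and the recombination are illustrative assumptions, not the paper's exact operators.

```python
import torch.nn as nn

class CommonDifferentialAttention(nn.Module):
    """Sketch: split paired features into a common part (sum) and a
    differential part (difference), apply channel attention to each,
    and recombine. Reduction ratio and recombination are assumptions."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        def se_block():
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),
                nn.Sigmoid(),
            )
        self.common_att = se_block()
        self.diff_att = se_block()

    def forward(self, f_rgb, f_ir):
        common = f_rgb + f_ir        # shared content of the two modalities
        diff = f_rgb - f_ir          # modality-specific content
        common = common * self.common_att(common)
        diff = diff * self.diff_att(diff)
        return common + diff         # fused feature for the detector
```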
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize the high-frequency noises for face forgery detection.
The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales.
The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective.
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
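A sketch of the multi-scale high-frequency extraction idea, using a fixed Laplacian-style high-pass kernel applied at several downsampled scales; the kernel choice and scale set are assumptions (SRM-style filters are another common option), and the residual-guided spatial attention module is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleHighFreq(nn.Module):
    """Sketch: extract high-frequency residuals at multiple scales with a
    fixed high-pass kernel. Scales and kernel are assumptions."""
    def __init__(self, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        kernel = torch.tensor([[0., -1., 0.],
                               [-1., 4., -1.],
                               [0., -1., 0.]]).view(1, 1, 3, 3)
        self.register_buffer("kernel", kernel)

    def forward(self, gray):                     # gray: B x 1 x H x W
        feats = []
        for s in self.scales:
            x = F.avg_pool2d(gray, s) if s > 1 else gray
            hf = F.conv2d(x, self.kernel, padding=1)   # high-frequency residual
            feats.append(F.interpolate(hf, size=gray.shape[-2:],
                                       mode="bilinear", align_corners=False))
        return torch.cat(feats, dim=1)           # B x len(scales) x H x W
```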
- Mutual-Supervised Feature Modulation Network for Occluded Pedestrian Detection [10.497367073305806]
We propose a novel Mutual-Supervised Feature Modulation (MSFM) network to better handle occluded pedestrian detection.
The MSFM module calculates a similarity loss between the full-body boxes and visible-body boxes corresponding to the same pedestrian.
Our approach achieves superior performance compared to other state-of-the-art methods on two challenging pedestrian datasets.
arXiv Detail & Related papers (2020-10-21T03:42:22Z)
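One way to realize the similarity loss described above is a cosine similarity between pooled RoI features of the two boxes; the sketch below assumes the features are already extracted (e.g., via RoIAlign), which the summary does not specify.

```python
import torch.nn.functional as F

def feature_similarity_loss(full_body_feat, visible_feat):
    """Sketch: encourage pooled RoI features of a pedestrian's full-body box
    and visible-body box to agree, so occluded regions are modulated by the
    visible evidence. Cosine similarity is an assumed choice of metric."""
    # full_body_feat, visible_feat: N x D features for N matched pedestrians
    sim = F.cosine_similarity(full_body_feat, visible_feat, dim=1)  # N values in [-1, 1]
    return (1.0 - sim).mean()
```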
- From Handcrafted to Deep Features for Pedestrian Detection: A Survey [148.35460817092908]
Pedestrian detection is an important but challenging problem in computer vision.
Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features.
In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection.
arXiv Detail & Related papers (2020-10-01T14:51:10Z)
- Anchor-free Small-scale Multispectral Pedestrian Detection [88.7497134369344]
We propose a method for effective and efficient multispectral fusion of the two modalities in an adapted single-stage anchor-free base architecture.
We aim at learning pedestrian representations based on object center and scale rather than direct bounding box predictions.
Results show our method's effectiveness in detecting small-scale pedestrians.
arXiv Detail & Related papers (2020-08-19T13:13:01Z)
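A sketch of how detections are decoded from a center heatmap and a scale map in center-and-scale-based anchor-free detectors; the output stride, max-pooling peak extraction, and top-k selection are generic assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def decode_centers(heatmap, wh, stride=4, k=100):
    """Sketch: take local maxima of the center heatmap as pedestrian centers
    and read box width/height from the scale map (CenterNet-style; the
    stride and top-k values are assumptions)."""
    # heatmap: B x 1 x H x W sigmoid scores; wh: B x 2 x H x W sizes in feature cells
    peaks = (heatmap == F.max_pool2d(heatmap, 3, stride=1, padding=1)).float() * heatmap
    B, _, H, W = peaks.shape
    scores, idx = peaks.view(B, -1).topk(k)                     # B x k
    ys = torch.div(idx, W, rounding_mode="floor").float()
    xs = (idx % W).float()
    w = wh[:, 0].reshape(B, -1).gather(1, idx)
    h = wh[:, 1].reshape(B, -1).gather(1, idx)
    boxes = torch.stack([(xs - w / 2) * stride, (ys - h / 2) * stride,
                         (xs + w / 2) * stride, (ys + h / 2) * stride], dim=2)
    return boxes, scores                                        # B x k x 4, B x k
```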
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
With the effectiveness of the three modules, our team achieves the 1st place on the Track-1 leaderboard in the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)