BAANet: Learning Bi-directional Adaptive Attention Gates for
Multispectral Pedestrian Detection
- URL: http://arxiv.org/abs/2112.02277v1
- Date: Sat, 4 Dec 2021 08:30:54 GMT
- Title: BAANet: Learning Bi-directional Adaptive Attention Gates for
Multispectral Pedestrian Detection
- Authors: Xiaoxiao Yang, Yeqian Qiang, Huijie Zhu, Chunxiang Wang, Ming Yang
- Abstract summary: This work proposes an effective and efficient cross-modality fusion module called the Bi-directional Adaptive Attention Gate (BAA-Gate).
Based on the attention mechanism, the BAA-Gate is devised to distill the informative features and recalibrate the representations asymptotically.
Extensive experiments on the challenging KAIST dataset demonstrate the superior performance of our method with satisfactory speed.
- Score: 14.672188805059744
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Thermal infrared (TIR) imagery has proven effective in providing
temperature cues that complement RGB features for multispectral pedestrian detection.
Most existing methods directly inject the TIR modality into the RGB-based
framework or simply ensemble the results of the two modalities. This, however,
can lead to inferior detection performance, as RGB and TIR features generally
carry modality-specific noise, which may degrade the features as they propagate
through the network. Therefore, this work proposes an effective
and efficient cross-modality fusion module called Bi-directional Adaptive
Attention Gate (BAA-Gate). Based on the attention mechanism, the BAA-Gate is
devised to distill the informative features and recalibrate the representations
asymptotically. Concretely, a bi-directional multi-stage fusion strategy is
adopted to progressively optimize features of two modalities and retain their
specificity during propagation. Moreover, an adaptive interaction of the
BAA-Gate is introduced via an illumination-based weighting strategy to
adaptively adjust the recalibrating and aggregating strength in the BAA-Gate
and enhance robustness to illumination changes. Extensive
experiments on the challenging KAIST dataset demonstrate the superior
performance of our method with satisfactory speed.
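The abstract describes the BAA-Gate only at a high level, so the following PyTorch sketch is an interpretation under stated assumptions: a scalar illumination weight predicted from the RGB image scales both how strongly each stream is recalibrated by a cross-modal channel attention gate and how the two streams are aggregated. The class names (IlluminationWeightNet, BAAGateSketch), layer sizes, and the residual gating form are illustrative choices, not the authors' released implementation.

```python
# Illustrative sketch only: the paper's code is not reproduced here, so the
# module names, channel sizes, and the exact attention form are assumptions.
import torch
import torch.nn as nn


class IlluminationWeightNet(nn.Module):
    """Predicts a scalar day/night weight w in [0, 1] from the RGB image."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, 1)

    def forward(self, rgb_image):
        f = self.features(rgb_image).flatten(1)   # (B, 32)
        return torch.sigmoid(self.fc(f))          # (B, 1), near 1 in well-lit scenes


class BAAGateSketch(nn.Module):
    """One bi-directional gate stage: each modality is recalibrated by a channel
    attention gate driven by the other modality, and the illumination weight
    scales both the recalibration and the aggregation strength."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.gate_for_rgb = nn.Sequential(        # TIR features -> gate on RGB
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.gate_for_tir = nn.Sequential(        # RGB features -> gate on TIR
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_rgb, feat_tir, w_illum):
        b, c, _, _ = feat_rgb.shape
        w = w_illum.view(b, 1, 1, 1)              # illumination weight

        g_rgb = self.gate_for_rgb(self.pool(feat_tir).flatten(1)).view(b, c, 1, 1)
        g_tir = self.gate_for_tir(self.pool(feat_rgb).flatten(1)).view(b, c, 1, 1)

        # Residual recalibration keeps modality-specific information while the
        # cross-modal gate distills the informative channels.
        rgb_out = feat_rgb + w * g_rgb * feat_rgb
        tir_out = feat_tir + (1.0 - w) * g_tir * feat_tir

        # Illumination-weighted aggregation: RGB dominates by day, TIR by night.
        fused = w * rgb_out + (1.0 - w) * tir_out
        return rgb_out, tir_out, fused


if __name__ == "__main__":
    rgb_img = torch.randn(2, 3, 512, 640)
    f_rgb = torch.randn(2, 256, 64, 80)
    f_tir = torch.randn(2, 256, 64, 80)
    w = IlluminationWeightNet()(rgb_img)
    print(BAAGateSketch(256)(f_rgb, f_tir, w)[2].shape)  # torch.Size([2, 256, 64, 80])
```

In a full detector, one such gate would presumably sit after each backbone stage of a two-stream network, with the fused features feeding the detection head; the paper's multi-stage, bi-directional design is approximated here by returning both recalibrated streams alongside the fused output.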
Related papers
- Multi-Modal Hybrid Learning and Sequential Training for RGB-T Saliency
Detection [10.589062261564631]
RGB-T saliency detection has emerged as an important computer vision task, identifying conspicuous objects in challenging scenes such as dark environments.
Existing methods neglect the characteristics of cross-modal features and rely solely on network structures to fuse RGB and thermal features.
We first propose a Multi-Modal Hybrid loss (MMHL) that comprises supervised and self-supervised loss functions.
arXiv Detail & Related papers (2023-09-13T20:47:29Z) - RGB-T Tracking Based on Mixed Attention [5.151994214135177]
RGB-T tracking involves the use of images from both visible and thermal modalities.
This paper proposes an RGB-T tracker based on a mixed attention mechanism to achieve complementary fusion of the two modalities.
arXiv Detail & Related papers (2023-04-09T15:59:41Z) - Decomposed Cross-modal Distillation for RGB-based Temporal Action
Detection [23.48709176879878]
Temporal action detection aims to predict the time intervals and the classes of action instances in the video.
Existing two-stream models exhibit slow inference speed due to their reliance on computationally expensive optical flow.
We introduce a cross-modal distillation framework to build a strong RGB-based detector by transferring knowledge of the motion modality.
arXiv Detail & Related papers (2023-03-30T10:47:26Z) - R2FD2: Fast and Robust Matching of Multimodal Remote Sensing Image via
Repeatable Feature Detector and Rotation-invariant Feature Descriptor [3.395266574804949]
We propose a novel feature matching method (named R2FD2) that is robust to radiation and rotation differences.
The proposed R2FD2 outperforms five state-of-the-art feature matching methods and shows clear advantages in universality and adaptability.
Our R2FD2 achieves matching accuracy within two pixels and has a clear advantage in matching efficiency over other state-of-the-art methods.
arXiv Detail & Related papers (2022-12-05T13:55:02Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation with great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - DUT-LFSaliency: Versatile Dataset and Light Field-to-RGB Saliency
Detection [104.50425501764806]
We introduce a large-scale dataset to enable versatile applications for light field saliency detection.
We present an asymmetrical two-stream model consisting of the Focal stream and RGB stream.
Experiments demonstrate that our Focal stream achieves state-of-the-art performance.
arXiv Detail & Related papers (2020-12-30T11:53:27Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient Cross-modality Guided Encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z) - RGB-D Salient Object Detection with Cross-Modality Modulation and
Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z) - Optimization-driven Deep Reinforcement Learning for Robust Beamforming
in IRS-assisted Wireless Communications [54.610318402371185]
Intelligent reflecting surface (IRS) is a promising technology to assist downlink information transmissions from a multi-antenna access point (AP) to a receiver.
We minimize the AP's transmit power by a joint optimization of the AP's active beamforming and the IRS's passive beamforming.
We propose a deep reinforcement learning (DRL) approach that can adapt the beamforming strategies from past experiences.
arXiv Detail & Related papers (2020-05-25T01:42:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.