Translation, Scale and Rotation: Cross-Modal Alignment Meets
RGB-Infrared Vehicle Detection
- URL: http://arxiv.org/abs/2209.13801v1
- Date: Wed, 28 Sep 2022 03:06:18 GMT
- Title: Translation, Scale and Rotation: Cross-Modal Alignment Meets
RGB-Infrared Vehicle Detection
- Authors: Maoxun Yuan, Yinyan Wang, Xingxing Wei
- Abstract summary: We find detection in aerial RGB-IR images suffers from cross-modal weakly misalignment problems.
We propose a Translation-Scale-Rotation Alignment (TSRA) module to address the problem.
A two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images.
- Score: 10.460296317901662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Integrating multispectral data in object detection, especially visible and
infrared images, has received great attention in recent years. Since visible
(RGB) and infrared (IR) images can provide complementary information to handle
light variations, the paired images are used in many fields, such as
multispectral pedestrian detection, RGB-IR crowd counting and RGB-IR salient
object detection. Compared with natural RGB-IR images, we find that detection in
aerial RGB-IR images suffers from cross-modal weak misalignment problems,
which manifest as position, size and angle deviations of the same
object. In this paper, we mainly address the challenge of cross-modal weak
misalignment in aerial RGB-IR images. Specifically, we first explain and
analyze the cause of the weak misalignment problem. Then, we propose a
Translation-Scale-Rotation Alignment (TSRA) module to address the problem by
calibrating the feature maps from these two modalities. The module predicts the
deviation between objects in the two modalities through an alignment process and
utilizes a Modality-Selection (MS) strategy to improve the performance of
alignment. Finally, a two-stream feature alignment detector (TSFADet) based on
the TSRA module is constructed for RGB-IR object detection in aerial images.
With comprehensive experiments on the public DroneVehicle dataset, we verify
that our method reduces the effect of cross-modal misalignment and achieves
robust detection results.
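The core TSRA idea — predict a translation-scale-rotation deviation between the two modalities and resample one modality's feature map into the other's frame — can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function name, the inverse-mapping formulation, and the nearest-neighbour sampling are all assumptions for clarity.

```python
import numpy as np

def align_feature_map(feat, tx, ty, scale, theta):
    """Warp a 2-D feature map to undo a predicted translation-scale-rotation
    deviation (hypothetical sketch of the TSRA calibration step).

    For each output position we inverse-map into the source map: rotate and
    scale about the centre, then translate, then sample nearest-neighbour.
    """
    h, w = feat.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    out = np.zeros_like(feat)
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            # inverse rotation/scale about the centre, then undo translation
            sx = (cos_t * dx + sin_t * dy) / scale + cx - tx
            sy = (-sin_t * dx + cos_t * dy) / scale + cy - ty
            xi, yi = int(round(sx)), int(round(sy))
            if 0 <= xi < w and 0 <= yi < h:
                out[y, x] = feat[yi, xi]
    return out

# With a zero deviation (no translation, unit scale, no rotation),
# the warp is the identity and the IR map stays aligned as-is.
ir_feat = np.arange(16, dtype=float).reshape(4, 4)
aligned = align_feature_map(ir_feat, tx=0.0, ty=0.0, scale=1.0, theta=0.0)
```

In the paper the deviation parameters are predicted per object by the alignment process rather than given; this sketch only shows how such a prediction would be applied to calibrate the feature maps.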
Related papers
- The Solution for the GAIIC2024 RGB-TIR object detection Challenge [5.625794757504552]
RGB-TIR object detection aims to utilize both RGB and TIR images for complementary information during detection.
Our proposed method achieved mAP scores of 0.516 and 0.543 on the A and B benchmarks, respectively.
arXiv Detail & Related papers (2024-07-04T12:08:36Z) - Removal then Selection: A Coarse-to-Fine Fusion Perspective for RGB-Infrared Object Detection [20.12812979315803]
Object detection utilizing both visible (RGB) and thermal infrared (IR) imagery has garnered extensive attention.
Most existing multi-modal object detection methods directly input the RGB and IR images into deep neural networks.
We propose a novel coarse-to-fine perspective to purify and fuse features from both modalities.
arXiv Detail & Related papers (2024-01-19T14:49:42Z) - $\mathbf{C}^2$Former: Calibrated and Complementary Transformer for
RGB-Infrared Object Detection [18.27510863075184]
We propose a novel Calibrated and Complementary Transformer called $\mathrm{C}^2$Former to address modality miscalibration and imprecision problems.
Because $\mathrm{C}^2$Former performs in the feature domain, it can be embedded into existing RGB-IR object detectors via the backbone network.
arXiv Detail & Related papers (2023-06-28T12:52:48Z) - Symmetric Uncertainty-Aware Feature Transmission for Depth
Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z) - CIR-Net: Cross-modality Interaction and Refinement for RGB-D Salient
Object Detection [144.66411561224507]
We present a convolutional neural network (CNN) model, named CIR-Net, based on the novel cross-modality interaction and refinement.
Our network outperforms the state-of-the-art saliency detectors both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-10-06T11:59:19Z) - Mirror Complementary Transformer Network for RGB-thermal Salient Object
Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z) - Multi-Scale Iterative Refinement Network for RGB-D Salient Object
Detection [7.062058947498447]
Salient visual cues appear at various scales and resolutions of RGB images due to semantic gaps at different feature levels.
Similar salient patterns are available in cross-modal depth images as well as multi-scale versions.
We devise an attention-based fusion module (ABF) to address cross-modal correlation.
arXiv Detail & Related papers (2022-01-24T10:33:00Z) - Boosting RGB-D Saliency Detection by Leveraging Unlabeled RGB Images [89.81919625224103]
Training deep models for RGB-D salient object detection (SOD) often requires a large number of labeled RGB-D images.
We present a Dual-Semi RGB-D Salient Object Detection Network (DS-Net) to leverage unlabeled RGB images for boosting RGB-D saliency detection.
arXiv Detail & Related papers (2022-01-01T03:02:27Z) - Bi-directional Cross-Modality Feature Propagation with
Separation-and-Aggregation Gate for RGB-D Semantic Segmentation [59.94819184452694]
Depth information has proven to be a useful cue in the semantic segmentation of RGBD images for providing a geometric counterpart to the RGB representation.
Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels and model the problem as cross-modal feature fusion.
In this paper, we propose a unified and efficient cross-modality guided encoder that not only effectively recalibrates RGB feature responses, but also distills accurate depth information via multiple stages and aggregates the two recalibrated representations alternately.
arXiv Detail & Related papers (2020-07-17T18:35:24Z) - Cross-Modal Weighting Network for RGB-D Salient Object Detection [76.0965123893641]
We propose a novel Cross-Modal Weighting (CMW) strategy to encourage comprehensive interactions between RGB and depth channels for RGB-D SOD.
Specifically, three RGB-depth interaction modules, named CMW-L, CMW-M and CMW-H, are developed to deal with respectively low-, middle- and high-level cross-modal information fusion.
CMWNet consistently outperforms 15 state-of-the-art RGB-D SOD methods on seven popular benchmarks.
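The cross-modal weighting idea described above — one modality producing per-position gates that reweight the other before fusion — can be sketched as follows. This is a hypothetical simplification: the actual CMW-L/M/H modules are learned convolutional blocks operating at different feature levels, while here the gate is derived directly from the RGB response.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cross_modal_weighting(rgb_feat, depth_feat):
    """Toy sketch of cross-modal weighting for RGB-D fusion.

    A gate in (0, 1) derived from the RGB response decides how much of the
    depth feature is admitted at each position before the two are fused.
    """
    gate = sigmoid(rgb_feat)          # per-position weights from RGB
    return rgb_feat + gate * depth_feat

# Example: a flat RGB response (zeros) gates depth by sigmoid(0) = 0.5.
fused = cross_modal_weighting(np.zeros((2, 2)), np.ones((2, 2)))
```

Real CMW modules would learn the gating from both modalities jointly; the sketch only conveys the weighting-before-fusion structure.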
arXiv Detail & Related papers (2020-07-09T16:01:44Z) - Drone-based RGB-Infrared Cross-Modality Vehicle Detection via
Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.