Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion
- URL: http://arxiv.org/abs/2401.10731v5
- Date: Tue, 7 May 2024 06:52:13 GMT
- Title: Removal and Selection: Improving RGB-Infrared Object Detection via Coarse-to-Fine Fusion
- Authors: Tianyi Zhao, Maoxun Yuan, Feng Jiang, Nan Wang, Xingxing Wei
- Abstract summary: Most existing fusion strategies directly input RGB and IR images into deep neural networks, leading to inferior detection performance.
We introduce a new coarse-to-fine perspective to purify and fuse two modality features.
To verify the effectiveness of the coarse-to-fine fusion strategy, we construct a new object detector called the Removal and Selection Detector (RSDet).
- Score: 20.12812979315803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Object detection in visible (RGB) and infrared (IR) images has been widely applied in recent years. Leveraging the complementary characteristics of RGB and IR images, the object detector provides reliable and robust object localization from day to night. Most existing fusion strategies directly input RGB and IR images into deep neural networks, leading to inferior detection performance. Because RGB and IR features carry modality-specific noise, these strategies degrade the fused features as the noise propagates through the network. Inspired by the mechanism by which the human brain processes multimodal information, in this paper we introduce a new coarse-to-fine perspective to purify and fuse the two modality features. Specifically, following this perspective, we design a Redundant Spectrum Removal module to coarsely remove interfering information within each modality and a Dynamic Feature Selection module to finely select the desired features for feature fusion. To verify the effectiveness of the coarse-to-fine fusion strategy, we construct a new object detector called the Removal and Selection Detector (RSDet). Extensive experiments on three RGB-IR object detection datasets verify the superior performance of our method.
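The abstract describes the two modules only at a high level. As a rough illustration, here is a minimal PyTorch sketch of a coarse-to-fine fusion block: `RedundantSpectrumRemoval` is assumed to apply a learned soft mask in the frequency domain (suggested by the module's name), and `DynamicFeatureSelection` is assumed to be a per-pixel gate over the two purified streams. All module internals are assumptions, not RSDet's published code.

```python
# Minimal sketch of a coarse-to-fine RGB-IR fusion block, loosely following
# the abstract. Internals are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class RedundantSpectrumRemoval(nn.Module):
    """Coarse step (assumed): damp interfering frequency bins with a learned mask."""
    def __init__(self, channels: int, h: int, w: int):
        super().__init__()
        # One learnable logit per rFFT frequency bin.
        self.mask_logits = nn.Parameter(torch.zeros(channels, h, w // 2 + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        spec = torch.fft.rfft2(x, norm="ortho")        # (B, C, H, W//2+1), complex
        spec = spec * torch.sigmoid(self.mask_logits)  # soft removal of redundant bins
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm="ortho")

class DynamicFeatureSelection(nn.Module):
    """Fine step (assumed): predict per-pixel weights to select between modalities."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([rgb, ir], dim=1))     # (B, C, H, W) weights in [0, 1]
        return w * rgb + (1.0 - w) * ir                # convex per-pixel selection

# Toy usage on backbone feature maps of shape (B, C, H, W).
rgb_feat, ir_feat = torch.randn(2, 16, 32, 32), torch.randn(2, 16, 32, 32)
rsr_rgb, rsr_ir = RedundantSpectrumRemoval(16, 32, 32), RedundantSpectrumRemoval(16, 32, 32)
fused = DynamicFeatureSelection(16)(rsr_rgb(rgb_feat), rsr_ir(ir_feat))
print(fused.shape)  # torch.Size([2, 16, 32, 32])
```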
Related papers
- The Solution for the GAIIC2024 RGB-TIR object detection Challenge [5.625794757504552]
RGB-TIR object detection aims to utilize both RGB and TIR images for complementary information during detection.
Our proposed method achieved mAP scores of 0.516 and 0.543 on the A and B benchmarks, respectively.
arXiv Detail & Related papers (2024-07-04T12:08:36Z)
- Translation, Scale and Rotation: Cross-Modal Alignment Meets RGB-Infrared Vehicle Detection [10.460296317901662]
We find that detection in aerial RGB-IR images suffers from cross-modal weak misalignment problems.
We propose a Translation-Scale-Rotation Alignment (TSRA) module to address the problem.
A two-stream feature alignment detector (TSFADet) based on the TSRA module is constructed for RGB-IR object detection in aerial images.
arXiv Detail & Related papers (2022-09-28T03:06:18Z)
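As a rough picture of the alignment idea in the entry above, the sketch below regresses translation, scale, and rotation from pooled features of both modalities and warps the IR features into the RGB frame with an affine grid. The regressor layout and parameterization are assumptions, not the TSRA module itself.

```python
# Hedged sketch of translation-scale-rotation alignment: predict
# (tx, ty, log-scale, angle) and warp IR features toward the RGB frame.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TSRAlign(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, 4),  # tx, ty, log-scale, angle
        )

    def forward(self, rgb: torch.Tensor, ir: torch.Tensor) -> torch.Tensor:
        p = self.head(torch.cat([rgb, ir], dim=1))
        tx, ty, s, a = p[:, 0], p[:, 1], p[:, 2].exp(), p[:, 3]
        cos, sin = torch.cos(a), torch.sin(a)
        # Per-sample 2x3 affine matrices: rotation * scale plus translation.
        theta = torch.stack([
            torch.stack([s * cos, -s * sin, tx], dim=1),
            torch.stack([s * sin,  s * cos, ty], dim=1),
        ], dim=1)                                         # (B, 2, 3)
        grid = F.affine_grid(theta, ir.shape, align_corners=False)
        return F.grid_sample(ir, grid, align_corners=False)

rgb, ir = torch.randn(2, 8, 32, 32), torch.randn(2, 8, 32, 32)
aligned_ir = TSRAlign(8)(rgb, ir)  # IR features warped to the RGB coordinate frame
```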
- Mirror Complementary Transformer Network for RGB-thermal Salient Object Detection [16.64781797503128]
RGB-thermal salient object detection (RGB-T SOD) aims to locate the common prominent objects in an aligned visible and thermal infrared image pair.
In this paper, we propose a novel mirror complementary Transformer network (MCNet) for RGB-T SOD.
Experiments on benchmark and VT723 datasets show that the proposed method outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2022-07-07T20:26:09Z)
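The "mirror complementary" idea in the entry above can be pictured as two symmetric cross-attention passes, each modality querying the other. The sketch below illustrates only that general pattern; it is not MCNet's actual architecture.

```python
# Hedged sketch of symmetric (mirrored) cross-attention between modalities.
import torch
import torch.nn as nn

class MirrorCrossAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.rgb_from_t = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.t_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor):
        # rgb, thermal: (B, N, dim) token sequences from each modality.
        rgb_out, _ = self.rgb_from_t(rgb, thermal, thermal)  # RGB queries thermal
        t_out, _ = self.t_from_rgb(thermal, rgb, rgb)        # thermal queries RGB
        return rgb + rgb_out, thermal + t_out                # residual updates

rgb_tok, t_tok = torch.randn(2, 64, 32), torch.randn(2, 64, 32)
rgb_tok, t_tok = MirrorCrossAttention(32)(rgb_tok, t_tok)
```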
- Radar Guided Dynamic Visual Attention for Resource-Efficient RGB Object Detection [10.983063391496543]
We propose a novel radar-guided spatial attention for RGB images to improve the perception quality of autonomous vehicles.
Our method improves the perception of small and long-range objects, which are often missed by object detectors operating in RGB mode.
arXiv Detail & Related papers (2022-06-03T18:29:55Z)
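A plausible reading of radar-guided spatial attention, sketched below: rasterize radar returns into a 2D occupancy map, resize it to the feature resolution, and use it to amplify RGB features. The rasterization and gating choices are illustrative assumptions, not the paper's exact mechanism.

```python
# Hedged sketch: a radar occupancy map modulates RGB feature maps spatially.
import torch
import torch.nn.functional as F

def radar_attention(rgb_feat: torch.Tensor, radar_map: torch.Tensor) -> torch.Tensor:
    """rgb_feat: (B, C, H, W); radar_map: (B, 1, H0, W0) occupancy in [0, 1]."""
    att = F.interpolate(radar_map, size=rgb_feat.shape[-2:], mode="bilinear",
                        align_corners=False)
    # Keep a residual path so regions without radar returns are not zeroed out.
    return rgb_feat * (1.0 + att)

rgb_feat = torch.randn(2, 16, 32, 32)
radar_map = torch.rand(2, 1, 128, 128)   # assumed pre-rasterized radar returns
out = radar_attention(rgb_feat, radar_map)
```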
- Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection [65.30079184700755]
This study addresses the issue of fusing infrared and visible images that appear differently for object detection.
Previous approaches discover commonalities underlying the two modalities and fuse them in a common space, either by iterative optimization or by deep networks.
This paper proposes a bilevel optimization formulation for the joint problem of fusion and detection, which is then unrolled into a target-aware Dual Adversarial Learning (TarDAL) network for fusion and a commonly used detection network.
arXiv Detail & Related papers (2022-03-30T11:44:56Z)
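The dual-adversarial idea in the entry above can be pictured as one fusion network trained against two discriminators, one encouraging thermal targets to survive fusion and one encouraging visible detail. The toy sketch below shows only the generator-side losses; all networks and losses are placeholders, not TarDAL's.

```python
# Toy sketch of dual adversarial fusion: one generator, two discriminators.
import torch
import torch.nn as nn

fuse = nn.Sequential(nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

def make_disc():
    return nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
                         nn.Flatten(), nn.LazyLinear(1))

d_target, d_detail = make_disc(), make_disc()  # one adversary per modality

ir, rgb_gray = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
fused = fuse(torch.cat([ir, rgb_gray], dim=1))
# Generator loss: fool both discriminators simultaneously.
bce = nn.BCEWithLogitsLoss()
ones = torch.ones(2, 1)
g_loss = bce(d_target(fused), ones) + bce(d_detail(fused), ones)
g_loss.backward()
```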
- Joint Learning of Salient Object Detection, Depth Estimation and Contour Extraction [91.43066633305662]
We propose a novel multi-task and multi-modal filtered transformer (MMFT) network for RGB-D salient object detection (SOD).
Specifically, we unify three complementary tasks: depth estimation, salient object detection and contour estimation. The multi-task mechanism promotes the model to learn the task-aware features from the auxiliary tasks.
Experiments show that it not only significantly surpasses the depth-based RGB-D SOD methods on multiple datasets, but also precisely predicts a high-quality depth map and salient contour at the same time.
arXiv Detail & Related papers (2022-03-09T17:20:18Z)
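The multi-task mechanism in the entry above amounts to a shared encoder feeding three task heads. The sketch below shows that skeleton with placeholder layers; it is not the MMFT architecture.

```python
# Hedged sketch: shared features with saliency, depth, and contour heads.
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.saliency = nn.Conv2d(32, 1, 1)
        self.depth = nn.Conv2d(32, 1, 1)
        self.contour = nn.Conv2d(32, 1, 1)

    def forward(self, x):
        f = self.encoder(x)  # task-shared features
        return self.saliency(f), self.depth(f), self.contour(f)

sal, dep, con = MultiTaskNet()(torch.randn(2, 3, 64, 64))
```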
- Infrared Small-Dim Target Detection with Transformer under Complex Backgrounds [155.388487263872]
We propose a new infrared small-dim target detection method built on the transformer.
We adopt the self-attention mechanism of the transformer to model interactions among image features over a larger range.
We also design a feature enhancement module to learn richer features of small-dim targets.
arXiv Detail & Related papers (2021-09-29T12:23:41Z)
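The self-attention step in the entry above can be pictured by flattening a feature map into per-pixel tokens and running a standard transformer encoder layer over them, so that a small, dim target can aggregate context from the whole image. Layer sizes below are illustrative, not the paper's network.

```python
# Hedged sketch: global self-attention over per-pixel feature tokens.
import torch
import torch.nn as nn

dim = 32
attn = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

feat = torch.randn(2, dim, 16, 16)        # (B, C, H, W) backbone features
tokens = feat.flatten(2).transpose(1, 2)  # (B, H*W, C): one token per pixel
tokens = attn(tokens)                     # global interactions across all pixels
feat = tokens.transpose(1, 2).reshape(2, dim, 16, 16)
```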
- Perception-aware Multi-sensor Fusion for 3D LiDAR Semantic Segmentation [59.42262859654698]
3D semantic segmentation is important in scene understanding for many applications, such as autonomous driving and robotics.
Existing fusion-based methods may not achieve promising performance due to the vast difference between the two modalities.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF) to exploit perceptual information from two modalities.
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
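A minimal picture of two-stream camera/LiDAR fusion in the camera view, assuming the LiDAR points have already been projected into a sparse depth image. PMF's residual attention fusion and perception-aware loss are omitted; this is only the skeleton.

```python
# Hedged sketch: fuse a camera stream and a projected-LiDAR stream by concat.
import torch
import torch.nn as nn

cam_stream = nn.Conv2d(3, 16, 3, padding=1)
lidar_stream = nn.Conv2d(1, 16, 3, padding=1)
fuse = nn.Conv2d(32, 16, 1)

rgb = torch.rand(2, 3, 64, 64)
sparse_depth = torch.rand(2, 1, 64, 64)  # assumed projected LiDAR range image
fused = fuse(torch.cat([cam_stream(rgb), lidar_stream(sparse_depth)], dim=1))
```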
- Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection.
Many models adopt feature fusion strategies but are limited by low-order point-to-point fusion methods.
We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)
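One way to go beyond point-to-point fusion, sketched below: attention affinities computed within one modality reweight the values of the other. This single-head formulation is a simplification for illustration, not the paper's selective mutual attention module.

```python
# Hedged sketch of mutual attention across modalities.
import torch
import torch.nn.functional as F

def mutual_attention(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """a, b: (B, N, dim) token sequences; returns b re-read under a's attention."""
    scores = a @ a.transpose(1, 2) / a.shape[-1] ** 0.5  # affinities from modality a
    return F.softmax(scores, dim=-1) @ b                 # applied to modality b's values

rgb_tok, depth_tok = torch.randn(2, 64, 32), torch.randn(2, 64, 32)
rgb_enriched = rgb_tok + mutual_attention(depth_tok, rgb_tok)  # depth guides RGB
```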
- Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning [59.19469551774703]
Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image.
We construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle.
Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night.
arXiv Detail & Related papers (2020-03-05T05:29:44Z)