DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with
Competitive Query Selection and Adaptive Feature Fusion
- URL: http://arxiv.org/abs/2403.00326v3
- Date: Thu, 7 Mar 2024 11:08:16 GMT
- Title: DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with
Competitive Query Selection and Adaptive Feature Fusion
- Authors: Junjie Guo, Chenqiang Gao, Fangcen Liu, Deyu Meng and Xinbo Gao
- Abstract summary: Infrared-visible object detection aims to achieve robust, full-day object detection by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
- Score: 82.2425759608975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Infrared-visible object detection aims to achieve robust, full-day
object detection by fusing the complementary information of infrared and
visible images. However, the highly dynamic complementary characteristics of
the two modalities and the modality misalignment that commonly occurs between
them make fusing complementary information difficult. In this paper, we
propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to
address these two challenges simultaneously. Specifically, we propose a
Modality Competitive Query Selection
strategy to provide useful prior information. This strategy dynamically
selects the salient basic feature representation of either modality for each
object. To
effectively mine the complementary information and adapt to misalignment
situations, we propose a Multispectral Deformable Cross-attention module to
adaptively sample and aggregate multi-semantic level features of infrared and
visible images for each object. In addition, we further adopt the cascade
structure of DETR to better mine complementary information. Experiments on four
public datasets of different scenes demonstrate significant improvements
compared to other state-of-the-art methods. The code will be released at
https://github.com/gjj45/DAMSDet.
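The two mechanisms named in the abstract can be summarized as: (1) a competition in which, for each candidate object, the higher-scoring modality supplies the initial query representation, and (2) a deformable cross-attention that samples features at learned offsets around each query's reference point in both infrared and visible maps and aggregates them with attention weights. The abstract gives no pseudocode, so the following is only a minimal pure-Python sketch of those two ideas under stated assumptions; all function names, the toy feature maps, and the offset/weight values are illustrative, not DAMSDet's actual implementation.

```python
import math

def competitive_query_select(ir_scores, vis_scores, k):
    """Sketch of modality-competitive query selection: for each candidate,
    the two modalities compete and the higher-scoring one supplies the
    query's initial representation; the top-k candidates overall are kept."""
    winners = [(max(i, v), "ir" if i >= v else "vis", idx)
               for idx, (i, v) in enumerate(zip(ir_scores, vis_scores))]
    return sorted(winners, reverse=True)[:k]

def bilinear_sample(feat, y, x):
    """Bilinearly sample a 2-D feature map (list of lists of floats) at a
    real-valued point, clamping coordinates to the map borders."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0][x0] * (1 - dy) * (1 - dx)
            + feat[y0][x1] * (1 - dy) * dx
            + feat[y1][x0] * dy * (1 - dx)
            + feat[y1][x1] * dy * dx)

def multispectral_deform_attn(ref_point, feats, offsets, weights):
    """Sketch of deformable cross-attention over two modalities: sample each
    modality's feature map at learned offsets around the query's reference
    point and take an attention-weighted sum across all sampling points.

    ref_point: (y, x) reference location of the query
    feats:     {"ir": map, "vis": map} feature maps per modality
    offsets:   {"ir": [(dy, dx), ...], "vis": [...]} learned sampling offsets
    weights:   matching attention weights, assumed to sum to 1 overall
    """
    out = 0.0
    for m in feats:
        for (dy, dx), w in zip(offsets[m], weights[m]):
            out += w * bilinear_sample(feats[m], ref_point[0] + dy,
                                       ref_point[1] + dx)
    return out
```

In the real model the offsets and weights are predicted per query by learned layers and the sampling spans multiple semantic levels; this sketch fixes them as inputs purely to show the sampling-and-aggregation flow.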
Related papers
- DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection [42.70285733630796]
Infrared-visible object detection aims to achieve robust object detection by leveraging the complementary information of infrared and visible image pairs.
Fusing misaligned complementary features is difficult, and current methods cannot accurately locate objects in both modalities under misalignment conditions.
We propose a Decoupled Position Detection Transformer to address these problems.
Experiments on DroneVehicle and KAIST datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-08-12T13:05:43Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs)
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective lightweight feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images [1.662438436885552]
Multi-modal fusion has been shown to enhance accuracy by combining data from multiple modalities.
We propose a novel multi-modal fusion strategy for mapping relationships between different channels at the early stage.
By addressing fusion at the early stage, as opposed to mid- or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques.
arXiv Detail & Related papers (2023-10-21T00:56:11Z)
- Adaptive Rotated Convolution for Rotated Object Detection [96.94590550217718]
We present an Adaptive Rotated Convolution (ARC) module to handle the rotated object detection problem.
In our ARC module, the convolution kernels rotate adaptively to extract object features with varying orientations in different images.
The proposed approach achieves state-of-the-art performance on the DOTA dataset with 81.77% mAP.
arXiv Detail & Related papers (2023-03-14T11:53:12Z) - Weakly Aligned Feature Fusion for Multimodal Object Detection [52.15436349488198]
Multimodal data often suffer from the position shift problem, i.e., the image pair is not strictly aligned.
This problem makes it difficult to fuse multimodal features and complicates convolutional neural network (CNN) training.
In this article, we propose a general multimodal detector named aligned region CNN (AR-CNN) to tackle the position shift problem.
arXiv Detail & Related papers (2022-04-21T02:35:23Z) - Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation with great efficacy in both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z) - Cross-Modality Fusion Transformer for Multispectral Object Detection [0.0]
Multispectral image pairs can provide combined information, making object detection applications more reliable and robust.
We present a simple yet effective cross-modality feature fusion approach, named Cross-Modality Fusion Transformer (CFT) in this paper.
arXiv Detail & Related papers (2021-10-30T15:34:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.