Related papers: Wavelet-guided Misalignment-aware Network for Visible-Infrared Object Detection

URL: http://arxiv.org/abs/2507.20146v1
Date: Sun, 27 Jul 2025 06:53:31 GMT
Title: Wavelet-guided Misalignment-aware Network for Visible-Infrared Object Detection
Authors: Haote Zhang, Lipeng Gu, Wuzhou Quan, Fu Lee Wang, Honghui Fan, Jiali Tang, Dingkun Zhu, Haoran Xie, Xiaoping Zhang, Mingqiang Wei
Abstract summary: We propose the Wavelet-guided Misalignment-aware Network (WMNet) to adaptively address different cross-modal misalignment patterns. By jointly exploiting low and high-frequency information, WMNet alleviates the adverse effects of noise, illumination variation, and spatial misalignment. It enhances the representation of salient target features while suppressing spurious or misleading information, thereby promoting more accurate and robust detection.
Score: 21.634585158312763
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Visible-infrared object detection aims to enhance the detection robustness by exploiting the complementary information of visible and infrared image pairs. However, its performance is often limited by frequent misalignments caused by resolution disparities, spatial displacements, and modality inconsistencies. To address this issue, we propose the Wavelet-guided Misalignment-aware Network (WMNet), a unified framework designed to adaptively address different cross-modal misalignment patterns. WMNet incorporates wavelet-based multi-frequency analysis and modality-aware fusion mechanisms to improve the alignment and integration of cross-modal features. By jointly exploiting low and high-frequency information and introducing adaptive guidance across modalities, WMNet alleviates the adverse effects of noise, illumination variation, and spatial misalignment. Furthermore, it enhances the representation of salient target features while suppressing spurious or misleading information, thereby promoting more accurate and robust detection. Extensive evaluations on the DVTOD, DroneVehicle, and M3FD datasets demonstrate that WMNet achieves state-of-the-art performance on misaligned cross-modal object detection tasks, confirming its effectiveness and practical applicability.
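As a rough illustration of what "low and high-frequency information" means in a wavelet setting, the sketch below applies one level of a 2D Haar transform to a small image. This is not WMNet's implementation: the abstract does not say which wavelet or decomposition depth the authors use, so the Haar basis, the function names haar_dwt2 and haar_idwt2, and the list-of-lists image format are all assumptions chosen to keep the example dependency-free.

```python
def haar_dwt2(image):
    """One level of a 2D Haar transform on a list-of-lists image.

    Returns four half-resolution subbands: a low-frequency band of local
    sums and three detail bands (column, row, and diagonal differences).
    Hypothetical helper for illustration; not taken from the paper.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    low, col_detail, row_detail, diag_detail = [], [], [], []
    for i in range(0, height, 2):
        lo_r, col_r, row_r, diag_r = [], [], [], []
        for j in range(0, width, 2):
            a, b = image[i][j], image[i][j + 1]          # top pair of the 2x2 block
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom pair of the 2x2 block
            lo_r.append((a + b + c + d) / 2)    # smooth, low-frequency content
            col_r.append((a - b + c - d) / 2)   # left column minus right column
            row_r.append((a + b - c - d) / 2)   # top row minus bottom row
            diag_r.append((a - b - c + d) / 2)  # diagonal difference
        low.append(lo_r)
        col_detail.append(col_r)
        row_detail.append(row_r)
        diag_detail.append(diag_r)
    return low, col_detail, row_detail, diag_detail


def haar_idwt2(low, col_detail, row_detail, diag_detail):
    """Invert haar_dwt2 and rebuild the full-resolution image."""
    image = []
    for lo_r, col_r, row_r, diag_r in zip(low, col_detail, row_detail, diag_detail):
        top, bottom = [], []
        for s, x, y, z in zip(lo_r, col_r, row_r, diag_r):
            top += [(s + x + y + z) / 2, (s - x + y - z) / 2]
            bottom += [(s + x - y - z) / 2, (s - x - y + z) / 2]
        image += [top, bottom]
    return image
```

The low band keeps coarse structure that tends to be stable under small spatial shifts, while the three detail bands isolate edges and fine texture. A frequency-aware fusion scheme could, in principle, treat the two groups differently for each modality before inverting the transform.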

Related papers

MDAFNet: Multiscale Differential Edge and Adaptive Frequency Guided Network for Infrared Small Target Detection [5.434562114399152]
Infrared small target detection plays a crucial role in numerous military and civilian applications. Existing methods often face the gradual degradation of target edge pixels as the number of network layers increases. We propose MDAFNet, which integrates the Multi-Scale Differential Edge (MSDE) module and Dual-Domain Adaptive Feature Enhancement (DAFE) module.
arXiv Detail & Related papers (2026-01-23T04:16:16Z)
Variational Dual-path Attention Network for CSI-Based Gesture Recognition [0.0]
Wi-Fi gesture recognition based on Channel State Information (CSI) is challenged by high-dimensional noise and resource constraints on edge devices. This paper proposes a lightweight feature preprocessing module, the Variational Dual-path Attention Network (VDAN). It performs structured feature refinement through frequency-domain filtering and temporal detection.
arXiv Detail & Related papers (2026-01-20T09:02:02Z)
Graph-Based Uncertainty Modeling and Multimodal Fusion for Salient Object Detection [12.743278093269325]
We propose a dynamic uncertainty propagation and multimodal collaborative reasoning network (DUP-MCRNet). DUGC is designed to propagate uncertainty between layers through a sparse graph constructed based on spatial semantic distance. MCF uses learnable modality gating weights to fuse the attention maps of RGB, depth, and edge features.
arXiv Detail & Related papers (2025-08-28T04:31:48Z)
AuxDet: Auxiliary Metadata Matters for Omni-Domain Infrared Small Target Detection [58.67129770371016]
We propose a novel IRSTD framework that reimagines the IRSTD paradigm by incorporating textual metadata for scene-aware optimization. AuxDet consistently outperforms state-of-the-art methods, validating the critical role of auxiliary information in improving robustness and accuracy.
arXiv Detail & Related papers (2025-05-21T07:02:05Z)
ARFC-WAHNet: Adaptive Receptive Field Convolution and Wavelet-Attentive Hierarchical Network for Infrared Small Target Detection [2.643590634429843]
ARFC-WAHNet is an adaptive receptive field convolution and wavelet-attentive hierarchical network for infrared small target detection. ARFC-WAHNet outperforms recent state-of-the-art methods in both detection accuracy and robustness.
arXiv Detail & Related papers (2025-05-15T09:44:23Z)
AVadCLIP: Audio-Visual Collaboration for Robust Video Anomaly Detection [57.649223695021114]
We present a novel weakly supervised framework that leverages audio-visual collaboration for robust video anomaly detection. Our framework demonstrates superior performance across multiple benchmarks, with audio integration significantly boosting anomaly detection accuracy.
arXiv Detail & Related papers (2025-04-06T13:59:16Z)
MSCA-Net: Multi-Scale Context Aggregation Network for Infrared Small Target Detection [0.1759252234439348]
This paper proposes a network architecture named MSCA-Net, which integrates three key components. MSEDA employs a multi-scale feature fusion attention mechanism to adaptively aggregate information across different scales. PCBAM captures the correlation between global and local features through a correlation matrix-based strategy. CAB enhances the representation of critical features by assigning greater weights to them, integrating both low-level and high-level information.
arXiv Detail & Related papers (2025-03-21T14:42:31Z)
Adaptive Illumination-Invariant Synergistic Feature Integration in a Stratified Granular Framework for Visible-Infrared Re-Identification [18.221111822542024]
Visible-Infrared Person Re-Identification (VI-ReID) plays a crucial role in applications such as search and rescue, infrastructure protection, and nighttime surveillance. We propose AMINet, an Adaptive Modality Interaction Network. AMINet employs multi-granularity feature extraction to capture comprehensive identity attributes from both full-body and upper-body images.
arXiv Detail & Related papers (2025-02-28T15:42:58Z)
Evaluating ML Robustness in GNSS Interference Classification, Characterization & Localization [42.14439854721613]
Jamming devices disrupt signals from the global navigation satellite system (GNSS). This paper introduces an extensive dataset comprising snapshots obtained from a low-frequency antenna. Our objective is to assess the resilience of machine learning (ML) models against environmental changes.
arXiv Detail & Related papers (2024-09-23T15:20:33Z)
DPDETR: Decoupled Position Detection Transformer for Infrared-Visible Object Detection [42.70285733630796]
Infrared-visible object detection aims to achieve robust object detection by leveraging the complementary information of infrared and visible image pairs. Fusing misaligned complementary features is difficult, and current methods cannot accurately locate objects in both modalities under misalignment conditions. We propose a Decoupled Position Detection Transformer to address these problems. Experiments on DroneVehicle and KAIST datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-08-12T13:05:43Z)
Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection [53.842568573251214]
Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods. Our WBANet achieves a percentage of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets.
arXiv Detail & Related papers (2024-07-18T04:36:10Z)
Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset [20.729414075628814]
We propose an adaptive cross-modal object tracking algorithm called Modality-Aware Fusion Network (MAFNet). MAFNet efficiently integrates information from both RGB and NIR modalities using an adaptive weighting mechanism.
arXiv Detail & Related papers (2023-12-22T05:22:33Z)
Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection [95.84616822805664]
We introduce a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement. To alleviate the block effect and detail destruction problems that the Transformer naturally introduces, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z)
Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain. Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage. Compared with existing models, our proposed method achieves competitive performance on three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection. Many models use the feature fusion strategy but are limited by the low-order point-to-point fusion methods. We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.