Related papers: Pyramidal Adaptive Cross-Gating for Multimodal Detection

Pyramidal Adaptive Cross-Gating for Multimodal Detection

URL: http://arxiv.org/abs/2512.18291v1
Date: Sat, 20 Dec 2025 09:32:18 GMT
Title: Pyramidal Adaptive Cross-Gating for Multimodal Detection
Authors: Zidong Gu, Shoufu Tian,
Abstract summary: PACGNet is an architecture designed to perform deep fusion within the backbone.<n>The P module reconstructs the feature hierarchy via a progressive hierarchical gating mechanism.<n>Our PACGNet sets a new state-of-the-art benchmark, with mAP50 scores reaching 81.7% and 82.1% respectively.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Object detection in aerial imagery is a critical task in applications such as UAV reconnaissance. Although existing methods have extensively explored feature interaction between different modalities, they commonly rely on simple fusion strategies for feature aggregation. This introduces two critical flaws: it is prone to cross-modal noise and disrupts the hierarchical structure of the feature pyramid, thereby impairing the fine-grained detection of small objects. To address this challenge, we propose the Pyramidal Adaptive Cross-Gating Network (PACGNet), an architecture designed to perform deep fusion within the backbone. To this end, we design two core components: the Symmetrical Cross-Gating (SCG) module and the Pyramidal Feature-aware Multimodal Gating (PFMG) module. The SCG module employs a bidirectional, symmetrical "horizontal" gating mechanism to selectively absorb complementary information, suppress noise, and preserve the semantic integrity of each modality. The PFMG module reconstructs the feature hierarchy via a progressive hierarchical gating mechanism. This leverages the detailed features from a preceding, higher-resolution level to guide the fusion at the current, lower-resolution level, effectively preserving fine-grained details as features propagate. Through evaluations conducted on the DroneVehicle and VEDAI datasets, our PACGNet sets a new state-of-the-art benchmark, with mAP50 scores reaching 81.7% and 82.1% respectively.

Related papers

Assisted Refinement Network Based on Channel Information Interaction for Camouflaged and Salient Object Detection [30.393796834241794]
Camouflaged Object Detection (COD) stands as a significant challenge in computer vision, dedicated to identifying and segmenting objects visually highly integrated with their backgrounds.<n>Current mainstream methods have made progress in cross-layer feature fusion, but two critical issues persist during the decoding stage.<n>The first is insufficient cross-channel information interaction within the same-layer features, limiting feature expressiveness.<n>The second is the inability to effectively co-model boundary and region information, making it difficult to accurately reconstruct complete regions and sharp boundaries of objects.
arXiv Detail & Related papers (2025-12-12T08:29:00Z)
DEPFusion: Dual-Domain Enhancement and Priority-Guided Mamba Fusion for UAV Multispectral Object Detection [6.4402018224356015]
A framework named DEPFusion is proposed for UAV multispectral object detection.<n>It consists of Dual-Domain Enhancement (DDE) and Priority-Guided Mamba Fusion (PGMF)<n>Experiments on DroneVehicle and VEDAI datasets demonstrate that DEPFusion achieves good performance with state-of-the-art methods.
arXiv Detail & Related papers (2025-09-09T01:51:57Z)
Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection [48.04749955821739]
A novel Multispectral State-Space Feature Fusion framework, dubbed MS2Fusion, is proposed.<n>MS2Fusion achieves efficient and effective fusion through a dual-path parametric interaction mechanism.<n>In our experiments on mainstream benchmarks, our MS2Fusion significantly outperforms other state-of-the-art multispectral object detection methods.
arXiv Detail & Related papers (2025-07-19T14:38:03Z)
A Refreshed Similarity-based Upsampler for Direct High-Ratio Feature Upsampling [54.05517338122698]
A popular similarity-based feature upsampling pipeline has been proposed, which utilizes a high-resolution feature as guidance.<n>We propose an explicitly controllable query-key feature alignment from both semantic-aware and detail-aware perspectives.<n>We develop a fine-grained neighbor selection strategy on HR features, which is simple yet effective for alleviating mosaic artifacts.
arXiv Detail & Related papers (2024-07-02T14:12:21Z)
Frequency Perception Network for Camouflaged Object Detection [51.26386921922031]
We propose a novel learnable and separable frequency perception mechanism driven by the semantic hierarchy in the frequency domain.<n>Our entire network adopts a two-stage model, including a frequency-guided coarse localization stage and a detail-preserving fine localization stage.<n>Compared with the currently existing models, our proposed method achieves competitive performance in three popular benchmark datasets.
arXiv Detail & Related papers (2023-08-17T11:30:46Z)
Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features. Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
Feature Aggregation and Propagation Network for Camouflaged Object Detection [42.33180748293329]
Camouflaged object detection (COD) aims to detect/segment camouflaged objects embedded in the environment. Several COD methods have been developed, but they still suffer from unsatisfactory performance due to intrinsic similarities between foreground objects and background surroundings. We propose a novel Feature Aggregation and propagation Network (FAP-Net) for camouflaged object detection.
arXiv Detail & Related papers (2022-12-02T05:54:28Z)
Semantic-aligned Fusion Transformer for One-shot Object Detection [18.58772037047498]
One-shot object detection aims at detecting novel objects according to merely one given instance. Current approaches explore various feature fusions to obtain directly transferable meta-knowledge. We propose a simple but effective architecture named Semantic-aligned Fusion Transformer (SaFT) to resolve these issues.
arXiv Detail & Related papers (2022-03-17T05:38:47Z)
A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [68.10621089649486]
We propose Attention Aggregation based Feature Pyramid Network (A2-FPN) to improve multi-scale feature learning. A2-FPN achieves an improvement of 2.0% and 1.4% mask AP when integrated into the strong baselines such as Cascade Mask R-CNN and Hybrid Task Cascade.
arXiv Detail & Related papers (2021-05-07T11:51:08Z)
High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR. We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine the cross-modality complementarities for RGB-D salient object detection (SOD) The proposed network mainly solves two challenging issues: 1) how to effectively integrate the complementary information from RGB image and its corresponding depth map, and 2) how to adaptively select more saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)
Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection. The distributed features per layer own both semantics and salient details from all other layers simultaneously, and suffer reduced loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.