CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection
- URL: http://arxiv.org/abs/2504.01326v1
- Date: Wed, 02 Apr 2025 03:22:36 GMT
- Title: CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection
- Authors: Jin Lian, Zhongyu Wan, Ming Gao, JunFeng Chen,
- Abstract summary: Cross-layer feature pyramid networks (CFPNs) have achieved notable progress in multi-scale feature fusion and boundary detail preservation for salient object detection.<n>To address these challenges, we propose CFMD, a novel cross-layer feature pyramid network that introduces two key innovations.<n>First, we design a context-aware feature aggregation module (CFLMA), which incorporates the state-of-the-art Mamba architecture to construct a dynamic weight distribution mechanism.<n>Second, we introduce an adaptive dynamic upsampling unit (CFLMD) that preserves spatial details during resolution recovery.
- Score: 7.262250906929891
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Cross-layer feature pyramid networks (CFPNs) have achieved notable progress in multi-scale feature fusion and boundary detail preservation for salient object detection. However, traditional CFPNs still suffer from two core limitations: (1) a computational bottleneck caused by complex feature weighting operations, and (2) degraded boundary accuracy due to feature blurring in the upsampling process. To address these challenges, we propose CFMD, a novel cross-layer feature pyramid network that introduces two key innovations. First, we design a context-aware feature aggregation module (CFLMA), which incorporates the state-of-the-art Mamba architecture to construct a dynamic weight distribution mechanism. This module adaptively adjusts feature importance based on image context, significantly improving both representation efficiency and generalization. Second, we introduce an adaptive dynamic upsampling unit (CFLMD) that preserves spatial details during resolution recovery. By adjusting the upsampling range dynamically and initializing with a bilinear strategy, the module effectively reduces feature overlap and maintains fine-grained boundary structures. Extensive experiments on three standard benchmarks using three mainstream backbone networks demonstrate that CFMD achieves substantial improvements in pixel-level accuracy and boundary segmentation quality, especially in complex scenes. The results validate the effectiveness of CFMD in jointly enhancing computational efficiency and segmentation performance, highlighting its strong potential in salient object detection tasks.
Related papers
- DSU-Net:An Improved U-Net Model Based on DINOv2 and SAM2 with Multi-scale Cross-model Feature Enhancement [7.9006143460465355]
This paper proposes a multi-scale feature collabora-tion framework guided by DINOv2 for SAM2, with core innovations in three aspects.
It surpasses existing state-of-the-art meth-ods in downstream tasks such as camouflage target detection and salient ob-ject detection, without requiring costly training processes.
arXiv Detail & Related papers (2025-03-27T06:08:24Z) - ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships.<n>Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands.<n>We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
arXiv Detail & Related papers (2025-01-31T16:11:04Z) - Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.<n>Traditional encoder-decoder architectures may not have adequately considered the robustness against noise originating from defective modalities.<n>We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z) - PVAFN: Point-Voxel Attention Fusion Network with Multi-Pooling Enhancing for 3D Object Detection [59.355022416218624]
integration of point and voxel representations is becoming more common in LiDAR-based 3D object detection.
We propose a novel two-stage 3D object detector, called Point-Voxel Attention Fusion Network (PVAFN)
PVAFN uses a multi-pooling strategy to integrate both multi-scale and region-specific information effectively.
arXiv Detail & Related papers (2024-08-26T19:43:01Z) - Double-Shot 3D Shape Measurement with a Dual-Branch Network for Structured Light Projection Profilometry [14.749887303860717]
We propose a dual-branch Convolutional Neural Network (CNN)-Transformer network (PDCNet) to process different structured light (SL) modalities.
Within PDCNet, a Transformer branch is used to capture global perception in the fringe images, while a CNN branch is designed to collect local details in the speckle images.
Our method can reduce fringe order ambiguity while producing high-accuracy results on self-made datasets.
arXiv Detail & Related papers (2024-07-19T10:49:26Z) - Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection [17.406051477690134]
Event cameras output sparse and asynchronous events, providing a potential solution to solve these problems.
We propose a novel hierarchical feature refinement network for event-frame fusion.
Our method exhibits significantly better robustness when introducing 15 different corruption types to the frame images.
arXiv Detail & Related papers (2024-07-17T14:09:46Z) - X Modality Assisting RGBT Object Tracking [1.730147049648545]
A novel X Modality Assisting Network (X-Net) is introduced, which explores the impact of the fusion paradigm by decoupling visual object tracking into three distinct levels.<n>X-Net achieves performance gains of 0.47%/1.2% in the average of precise rate and success rate.
arXiv Detail & Related papers (2023-12-27T05:38:54Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Transformer-based Context Condensation for Boosting Feature Pyramids in
Object Detection [77.50110439560152]
Current object detectors typically have a feature pyramid (FP) module for multi-level feature fusion (MFF)
We propose a novel and efficient context modeling mechanism that can help existing FPs deliver better MFF results.
In particular, we introduce a novel insight that comprehensive contexts can be decomposed and condensed into two types of representations for higher efficiency.
arXiv Detail & Related papers (2022-07-14T01:45:03Z) - Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
Full- Strategy Network (FSNet) is a novel framework for video object segmentation (VOS)
Our FSNet performs the crossmodal feature-passing (i.e., transmission and receiving) simultaneously before fusion decoding stage.
We show that our FSNet outperforms other state-of-the-arts for both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z) - Cross-layer Feature Pyramid Network for Salient Object Detection [102.20031050972429]
We propose a novel Cross-layer Feature Pyramid Network to improve the progressive fusion in salient object detection.
The distributed features per layer own both semantics and salient details from all other layers simultaneously, and suffer reduced loss of important information.
arXiv Detail & Related papers (2020-02-25T14:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.