PSNet: Parallel Symmetric Network for Video Salient Object Detection
- URL: http://arxiv.org/abs/2210.05912v1
- Date: Wed, 12 Oct 2022 04:11:48 GMT
- Title: PSNet: Parallel Symmetric Network for Video Salient Object Detection
- Authors: Runmin Cong, Weiyu Song, Jianjun Lei, Guanghui Yue, Yao Zhao, and Sam
Kwong
- Abstract summary: We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
- Score: 85.94443548452729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For the video salient object detection (VSOD) task, how to excavate the
information from the appearance modality and the motion modality has always
been a topic of great concern. The two-stream structure, including an RGB
appearance stream and an optical flow motion stream, has been widely used as a
typical pipeline for VSOD tasks, but existing methods usually either use motion
features only to unidirectionally guide appearance features, or fuse the two
modality features adaptively but blindly. These methods underperform in diverse
scenarios because their learning schemes are neither comprehensive nor
specialized to the scene.
In this paper, following a more secure modeling philosophy, we deeply
investigate the importance of appearance modality and motion modality in a more
comprehensive way and propose a VSOD network with up and down parallel
symmetry, named PSNet. Two parallel branches with different dominant modalities
are set to achieve complete video saliency decoding with the cooperation of the
Gather Diffusion Reinforcement (GDR) module and Cross-modality Refinement and
Complement (CRC) module. Finally, we use the Importance Perception Fusion (IPF)
module to fuse the features from two parallel branches according to their
different importance in different scenarios. Experiments on four benchmark
datasets demonstrate that our method achieves competitive performance.
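The abstract describes the architecture but no implementation details, so the parallel-symmetric design can only be illustrated schematically. Below is a minimal PyTorch sketch of the overall structure: two branches with different dominant modalities, each refined by a CRC-style module, then fused by an IPF-style importance-weighted gate. All module internals, names, and channel sizes are hypothetical reconstructions from the abstract, not the authors' implementation; the GDR module is omitted for brevity.

```python
# Minimal sketch of a PSNet-style parallel symmetric decoder.
# All internals below are hypothetical, inferred only from the abstract.
import torch
import torch.nn as nn


class CRC(nn.Module):
    """Cross-modality Refinement and Complement (hypothetical form):
    the dominant modality is refined and complemented by the auxiliary one."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, dominant: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
        fused = self.refine(torch.cat([dominant, auxiliary], dim=1))
        return dominant + fused  # residual refinement of the dominant branch


class IPF(nn.Module):
    """Importance Perception Fusion (hypothetical form): predicts a
    scene-dependent weight per branch and fuses the branches accordingly."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, up: torch.Tensor, down: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([up, down], dim=1))   # (B, 2, 1, 1) branch weights
        fused = w[:, 0:1] * up + w[:, 1:2] * down     # importance-weighted fusion
        return self.head(fused)                        # single-channel saliency logits


class PSNetSketch(nn.Module):
    """Two symmetric branches: one appearance-dominant, one motion-dominant."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.app_dominant = CRC(channels)
        self.mot_dominant = CRC(channels)
        self.ipf = IPF(channels)

    def forward(self, app_feat: torch.Tensor, mot_feat: torch.Tensor) -> torch.Tensor:
        up = self.app_dominant(app_feat, mot_feat)    # appearance-dominant branch
        down = self.mot_dominant(mot_feat, app_feat)  # motion-dominant branch
        return self.ipf(up, down)
```

In this reading, app_feat and mot_feat would be same-shape feature maps from an RGB encoder and an optical-flow encoder; the gate lets the network lean on whichever branch is more reliable for the current scene.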
Related papers
- Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.
Traditional encoder-decoder architectures may not adequately consider robustness against noise originating from defective modalities.
We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective lightweight feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance [1.5736899098702974]
This paper proposes a video object segmentation network based on motion guidance.
The model comprises a dual-stream network, motion guidance module, and multi-scale progressive fusion module.
The experimental results demonstrate the superior performance of the proposed method.
arXiv Detail & Related papers (2022-11-10T06:13:23Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection [1.002712867721496]
Methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation.
We propose a novel multi-modal and multi-scale refined network (M2RNet).
Three essential components are presented in this network.
arXiv Detail & Related papers (2021-09-16T12:15:40Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage; a sketch of this full-duplex idea appears after this list.
FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Dual Semantic Fusion Network for Video Object Detection [35.175552056938635]
We propose a dual semantic fusion network (DSFNet) to fully exploit both frame-level and instance-level semantics in a unified fusion framework without external guidance.
The proposed DSFNet can generate more robust features through the multi-granularity fusion and avoid being affected by the instability of external guidance.
arXiv Detail & Related papers (2020-09-16T06:49:17Z)
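As a complement to the PSNet sketch above, the full-duplex idea summarized in the FSNet entry (simultaneous, bidirectional cross-modal feature passing before fusion, as opposed to one-way motion-to-appearance guidance) can also be sketched. The module name and internals below are assumptions for illustration, not FSNet's published code.

```python
# Hypothetical sketch of full-duplex cross-modal feature passing.
import torch
import torch.nn as nn


class FullDuplexExchange(nn.Module):
    """Each stream simultaneously transmits to and receives from the other."""
    def __init__(self, channels: int):
        super().__init__()
        self.app_to_mot = nn.Conv2d(channels, channels, kernel_size=1)
        self.mot_to_app = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, app: torch.Tensor, mot: torch.Tensor):
        # Both directions are computed from the *input* features, so the
        # exchange is simultaneous (full-duplex) rather than sequential.
        new_app = app + self.mot_to_app(mot)
        new_mot = mot + self.app_to_mot(app)
        return new_app, new_mot
```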