PSNet: Parallel Symmetric Network for Video Salient Object Detection
- URL: http://arxiv.org/abs/2210.05912v1
- Date: Wed, 12 Oct 2022 04:11:48 GMT
- Title: PSNet: Parallel Symmetric Network for Video Salient Object Detection
- Authors: Runmin Cong, Weiyu Song, Jianjun Lei, Guanghui Yue, Yao Zhao, and Sam
Kwong
- Abstract summary: We propose a VSOD network with up and down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
- Score: 85.94443548452729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For the video salient object detection (VSOD) task, how to excavate the
information from the appearance modality and the motion modality has always
been a topic of great concern. The two-stream structure, including an RGB
appearance stream and an optical flow motion stream, has been widely used as a
typical pipeline for VSOD tasks, but existing methods usually either use motion
features only to unidirectionally guide appearance features, or fuse the two
modality features adaptively but blindly. These methods underperform in diverse
scenarios because their learning schemes are neither comprehensive nor
specialized to the scene.
In this paper, following a more secure modeling philosophy, we deeply
investigate the importance of appearance modality and motion modality in a more
comprehensive way and propose a VSOD network with up and down parallel
symmetry, named PSNet. Two parallel branches with different dominant modalities
are set to achieve complete video saliency decoding with the cooperation of the
Gather Diffusion Reinforcement (GDR) module and Cross-modality Refinement and
Complement (CRC) module. Finally, we use the Importance Perception Fusion (IPF)
module to fuse the features from two parallel branches according to their
different importance in different scenarios. Experiments on four benchmark
datasets demonstrate that our method achieves competitive performance.
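The abstract describes the architecture but no implementation details, so the parallel-symmetric design can only be illustrated schematically. Below is a minimal PyTorch sketch of the overall structure: two branches with different dominant modalities, each refined by a CRC-style module, then fused by an IPF-style importance-weighted gate. All module internals, names, and channel sizes are hypothetical reconstructions from the abstract, not the authors' implementation; the GDR module is omitted for brevity.

```python
# Minimal sketch of a PSNet-style parallel symmetric decoder.
# All internals below are hypothetical, inferred only from the abstract.
import torch
import torch.nn as nn


class CRC(nn.Module):
    """Cross-modality Refinement and Complement (hypothetical form):
    the dominant modality is refined and complemented by the auxiliary one."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, dominant: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
        fused = self.refine(torch.cat([dominant, auxiliary], dim=1))
        return dominant + fused  # residual refinement of the dominant branch


class IPF(nn.Module):
    """Importance Perception Fusion (hypothetical form): predicts a
    scene-dependent weight per branch and fuses the branches accordingly."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, up: torch.Tensor, down: torch.Tensor) -> torch.Tensor:
        w = self.gate(torch.cat([up, down], dim=1))   # (B, 2, 1, 1) branch weights
        fused = w[:, 0:1] * up + w[:, 1:2] * down     # importance-weighted fusion
        return self.head(fused)                        # single-channel saliency logits


class PSNetSketch(nn.Module):
    """Two symmetric branches: one appearance-dominant, one motion-dominant."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.app_dominant = CRC(channels)
        self.mot_dominant = CRC(channels)
        self.ipf = IPF(channels)

    def forward(self, app_feat: torch.Tensor, mot_feat: torch.Tensor) -> torch.Tensor:
        up = self.app_dominant(app_feat, mot_feat)    # appearance-dominant branch
        down = self.mot_dominant(mot_feat, app_feat)  # motion-dominant branch
        return self.ipf(up, down)
```

In this reading, app_feat and mot_feat would be same-shape feature maps from an RGB encoder and an optical-flow encoder; the gate lets the network lean on whichever branch is more reliable for the current scene.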
Related papers
- Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection [70.84835546732738]
RGB-Thermal Salient Object Detection aims to pinpoint prominent objects within aligned pairs of visible and thermal infrared images.
Traditional encoder-decoder architectures may not adequately consider robustness against noise originating from defective modalities.
We propose the ConTriNet, a robust Confluent Triple-Flow Network employing a Divide-and-Conquer strategy.
arXiv Detail & Related papers (2024-12-02T14:44:39Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective lightweight feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance [1.5736899098702974]
This paper proposes a video object segmentation network based on motion guidance.
The model comprises a dual-stream network, motion guidance module, and multi-scale progressive fusion module.
The experimental results demonstrate the superior performance of the proposed method.
arXiv Detail & Related papers (2022-11-10T06:13:23Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection [1.002712867721496]
Methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation.
We propose a novel multi-modal and multi-scale refined network (M2RNet).
Three essential components are presented in this network.
arXiv Detail & Related papers (2021-09-16T12:15:40Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously before the fusion and decoding stage; a sketch of this full-duplex idea appears after this list.
FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Dual Semantic Fusion Network for Video Object Detection [35.175552056938635]
We propose a dual semantic fusion network (DSFNet) to fully exploit both frame-level and instance-level semantics in a unified fusion framework without external guidance.
The proposed DSFNet can generate more robust features through the multi-granularity fusion and avoid being affected by the instability of external guidance.
arXiv Detail & Related papers (2020-09-16T06:49:17Z)
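As a complement to the PSNet sketch above, the full-duplex idea summarized in the FSNet entry (simultaneous, bidirectional cross-modal feature passing before fusion, as opposed to one-way motion-to-appearance guidance) can also be sketched. The module name and internals below are assumptions for illustration, not FSNet's published code.

```python
# Hypothetical sketch of full-duplex cross-modal feature passing.
import torch
import torch.nn as nn


class FullDuplexExchange(nn.Module):
    """Each stream simultaneously transmits to and receives from the other."""
    def __init__(self, channels: int):
        super().__init__()
        self.app_to_mot = nn.Conv2d(channels, channels, kernel_size=1)
        self.mot_to_app = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, app: torch.Tensor, mot: torch.Tensor):
        # Both directions are computed from the *input* features, so the
        # exchange is simultaneous (full-duplex) rather than sequential.
        new_app = app + self.mot_to_app(mot)
        new_mot = mot + self.app_to_mot(app)
        return new_app, new_mot
```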