PSNet: Parallel Symmetric Network for Video Salient Object Detection
- URL: http://arxiv.org/abs/2210.05912v1
- Date: Wed, 12 Oct 2022 04:11:48 GMT
- Title: PSNet: Parallel Symmetric Network for Video Salient Object Detection
- Authors: Runmin Cong, Weiyu Song, Jianjun Lei, Guanghui Yue, Yao Zhao, and Sam Kwong
- Abstract summary: We propose a VSOD network with up-and-down parallel symmetry, named PSNet.
Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding.
- Score: 85.94443548452729
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: For the video salient object detection (VSOD) task, how to extract information from the appearance modality and the motion modality has long been a central concern. The two-stream structure, comprising an RGB appearance stream and an optical flow motion stream, has been widely used as a typical pipeline for VSOD, but existing methods usually either use motion features only to unidirectionally guide appearance features, or fuse the two modalities adaptively but blindly. These methods underperform in diverse scenarios because their learning schemes are neither comprehensive nor scenario-specific.
In this paper, following a more reliable modeling philosophy, we investigate the importance of the appearance and motion modalities in a more comprehensive way and propose PSNet, a VSOD network with up-and-down parallel symmetry. Two parallel branches with different dominant modalities are set to achieve complete video saliency decoding through the cooperation of the Gather Diffusion Reinforcement (GDR) module and the Cross-modality Refinement and Complement (CRC) module. Finally, we use the Importance Perception Fusion (IPF) module to fuse the features from the two parallel branches according to their importance in different scenarios. Experiments on four benchmark datasets demonstrate that our method achieves competitive performance.
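Since the abstract names the GDR, CRC, and IPF modules without detailing their internals, the following is a minimal PyTorch sketch of the overall dataflow it describes: two symmetric branches with different dominant modalities, fused by predicted importance weights. Every class name, the placeholder block internals, and the exact form of the importance weighting are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of the two-branch layout described in the abstract.
# GDR and CRC internals are NOT specified by the abstract; they are replaced
# here by a single placeholder conv block. Only the overall dataflow follows
# the text.
import torch
import torch.nn as nn

class PlaceholderBlock(nn.Module):
    """Stand-in for the GDR/CRC modules: one conv, not the paper's design."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, dominant, auxiliary):
        # Fuse a dominant-modality feature with an auxiliary-modality feature.
        return self.conv(torch.cat([dominant, auxiliary], dim=1))

class ParallelSymmetricSketch(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Two symmetric branches: appearance (RGB)-dominant and
        # motion (optical flow)-dominant.
        self.rgb_dominant_branch = PlaceholderBlock(channels)
        self.flow_dominant_branch = PlaceholderBlock(channels)
        # Assumed form of Importance Perception Fusion: predict one scalar
        # weight per branch from the concatenated branch outputs.
        self.importance = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels * 2, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )
        self.head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, rgb_feat, flow_feat):
        up = self.rgb_dominant_branch(rgb_feat, flow_feat)     # appearance-dominant
        down = self.flow_dominant_branch(flow_feat, rgb_feat)  # motion-dominant
        w = self.importance(torch.cat([up, down], dim=1))      # (B, 2, 1, 1)
        fused = w[:, 0:1] * up + w[:, 1:2] * down              # weighted fusion
        return torch.sigmoid(self.head(fused))                 # saliency map

# Usage: per-frame appearance and flow features of matching shape.
# rgb_feat = torch.randn(2, 64, 56, 56); flow_feat = torch.randn(2, 64, 56, 56)
# saliency = ParallelSymmetricSketch()(rgb_feat, flow_feat)   # (2, 1, 56, 56)
```

The softmax over the two branch weights mirrors the abstract's claim that branch importance varies across scenarios; the actual IPF module is presumably more elaborate than this scalar gating.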
Related papers
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective lightweight feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance compared with both full fine-tuning methods and prompt-learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Efficient Unsupervised Video Object Segmentation Network Based on Motion Guidance [1.5736899098702974]
This paper proposes a video object segmentation network based on motion guidance.
The model comprises a dual-stream network, a motion guidance module, and a multi-scale progressive fusion module.
Experimental results demonstrate the superior performance of the proposed method.
arXiv Detail & Related papers (2022-11-10T06:13:23Z)
- Transformer-based Network for RGB-D Saliency Detection [82.6665619584628]
Key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities.
We show that the transformer is a uniform operation that is highly effective for both feature fusion and feature enhancement.
Our proposed network performs favorably against state-of-the-art RGB-D saliency detection methods.
arXiv Detail & Related papers (2021-12-01T15:53:58Z)
- M2RNet: Multi-modal and Multi-scale Refined Network for RGB-D Salient Object Detection [1.002712867721496]
Methods based on RGB-D often suffer from the incompatibility of multi-modal feature fusion and the insufficiency of multi-scale feature aggregation.
We propose a novel multi-modal and multi-scale refined network (M2RNet) comprising three essential components.
arXiv Detail & Related papers (2021-09-16T12:15:40Z)
- Full-Duplex Strategy for Video Object Segmentation [141.43983376262815]
The Full-Duplex Strategy Network (FSNet) is a novel framework for video object segmentation (VOS).
FSNet performs cross-modal feature passing (i.e., transmission and receiving) simultaneously, before the fusion and decoding stage.
We show that FSNet outperforms other state-of-the-art methods on both the VOS and video salient object detection tasks.
arXiv Detail & Related papers (2021-08-06T14:50:50Z)
- Deep feature selection-and-fusion for RGB-D semantic segmentation [8.831857715361624]
This work proposes a unified and efficient feature selection-and-fusion network (FSFNet).
FSFNet contains a symmetric cross-modality residual fusion module used for explicit fusion of multi-modality information.
Experimental evaluations demonstrate that the proposed model achieves performance competitive with state-of-the-art methods on two public datasets.
arXiv Detail & Related papers (2021-05-10T04:02:32Z)
- Dual Semantic Fusion Network for Video Object Detection [35.175552056938635]
We propose a dual semantic fusion network (DSFNet) to fully exploit both frame-level and instance-level semantics in a unified fusion framework without external guidance.
The proposed DSFNet can generate more robust features through the multi-granularity fusion and avoid being affected by the instability of external guidance.
arXiv Detail & Related papers (2020-09-16T06:49:17Z)
- RGB-D Salient Object Detection with Cross-Modality Modulation and Selection [126.4462739820643]
We present an effective method to progressively integrate and refine cross-modality complementarities for RGB-D salient object detection (SOD).
The proposed network mainly addresses two challenging issues: 1) how to effectively integrate the complementary information from the RGB image and its corresponding depth map, and 2) how to adaptively select saliency-related features.
arXiv Detail & Related papers (2020-07-14T14:22:50Z)