Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation
- URL: http://arxiv.org/abs/2108.05076v1
- Date: Wed, 11 Aug 2021 07:37:44 GMT
- Title: Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation
- Authors: Xiaoqi Zhao, Youwei Pang, Jiaxing Yang, Lihe Zhang, Huchuan Lu
- Abstract summary: We propose a novel multi-source fusion network for zero-shot video object segmentation.
The proposed model achieves compelling performance against state-of-the-art methods.
- Score: 86.94578023985677
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Location and appearance are the key cues for video object segmentation.
Many sources, such as RGB, depth, optical flow, and static saliency, can provide
useful information about the objects. However, existing approaches utilize only
RGB, or RGB plus optical flow. In this paper, we propose a novel multi-source
fusion network for zero-shot video object segmentation. With the help of the
interoceptive spatial attention module (ISAM), the spatial importance of each
source is highlighted. Furthermore, we design a feature purification module
(FPM) to filter out inter-source incompatible features. Through the ISAM and FPM,
the multi-source features are effectively fused. In addition, we put forward an
automatic predictor selection network (APS) that chooses the better prediction
from either the static saliency predictor or the moving object predictor,
preventing over-reliance on failures caused by low-quality optical flow maps.
Extensive experiments on three challenging public benchmarks (i.e., DAVIS16,
YouTube-Objects, and FBMS) show that the proposed model achieves compelling
performance against state-of-the-art methods. The source code will be publicly
available at https://github.com/Xiaoqi-Zhao-DLUT/Multi-Source-APS-ZVOS.
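To make the pipeline concrete, below is a minimal PyTorch sketch of the three named components. Only the module names (ISAM, FPM, APS) and their roles come from the abstract; every layer choice, channel size, and the exact gating and selection rule here is an assumption for illustration, not the authors' implementation.

```python
# Hedged sketch of per-source spatial attention (ISAM), channel-wise
# purification (FPM), and predictor selection (APS). All internals are
# assumptions; only the module roles follow the abstract.
import torch
import torch.nn as nn


class ISAM(nn.Module):
    """Interoceptive spatial attention: weight each source map spatially."""
    def __init__(self, channels, num_sources):
        super().__init__()
        self.att = nn.ModuleList(
            nn.Conv2d(channels, 1, kernel_size=3, padding=1)
            for _ in range(num_sources)
        )

    def forward(self, feats):  # feats: list of (B, C, H, W), one per source
        weighted = [f * torch.sigmoid(a(f)) for f, a in zip(feats, self.att)]
        return torch.cat(weighted, dim=1)  # (B, C * num_sources, H, W)


class FPM(nn.Module):
    """Feature purification: suppress inter-source incompatible channels."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.fuse = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_channels, out_channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.fuse(x)
        return x * self.gate(x)  # channel-wise gating


def select_prediction(aps_score, static_mask, moving_mask, thresh=0.5):
    """APS-style choice: fall back to the static saliency prediction when
    the flow-based (moving object) branch looks unreliable."""
    return moving_mask if aps_score > thresh else static_mask
```

The design point mirrored here is that each source is weighted before concatenation, and the flow-dependent branch can be bypassed entirely when its quality score is low.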
Related papers
- HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection [16.92362922379821]
We propose a deep learning method to improve infrared small object detection performance.
The method includes the parallelized patch-aware attention (PPA) module, dimension-aware selective integration (DASI) module, and multi-dilated channel refiner (MDCR) module.
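The abstract names these modules without detailing them. As a rough illustration, here is a generic multi-dilated channel refiner in PyTorch, assuming (purely an assumption) that MDCR composes parallel dilated convolutions to cover several receptive fields:

```python
# Hypothetical multi-dilated refiner; the actual MDCR design may differ.
import torch
import torch.nn as nn


class MultiDilatedRefiner(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d)
            for d in dilations
        )
        self.project = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        # Each branch sees a different receptive field; small dilations keep
        # the fine detail that small objects need, large ones add context.
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```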
arXiv Detail & Related papers (2024-03-16T02:45:42Z)
- SimulFlow: Simultaneously Extracting Feature and Identifying Target for Unsupervised Video Object Segmentation [28.19471998380114]
Unsupervised video object segmentation (UVOS) aims at detecting the primary objects in a given video sequence without any human intervention.
Most existing methods rely on two-stream architectures that separately encode the appearance and motion information before fusing them to identify the target and generate object masks.
We propose a novel UVOS model called SimulFlow that simultaneously performs feature extraction and target identification.
arXiv Detail & Related papers (2023-11-30T06:44:44Z)
- Adaptive Multi-source Predictor for Zero-shot Video Object Segmentation [68.56443382421878]
We propose a novel adaptive multi-source predictor for zero-shot video object segmentation (ZVOS).
In the static object predictor, the RGB source is simultaneously converted into depth and static saliency sources.
Experiments show that the proposed model outperforms the state-of-the-art methods on three challenging ZVOS benchmarks.
arXiv Detail & Related papers (2023-03-18T10:19:29Z)
- Unsupervised Video Object Segmentation via Prototype Memory Network [5.612292166628669]
Unsupervised video object segmentation aims to segment a target object in the video without a ground truth mask in the initial frame.
This challenge requires extracting features for the most salient common objects within a video sequence.
We propose a novel prototype memory network architecture to solve this problem.
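As a hedged illustration of the prototype idea, the sketch below assumes prototypes are obtained by masked average pooling over encoder features and matched back to pixels by cosine similarity; the paper's actual memory read/write design is not specified in this summary.

```python
# Minimal prototype extraction and matching sketch; names and the pooling/
# matching choices are assumptions, not the paper's actual architecture.
import torch
import torch.nn.functional as F


def extract_prototype(feat, mask):
    """feat: (B, C, H, W); mask: (B, 1, H, W) soft foreground estimate."""
    weighted = (feat * mask).sum(dim=(2, 3))              # (B, C)
    return weighted / mask.sum(dim=(2, 3)).clamp(min=1e-6)


def match_prototypes(feat, prototypes):
    """Score each pixel against a bank of stored prototypes (N, C)."""
    feat = F.normalize(feat, dim=1)                        # (B, C, H, W)
    prototypes = F.normalize(prototypes, dim=1)            # (N, C)
    sim = torch.einsum('bchw,nc->bnhw', feat, prototypes)
    return sim.max(dim=1, keepdim=True).values             # best match per pixel
```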
arXiv Detail & Related papers (2022-09-08T11:08:58Z)
- Multi-Attention Network for Compressed Video Referring Object Segmentation [103.18477550023513]
Referring video object segmentation aims to segment the object referred by a given language expression.
Existing works typically require the compressed video bitstream to be decoded into RGB frames before segmentation.
This may hamper application in real-world, resource-limited scenarios such as autonomous cars and drones.
arXiv Detail & Related papers (2022-07-26T03:00:52Z)
- MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking [72.65494220685525]
We propose a new dynamic modality-aware filter generation module (named MFGNet) to boost the message communication between visible and thermal data.
We generate dynamic modality-aware filters with two independent networks; the visible and thermal filters are then used to perform dynamic convolution on their corresponding input feature maps.
To address issues caused by heavy occlusion, fast motion, and out-of-view targets, we propose a joint local and global search that exploits a new direction-aware target-driven attention mechanism.
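For the filter-generation step, here is a minimal PyTorch sketch of one such per-modality branch; the kernel size, the global-pooling generator, and the depthwise grouped-convolution trick are illustration assumptions, not MFGNet's actual design. In MFGNet, one branch per modality (visible and thermal) would be instantiated.

```python
# Hedged sketch: predict per-sample depthwise kernels from global context,
# then apply them with a grouped convolution (batch folded into channels).
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicFilterBranch(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        self.k = k
        # Predict one k*k depthwise kernel per channel from pooled context.
        self.gen = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels * k * k, kernel_size=1),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        b, c, h, w = x.shape
        kernels = self.gen(x).view(b * c, 1, self.k, self.k)
        # groups=b*c applies each sample's own generated filters.
        out = F.conv2d(x.view(1, b * c, h, w), kernels,
                       padding=self.k // 2, groups=b * c)
        return out.view(b, c, h, w)
```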
arXiv Detail & Related papers (2021-07-22T03:10:51Z)
- EPMF: Efficient Perception-aware Multi-sensor Fusion for 3D Semantic Segmentation [62.210091681352914]
We study multi-sensor fusion for 3D semantic segmentation in applications such as autonomous driving and robotics.
In this work, we investigate a collaborative fusion scheme called perception-aware multi-sensor fusion (PMF).
We propose a two-stream network to extract features from the two modalities separately. The extracted features are fused by effective residual-based fusion modules.
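As an illustration of residual-based fusion, the sketch below assumes the fused output stays anchored to the camera stream while the LiDAR stream contributes a learned correction; the module structure and channel handling are assumptions, not PMF's actual code.

```python
# Hedged residual-fusion sketch: one stream is the anchor, the other
# contributes only a residual correction computed from both.
import torch
import torch.nn as nn


class ResidualFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, cam_feat, lidar_feat):
        residual = self.reduce(torch.cat([cam_feat, lidar_feat], dim=1))
        return cam_feat + residual
```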
arXiv Detail & Related papers (2021-06-21T10:47:26Z)
- DS-Net: Dynamic Spatiotemporal Network for Video Salient Object Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of spatial and temporal information.
We show that the proposed method achieves superior performance compared to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z)