Multi-grained Temporal Prototype Learning for Few-shot Video Object
Segmentation
- URL: http://arxiv.org/abs/2309.11160v1
- Date: Wed, 20 Sep 2023 09:16:34 GMT
- Title: Multi-grained Temporal Prototype Learning for Few-shot Video Object
Segmentation
- Authors: Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan,
Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan
- Abstract summary: Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
- Score: 156.4142424784322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query
video with the same category defined by a few annotated support images.
However, this task was seldom explored. In this work, based on IPMT, a
state-of-the-art few-shot image segmentation method that combines external
support guidance information with adaptive query guidance cues, we propose to
leverage multi-grained temporal guidance information for handling the temporal
correlation nature of video data. We decompose the query video information into
a clip prototype and a memory prototype for capturing local and long-term
internal temporal guidance, respectively. Frame prototypes are further used for
each frame independently to handle fine-grained adaptive guidance and enable
bidirectional clip-frame prototype communication. To reduce the influence of
noisy memory, we propose to leverage the structural similarity relation among
different predicted regions and the support for selecting reliable memory
frames. Furthermore, a new segmentation loss is also proposed to enhance the
category discriminability of the learned prototypes. Experimental results
demonstrate that our proposed video IPMT model significantly outperforms
previous models on two benchmark datasets. Code is available at
https://github.com/nankepan/VIPMT.
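The prototype idea in the abstract can be illustrated with a minimal sketch. It is not the VIPMT implementation. It assumes a prototype is a mask-weighted average of per-pixel features, and it swaps in a plain cosine-similarity threshold for the paper's structural-similarity-based selection of reliable memory frames. All function names and the threshold value are hypothetical.

```python
import math


def masked_average_pool(features, mask):
    """Return the mask-weighted mean of per-pixel feature vectors.

    features: list of feature vectors (one per pixel).
    mask: list of weights in [0, 1], one per pixel.
    """
    dim = len(features[0])
    total = sum(mask)
    if total == 0:
        return [0.0] * dim  # empty mask: fall back to a zero prototype
    return [
        sum(w * f[d] for f, w in zip(features, mask)) / total
        for d in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def select_reliable(frame_prototypes, support_prototype, threshold=0.8):
    """Keep frame prototypes that are similar enough to the support prototype.

    Assumption: a simple cosine threshold stands in for the paper's
    structural-similarity criterion.
    """
    return [
        p for p in frame_prototypes
        if cosine(p, support_prototype) >= threshold
    ]


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


# Toy data: three support pixels with 2-D features; the mask marks the object.
support_features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
support_mask = [1, 1, 0]
support_prototype = masked_average_pool(support_features, support_mask)

# Hypothetical prototypes from three earlier query frames; the second is noisy.
frame_prototypes = [[1.0, 0.1], [0.1, 1.0], [0.9, 0.2]]
reliable = select_reliable(frame_prototypes, support_prototype)

# A long-term "memory" prototype built only from the frames that were kept.
memory_prototype = mean_vector(reliable)
```

In this toy setup the noisy second frame falls below the threshold, so the memory prototype is averaged from the remaining two frames only.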
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z)
- Holistic Prototype Attention Network for Few-Shot VOS [74.25124421163542]
Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images.
We propose a holistic prototype attention network (HPAN) for advancing FSVOS.
arXiv Detail & Related papers (2023-07-16T03:48:57Z)
- RefineVIS: Video Instance Segmentation with Temporal Attention
Refinement [23.720986152136785]
RefineVIS learns two separate representations on top of an off-the-shelf frame-level image instance segmentation model.
A Temporal Attention Refinement (TAR) module learns discriminative segmentation representations by exploiting temporal relationships.
It achieves state-of-the-art video instance segmentation accuracy on the YouTube-VIS 2019 (64.4 AP), YouTube-VIS 2021 (61.4 AP), and OVIS (46.1 AP) datasets.
arXiv Detail & Related papers (2023-06-07T20:45:15Z)
- Improving Video Instance Segmentation via Temporal Pyramid Routing [61.10753640148878]
Video Instance Segmentation (VIS) is a new and inherently multi-task problem, which aims to detect, segment, and track each instance in a video sequence.
We propose a Temporal Pyramid Routing (TPR) strategy to conditionally align and conduct pixel-level aggregation from a feature pyramid pair of two adjacent frames.
Our approach is a plug-and-play module and can be easily applied to existing instance segmentation methods.
arXiv Detail & Related papers (2021-07-28T03:57:12Z)
- Prototypical Cross-Attention Networks for Multiple Object Tracking and
Segmentation [95.74244714914052]
Multiple object tracking and segmentation requires detecting, tracking, and segmenting objects belonging to a set of given classes.
We propose the Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information online.
PCAN outperforms current video instance tracking and segmentation competition winners on the YouTube-VIS and BDD100K datasets.
arXiv Detail & Related papers (2021-06-22T17:57:24Z)
- 1st Place Solution for YouTubeVOS Challenge 2021: Video Instance
Segmentation [0.39146761527401414]
Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously.
We propose two modules, named Temporally Correlated Instance Segmentation (TCIS) and Bidirectional Tracking (BiTrack).
By combining these techniques with a bag of tricks, the network performance is significantly boosted compared to the baseline.
arXiv Detail & Related papers (2021-06-12T00:20:38Z)
- Target-Aware Object Discovery and Association for Unsupervised Video
Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.