Holistic Prototype Attention Network for Few-Shot VOS
- URL: http://arxiv.org/abs/2307.07933v1
- Date: Sun, 16 Jul 2023 03:48:57 GMT
- Title: Holistic Prototype Attention Network for Few-Shot VOS
- Authors: Yin Tang, Tao Chen, Xiruo Jiang, Yazhou Yao, Guo-Sen Xie, and Heng-Tao Shen
- Abstract summary: Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images.
We propose a holistic prototype attention network (HPAN) for advancing FSVOS.
- Score: 74.25124421163542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of
unseen classes by resorting to a small set of support images that contain
pixel-level object annotations. Existing methods have demonstrated that the
domain agent-based attention mechanism is effective in FSVOS by learning the
correlation between support images and query frames. However, the agent frame
contains redundant pixel information and background noise, resulting in
inferior segmentation performance. Moreover, existing methods tend to ignore
inter-frame correlations in query videos. To alleviate the above dilemma, we
propose a holistic prototype attention network (HPAN) for advancing FSVOS.
Specifically, HPAN introduces a prototype graph attention module (PGAM) and a
bidirectional prototype attention module (BPAM), transferring informative
knowledge from seen to unseen classes. PGAM generates local prototypes from all
foreground features and then utilizes their internal correlations to enhance
the representation of the holistic prototypes. BPAM exploits the holistic
information from support images and video frames by fusing co-attention and
self-attention to achieve support-query semantic consistency and inner-frame
temporal consistency. Extensive experiments on YouTube-FSVOS have been provided
to demonstrate the effectiveness and superiority of our proposed HPAN method.
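The abstract does not say how PGAM forms local prototypes or how it uses their internal correlations, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes foreground features are grouped into contiguous chunks and averaged (a stand-in for whatever grouping the paper uses), and that the correlations are modeled by one step of scaled dot-product self-attention with a residual connection. The function names `local_prototypes`, `softmax`, and `refine` are hypothetical.

```python
import math


def local_prototypes(features, mask, k):
    """Average foreground feature vectors in at most k contiguous groups.

    Assumption: contiguous chunking is a placeholder grouping rule; the
    source does not specify how local prototypes are generated.
    """
    fg = [f for f, m in zip(features, mask) if m]
    if not fg:
        raise ValueError("mask selects no foreground features")
    k = min(k, len(fg))
    size = math.ceil(len(fg) / k)
    groups = [fg[i:i + size] for i in range(0, len(fg), size)]
    # Column-wise mean of each group gives one prototype per group.
    return [[sum(col) / len(g) for col in zip(*g)] for g in groups]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def refine(prototypes):
    """Mix each prototype with the others via self-attention (assumed form).

    Each output is the input prototype plus an attention-weighted average
    of all prototypes (residual connection).
    """
    dim = len(prototypes[0])
    refined = []
    for p in prototypes:
        scores = [sum(a * b for a, b in zip(p, q)) / math.sqrt(dim)
                  for q in prototypes]
        weights = softmax(scores)
        mixed = [sum(w * q[d] for w, q in zip(weights, prototypes))
                 for d in range(dim)]
        refined.append([a + b for a, b in zip(p, mixed)])
    return refined


# Toy input: five 2-D feature vectors, the last one marked as background.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 1, 0]
protos = local_prototypes(features, mask, k=2)  # [[2.0, 0.0], [0.0, 3.0]]
holistic = refine(protos)
```

In this toy setting the background vector is excluded before averaging, which is the property the abstract attributes to prototypes: redundant pixels and background noise do not enter the representation that is later matched against query frames.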
Related papers
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
- Self-supervised Few-shot Learning for Semantic Segmentation: An Annotation-free Approach [4.855689194518905]
Few-shot semantic segmentation (FSS) offers immense potential in the field of medical image analysis.
Existing FSS techniques heavily rely on annotated semantic classes, rendering them unsuitable for medical images.
We propose a novel self-supervised FSS framework that does not rely on any annotation. Instead, it adaptively estimates the query mask by leveraging the eigenvectors obtained from the support images.
arXiv Detail & Related papers (2023-07-26T18:33:30Z)
- Breaking Immutable: Information-Coupled Prototype Elaboration for Few-Shot Object Detection [15.079980293820137]
We propose an Information-Coupled Prototype Elaboration (ICPE) method to generate specific and representative prototypes for each query image.
Our method achieves state-of-the-art performance in almost all settings.
arXiv Detail & Related papers (2022-11-27T10:33:11Z)
- Dual Prototype Attention for Unsupervised Video Object Segmentation [28.725754274542304]
Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in videos.
This paper proposes two novel prototype-based attention mechanisms: inter-modality attention (IMA) and inter-frame attention (IFA).
arXiv Detail & Related papers (2022-11-22T06:19:17Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims to segment query images given a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- Beyond the Prototype: Divide-and-conquer Proxies for Few-shot Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)