Holistic Prototype Attention Network for Few-Shot VOS
- URL: http://arxiv.org/abs/2307.07933v1
- Date: Sun, 16 Jul 2023 03:48:57 GMT
- Title: Holistic Prototype Attention Network for Few-Shot VOS
- Authors: Yin Tang, Tao Chen, Xiruo Jiang, Yazhou Yao, Guo-Sen Xie, and Heng-Tao Shen
- Abstract summary: Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of unseen classes by resorting to a small set of support images.
We propose a holistic prototype attention network (HPAN) for advancing FSVOS.
- Score: 74.25124421163542
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Few-shot video object segmentation (FSVOS) aims to segment dynamic objects of
unseen classes by resorting to a small set of support images that contain
pixel-level object annotations. Existing methods have demonstrated that the
domain agent-based attention mechanism is effective in FSVOS by learning the
correlation between support images and query frames. However, the agent frame
contains redundant pixel information and background noise, resulting in
inferior segmentation performance. Moreover, existing methods tend to ignore
inter-frame correlations in query videos. To alleviate the above dilemma, we
propose a holistic prototype attention network (HPAN) for advancing FSVOS.
Specifically, HPAN introduces a prototype graph attention module (PGAM) and a
bidirectional prototype attention module (BPAM), transferring informative
knowledge from seen to unseen classes. PGAM generates local prototypes from all
foreground features and then utilizes their internal correlations to enhance
the representation of the holistic prototypes. BPAM exploits the holistic
information from support images and video frames by fusing co-attention and
self-attention to achieve support-query semantic consistency and inner-frame
temporal consistency. Extensive experiments on YouTube-FSVOS have been provided
to demonstrate the effectiveness and superiority of our proposed HPAN method.
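The two modules can be illustrated with a simplified sketch. The snippet below is not the authors' implementation: it only shows the two generic building blocks the abstract describes, namely (1) extracting local prototypes from support foreground features via masked average pooling, and (2) fusing co-attention over prototypes with self-attention over query features, as a stand-in for BPAM's support-query and temporal consistency terms. All function names and the `alpha` fusion weight are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def masked_prototypes(features, masks):
    """Masked average pooling: one local prototype per support image.
    features: (S, H, W, C) support feature maps; masks: (S, H, W) binary
    foreground masks. Returns (S, C) prototypes."""
    m = masks[..., None]                        # (S, H, W, 1)
    summed = (features * m).sum(axis=(1, 2))    # (S, C)
    counts = m.sum(axis=(1, 2)).clip(min=1e-6)  # avoid division by zero
    return summed / counts

def prototype_attention(query_feats, prototypes, alpha=0.5):
    """Fuse co-attention (query pixels attend to support prototypes) with
    self-attention (query pixels attend to each other) -- a simplified
    stand-in for bidirectional prototype attention.
    query_feats: (N, C) flattened query-frame features."""
    co = softmax(query_feats @ prototypes.T) @ prototypes          # (N, C)
    self_att = softmax(query_feats @ query_feats.T) @ query_feats  # (N, C)
    return alpha * co + (1 - alpha) * self_att
```

In the actual paper, PGAM additionally refines the local prototypes through graph attention over their internal correlations before they are consumed by BPAM; the sketch omits that step.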
Related papers
- FCC: Fully Connected Correlation for Few-Shot Segmentation [11.277022867553658]
Few-shot segmentation (FSS) aims to segment the target object in a query image using only a small set of support images and masks.
Previous methods have tried to obtain prior information by creating correlation maps from pixel-level correlation on final-layer or same-layer features.
We introduce FCC (Fully Connected Correlation) to integrate pixel-level correlations between support and query features.
arXiv Detail & Related papers (2024-11-18T03:32:02Z)
- Beyond Mask: Rethinking Guidance Types in Few-shot Segmentation [67.35274834837064]
We develop a universal vision-language framework (UniFSS) to integrate prompts from text, mask, box, and image.
UniFSS significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T08:41:01Z)
- Multi-grained Temporal Prototype Learning for Few-shot Video Object Segmentation [156.4142424784322]
Few-Shot Video Object Segmentation (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight cross-modal module.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- Breaking Immutable: Information-Coupled Prototype Elaboration for Few-Shot Object Detection [15.079980293820137]
We propose an Information-Coupled Prototype Elaboration (ICPE) method to generate specific and representative prototypes for each query image.
Our method achieves state-of-the-art performance in almost all settings.
arXiv Detail & Related papers (2022-11-27T10:33:11Z)
- Dual Prototype Attention for Unsupervised Video Object Segmentation [28.725754274542304]
Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in videos.
This paper proposes two novel prototype-based attention mechanisms: inter-modality attention (IMA) and inter-frame attention (IFA).
arXiv Detail & Related papers (2022-11-22T06:19:17Z)
- Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
Few-shot semantic segmentation task aims at performing segmentation in query images with a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z)
- Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images [28.560068780733342]
A novel context aggregation network (CATNet) is proposed to improve the feature extraction process.
The proposed model exploits three lightweight plug-and-play modules, namely dense feature pyramid network (DenseFPN), spatial context pyramid (SCP), and hierarchical region of interest extractor (HRoIE).
arXiv Detail & Related papers (2021-11-22T08:55:25Z)
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
A TRansformer-based Few-shot Semantic segmentation method (TRFS) is proposed.
Our model consists of two modules: Global Enhancement Module (GEM) and Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.