Learning to Learn Better for Video Object Segmentation
- URL: http://arxiv.org/abs/2212.02112v1
- Date: Mon, 5 Dec 2022 09:10:34 GMT
- Title: Learning to Learn Better for Video Object Segmentation
- Authors: Meng Lan, Jing Zhang, Lefei Zhang, Dacheng Tao
- Abstract summary: We propose a novel framework that emphasizes Learning to Learn Better (LLB) target features for SVOS.
We design the discriminative label generation module (DLGM) and the adaptive fusion module to address these issues.
Our proposed LLB method achieves state-of-the-art performance.
- Score: 94.5753973590207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed joint learning framework (JOINT) integrates
matching-based transductive reasoning and online inductive learning to achieve
accurate and robust semi-supervised video object segmentation (SVOS). However,
using the mask embedding as the label to guide the generation of target
features in the two branches may result in inadequate target representation and
degraded performance. Moreover, how to reasonably fuse the target features from
the two branches, rather than simply adding them together and risking the
adverse effect of one dominant branch, has not been investigated. In this
paper, we propose a novel framework, termed LLB, that emphasizes Learning to
Learn Better target features for SVOS; it comprises a discriminative label
generation module (DLGM) and an adaptive fusion module that address these two
issues. Technically, the DLGM takes the background-filtered frame instead of
the target mask as input and adopts a lightweight encoder to generate the
target features, which serve as the label of the online few-shot learner and
the value of the transformer decoder, guiding the two branches to learn a more
discriminative target representation. The adaptive fusion module maintains a
learnable gate for each branch, which reweights the feature representation
element-wise and allows an adaptive amount of target information from each
branch to flow into the fused target feature, thus preventing one branch from
being dominant and making the target feature more robust to distractors.
Extensive experiments on public benchmarks show that our proposed LLB method
achieves state-of-the-art performance.
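To make the DLGM idea concrete, here is a minimal PyTorch sketch of the input construction described above: the frame is background-filtered with the target mask and passed through a lightweight encoder whose output serves as the learning target for both branches. The encoder architecture, names, and shapes are illustrative assumptions, not the paper's actual implementation.

```python
# A minimal sketch of the DLGM input construction, assuming PyTorch.
# The two-layer encoder and the name DLGMSketch are placeholders,
# not the paper's actual architecture.
import torch
import torch.nn as nn

class DLGMSketch(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        # Lightweight-encoder stand-in; the paper's exact design may differ.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
        )

    def forward(self, frame: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Background-filtered frame: zero out non-target pixels, so the
        # encoder sees the target's appearance rather than a binary mask.
        filtered = frame * mask  # mask (N, 1, H, W) broadcasts over RGB
        # The resulting features act as the label of the online few-shot
        # learner and as the value of the transformer decoder.
        return self.encoder(filtered)

# Usage with hypothetical shapes: RGB frame (1, 3, 240, 240), binary mask.
dlgm = DLGMSketch(channels=256)
frame = torch.randn(1, 3, 240, 240)
mask = (torch.rand(1, 1, 240, 240) > 0.5).float()
label_feat = dlgm(frame, mask)  # -> (1, 256, 60, 60)
```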
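Similarly, the adaptive fusion module can be sketched as one learnable gate per branch that reweights that branch's features element-wise before summation. The names (AdaptiveFusion, gate_t, gate_i) and the 1x1-conv gate heads are hypothetical; the paper's actual gate design may differ.

```python
# A minimal sketch of the adaptive fusion idea, assuming PyTorch.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """One learnable gate per branch reweights its target features
    element-wise before summation, so neither branch dominates."""
    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv followed by a sigmoid yields per-element weights in (0, 1).
        self.gate_t = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.gate_i = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, feat_t: torch.Tensor, feat_i: torch.Tensor) -> torch.Tensor:
        # Element-wise reweighting lets an adaptive amount of target
        # information from each branch flow into the fused feature.
        return self.gate_t(feat_t) * feat_t + self.gate_i(feat_i) * feat_i

# Usage with hypothetical feature maps from the two branches:
fuse = AdaptiveFusion(channels=256)
f_t = torch.randn(1, 256, 30, 30)  # transformer (transductive) branch
f_i = torch.randn(1, 256, 30, 30)  # few-shot learner (inductive) branch
fused = fuse(f_t, f_i)             # -> (1, 256, 30, 30)
```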
Related papers
- CLIP Is Also a Good Teacher: A New Learning Framework for Inductive Zero-shot Semantic Segmentation [6.181169909576527]
Generalized Zero-shot Semantic Segmentation aims to segment both seen and unseen categories under the supervision of the seen ones only.
Existing methods adopt large-scale Vision-Language Models (VLMs), which obtain outstanding zero-shot performance.
We propose CLIP-ZSS (Zero-shot Semantic Segmentation), a training framework that enables any image encoder designed for closed-set segmentation to be applied to zero-shot and open-vocabulary tasks.
arXiv Detail & Related papers (2023-10-03T09:33:47Z)
- Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation [80.1412989006262]
Domain adaptive semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain.
We propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation.
arXiv Detail & Related papers (2023-05-23T07:09:09Z)
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval [115.28586222748478]
Image-Text Retrieval (ITR) aims to retrieve, from one modality, the target instances that are semantically relevant to a given query from the other modality.
Existing approaches typically suffer from two major limitations.
arXiv Detail & Related papers (2023-01-17T12:42:58Z)
- Saliency Guided Inter- and Intra-Class Relation Constraints for Weakly Supervised Semantic Segmentation [66.87777732230884]
We propose a saliency guided Inter- and Intra-Class Relation Constrained (I$^2$CRC) framework to assist the expansion of the activated object regions.
We also introduce an object guided label refinement module to make full use of both the segmentation prediction and the initial labels for obtaining superior pseudo-labels.
arXiv Detail & Related papers (2022-06-20T03:40:56Z)
- Shuffle Augmentation of Features from Unlabeled Data for Unsupervised Domain Adaptation [21.497019000131917]
Unsupervised Domain Adaptation (UDA) is a branch of transfer learning where labels for target samples are unavailable.
In this paper, we propose Shuffle Augmentation of Features (SAF) as a novel UDA framework.
SAF learns from the target samples, adaptively distills class-aware target features, and implicitly guides the classifier to find comprehensive class borders.
arXiv Detail & Related papers (2022-01-28T07:11:05Z)
- Joint Inductive and Transductive Learning for Video Object Segmentation [107.32760625159301]
Semi-supervised video object segmentation is the task of segmenting the target object in a video sequence given only its mask in the first frame.
Most previous best-performing methods adopt matching-based transductive reasoning or online inductive learning.
We propose to integrate transductive and inductive learning into a unified framework to exploit the complementarity between them for accurate and robust video object segmentation.
arXiv Detail & Related papers (2021-08-08T16:25:48Z)
- Aligning Pretraining for Detection via Object-Level Contrastive Learning [57.845286545603415]
Image-level contrastive representation learning has proven to be highly effective as a generic model for transfer learning.
We argue that this could be sub-optimal and thus advocate a design principle which encourages alignment between the self-supervised pretext task and the downstream task.
Our method, called Selective Object COntrastive learning (SoCo), achieves state-of-the-art results for transfer performance on COCO detection.
arXiv Detail & Related papers (2021-06-04T17:59:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.