Learning What to Learn for Video Object Segmentation
- URL: http://arxiv.org/abs/2003.11540v2
- Date: Fri, 1 May 2020 16:10:19 GMT
- Title: Learning What to Learn for Video Object Segmentation
- Authors: Goutam Bhat, Felix Järemo Lawin, Martin Danelljan, Andreas Robinson,
Michael Felsberg, Luc Van Gool, Radu Timofte
- Abstract summary: We introduce an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module.
This internal learner is designed to predict a powerful parametric model of the target.
We set a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5.
- Score: 157.4154825304324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video object segmentation (VOS) is a highly challenging problem, since the
target object is only defined during inference with a given first-frame
reference mask. The problem of how to capture and utilize this limited target
information remains a fundamental research question. We address this by
introducing an end-to-end trainable VOS architecture that integrates a
differentiable few-shot learning module. This internal learner is designed to
predict a powerful parametric model of the target by minimizing a segmentation
error in the first frame. We further go beyond standard few-shot learning
techniques by learning what the few-shot learner should learn. This allows us
to achieve a rich internal representation of the target in the current frame,
significantly increasing the segmentation accuracy of our approach. We perform
extensive experiments on multiple benchmarks. Our approach sets a new
state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an
overall score of 81.5, corresponding to a 2.6% relative improvement over the
previous best result.
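The abstract describes an internal learner that fits a parametric target model by minimizing a segmentation error on the first frame. As a rough illustration of that idea only (not the paper's actual steepest-descent optimizer, network features, or loss), the sketch below fits linear per-pixel weights to a first-frame mask with a few gradient steps; all names and the toy data are hypothetical.

```python
import numpy as np

def inner_learner(features, target_mask, steps=20, lr=0.5):
    """Hypothetical sketch of an inner few-shot learner: fit linear
    per-pixel weights w so that features @ w approximates the given
    first-frame target mask, using a few gradient-descent steps on
    the squared segmentation error."""
    n_pixels, n_dims = features.shape
    w = np.zeros(n_dims)
    for _ in range(steps):
        residual = features @ w - target_mask    # per-pixel error
        grad = features.T @ residual / n_pixels  # gradient of 0.5 * MSE
        w = w - lr * grad
    return w

# Toy "first frame": a bias channel plus one feature correlated with the target.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 1))
features = np.hstack([np.ones((200, 1)), raw])
mask = (raw[:, 0] > 0).astype(float)

w = inner_learner(features, mask)                # learn the target model
pred = (features @ w > 0.5).astype(float)        # apply it to the same frame
```

In the paper this inner optimization is differentiable, so the surrounding network is meta-trained end-to-end to shape what the learner should learn; the sketch above only shows the inner fitting step.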
Related papers
- Self-supervised Video Object Segmentation with Distillation Learning of Deformable Attention [29.62044843067169]
Video object segmentation is a fundamental research problem in computer vision.
We propose a new method for self-supervised video object segmentation based on distillation learning of deformable attention.
arXiv Detail & Related papers (2024-01-25T04:39:48Z)
- In-N-Out Generative Learning for Dense Unsupervised Video Segmentation [89.21483504654282]
In this paper, we focus on the unsupervised Video Object (VOS) task which learns visual correspondence from unlabeled videos.
We propose the In-aNd-Out (INO) generative learning from a purely generative perspective, which captures both high-level and fine-grained semantics.
Our INO outperforms previous state-of-the-art methods by significant margins.
arXiv Detail & Related papers (2022-03-29T07:56:21Z)
- Learning Position and Target Consistency for Memory-based Video Object Segmentation [39.787966275016906]
This paper proposes a framework that learns position and target consistency for memory-based video object segmentation.
It applies the memory mechanism to retrieve pixels globally, and meanwhile learns position consistency for more reliable segmentation.
Experiments show that our LCM achieves state-of-the-art performance on both the DAVIS and YouTube-VOS benchmarks.
arXiv Detail & Related papers (2021-04-09T12:22:37Z)
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- Fast Few-Shot Classification by Few-Iteration Meta-Learning [173.32497326674775]
We introduce a fast optimization-based meta-learning method for few-shot classification.
Our strategy enables important aspects of the base learner objective to be learned during meta-training.
We perform a comprehensive experimental analysis, demonstrating the speed and effectiveness of our approach.
arXiv Detail & Related papers (2020-10-01T15:59:31Z)
- Learning Video Object Segmentation from Unlabeled Videos [158.18207922363783]
We propose a new method for video object segmentation (VOS) that addresses object pattern learning from unlabeled videos.
We introduce a unified unsupervised/weakly supervised learning framework, called MuG, that comprehensively captures properties of VOS at multiple granularities.
arXiv Detail & Related papers (2020-03-10T22:12:15Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.