Learning Fast and Robust Target Models for Video Object Segmentation
- URL: http://arxiv.org/abs/2003.00908v2
- Date: Tue, 31 Mar 2020 09:58:00 GMT
- Title: Learning Fast and Robust Target Models for Video Object Segmentation
- Authors: Andreas Robinson, Felix Järemo Lawin, Martin Danelljan, Fahad Shahbaz Khan, Michael Felsberg
- Abstract summary: Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
- Score: 83.3382606349118
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video object segmentation (VOS) is a highly challenging problem since the
initial mask, defining the target object, is only given at test-time. The main
difficulty is to effectively handle appearance changes and similar background
objects, while maintaining accurate segmentation. Most previous approaches
fine-tune segmentation networks on the first frame, resulting in impractical
frame-rates and risk of overfitting. More recent methods integrate generative
target appearance models, but either achieve limited robustness or require
large amounts of training data.
We propose a novel VOS architecture consisting of two network components. The
target appearance model consists of a light-weight module, which is learned
during the inference stage using fast optimization techniques to predict a
coarse but robust target segmentation. The segmentation model is exclusively
trained offline, designed to process the coarse scores into high quality
segmentation masks. Our method is fast, easily trainable and remains highly
effective in cases of limited training data. We perform extensive experiments
on the challenging YouTube-VOS and DAVIS datasets. Our network achieves
favorable performance, while operating at higher frame-rates compared to
state-of-the-art. Code and trained models are available at
https://github.com/andr345/frtm-vos.
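As a rough illustration of the two-component design described in the abstract, the sketch below fits a light-weight target-appearance module on the annotated first frame and passes its coarse scores through a refinement decoder. Backbone features are faked with random tensors; the layer sizes, the plain Adam fitting loop, and all names are illustrative assumptions, not the authors' implementation (which uses dedicated fast optimization and an offline-trained segmentation network, left untrained here for brevity).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TargetModel(nn.Module):
    """Few-parameter module mapping backbone features to coarse target scores."""
    def __init__(self, feat_dim=256, hidden=32):
        super().__init__()
        self.reduce = nn.Conv2d(feat_dim, hidden, 1)
        self.score = nn.Conv2d(hidden, 1, 3, padding=1)

    def forward(self, feats):
        return self.score(F.relu(self.reduce(feats)))

class RefinementNet(nn.Module):
    """Decoder turning coarse scores into a full-resolution mask
    (trained offline in the paper; randomly initialized in this sketch)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, coarse, out_size):
        x = F.interpolate(coarse, size=out_size, mode="bilinear", align_corners=False)
        return torch.sigmoid(self.conv(x))

def fit_target_model(model, feats, mask, steps=20, lr=1e-2):
    """Online optimization on the annotated first frame only."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    small_mask = F.interpolate(mask, size=feats.shape[-2:], mode="bilinear",
                               align_corners=False)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(model(feats), small_mask)
        loss.backward()
        opt.step()

# Toy usage with random tensors standing in for backbone features.
feats0 = torch.randn(1, 256, 30, 54)                # features of the annotated first frame
mask0 = (torch.rand(1, 1, 480, 864) > 0.5).float()  # given first-frame target mask
target_model, refiner = TargetModel(), RefinementNet()
fit_target_model(target_model, feats0, mask0)

feats_t = torch.randn(1, 256, 30, 54)            # features of a later frame
coarse = target_model(feats_t)                   # coarse but robust target scores
mask_t = refiner(coarse, out_size=(480, 864))    # high-resolution segmentation mask
```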
Related papers
- Convolutional Networks as Extremely Small Foundation Models: Visual Prompting and Theoretical Perspective [1.79487674052027]
In this paper, we design a prompting module which performs few-shot adaptation of generic deep networks to new tasks.
Driven by learning theory, we derive prompting modules that are as simple as possible, as they generalize better under the same training error.
In practice, SDForest has extremely low computational cost and runs in real time even on a CPU.
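For context on what a minimal prompting module can look like, the hedged sketch below adapts a frozen generic backbone to a new task by learning only a small additive input prompt and a linear head. This is a common visual-prompting formulation under stated assumptions, not necessarily the module derived in that paper; the backbone, layer sizes, and names are placeholders.

```python
import torch
import torch.nn as nn

class PromptedModel(nn.Module):
    def __init__(self, backbone, num_classes, img_size=(3, 224, 224)):
        super().__init__()
        self.backbone = backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad_(False)                 # generic network stays frozen
        self.prompt = nn.Parameter(torch.zeros(1, *img_size))  # learnable input prompt
        self.head = nn.Linear(128, num_classes)     # light task-specific head

    def forward(self, x):
        return self.head(self.backbone(x + self.prompt))

# Placeholder "generic deep network"; only the prompt and head are adapted.
backbone = nn.Sequential(nn.Conv2d(3, 128, 7, stride=32),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
model = PromptedModel(backbone, num_classes=5)
opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=1e-3)

x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 5, (8,))  # tiny few-shot batch
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```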
arXiv Detail & Related papers (2024-09-03T12:34:23Z) - SiamMask: A Framework for Fast Online Object Tracking and Segmentation [96.61632757952292]
SiamMask is a framework to perform both visual object tracking and video object segmentation, in real-time, with the same simple method.
We show that it is possible to extend the framework to handle multiple object tracking and segmentation by simply re-using the multi-task model.
It yields real-time state-of-the-art results on visual-object tracking benchmarks, while at the same time demonstrating competitive performance at a high speed for video object segmentation benchmarks.
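A minimal sketch of the kind of shared-computation, multi-task head the SiamMask summary alludes to: depth-wise cross-correlation between template and search-region features, followed by separate box and mask branches. The layer sizes, the flattened per-location mask output, and all names are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def xcorr_depthwise(search, template):
    """Depth-wise cross-correlation of search features with the template features."""
    b, c, h, w = search.shape
    kernel = template.reshape(b * c, 1, *template.shape[-2:])
    out = F.conv2d(search.reshape(1, b * c, h, w), kernel, groups=b * c)
    return out.reshape(b, c, out.shape[-2], out.shape[-1])

class MultiTaskHead(nn.Module):
    """Shared correlation features feed both a box branch and a mask branch."""
    def __init__(self, c=256):
        super().__init__()
        self.box = nn.Conv2d(c, 4, 1)          # per-location box offsets
        self.mask = nn.Conv2d(c, 63 * 63, 1)   # flattened per-location mask logits

    def forward(self, corr):
        return self.box(corr), self.mask(corr)

template = torch.randn(1, 256, 7, 7)     # features of the target exemplar
search = torch.randn(1, 256, 31, 31)     # features of the current search region
corr = xcorr_depthwise(search, template)
boxes, masks = MultiTaskHead()(corr)     # tracking and segmentation from one pass
```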
arXiv Detail & Related papers (2022-07-05T14:47:17Z) - Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z) - Learning to Associate Every Segment for Video Panoptic Segmentation [123.03617367709303]
We learn coarse segment-level matching and fine pixel-level matching together.
We show that our per-frame computation model can achieve new state-of-the-art results on Cityscapes-VPS and VIPER datasets.
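As a toy illustration of the coarse, segment-level half of that matching, the snippet below associates segments across two frames by cosine similarity of per-segment embeddings; the embeddings and the greedy argmax assignment are assumptions, and the fine pixel-level matching is not modeled here.

```python
import torch
import torch.nn.functional as F

def match_segments(emb_prev, emb_curr):
    """emb_*: (N, D) per-segment embeddings.
    Returns, for each current segment, the index of its best previous match."""
    sim = F.normalize(emb_curr, dim=1) @ F.normalize(emb_prev, dim=1).t()
    return sim.argmax(dim=1)

prev = torch.randn(5, 128)   # 5 segments in frame t-1
curr = torch.randn(6, 128)   # 6 segments in frame t
print(match_segments(prev, curr))
```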
arXiv Detail & Related papers (2021-06-17T13:06:24Z) - Reviving Iterative Training with Mask Guidance for Interactive
Segmentation [8.271859911016719]
Recent works on click-based interactive segmentation have demonstrated state-of-the-art results by using various inference-time optimization schemes.
We propose a simple feedforward model for click-based interactive segmentation that employs the segmentation masks from previous steps.
We find that the models trained on a combination of COCO and LVIS with diverse and high-quality annotations show performance superior to all existing models.
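The sketch below illustrates the general idea of feeding the previous step's mask back into a feedforward click-based model: clicks are encoded as binary maps and concatenated with the image and the prior mask. The click encoding and the tiny CNN are placeholder assumptions, not the proposed model.

```python
import torch
import torch.nn as nn

class ClickSegNet(nn.Module):
    # Input channels: 3 (RGB) + 2 (positive/negative click maps) + 1 (previous mask)
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1))

    def forward(self, image, clicks, prev_mask):
        return torch.sigmoid(self.net(torch.cat([image, clicks, prev_mask], dim=1)))

image = torch.randn(1, 3, 256, 256)
clicks = torch.zeros(1, 2, 256, 256)
clicks[0, 0, 120, 130] = 1.0                  # one positive click
prev_mask = torch.zeros(1, 1, 256, 256)       # empty mask at the first step
model = ClickSegNet()
for _ in range(3):                            # each interaction round reuses the last mask
    prev_mask = model(image, clicks, prev_mask)
```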
arXiv Detail & Related papers (2021-02-12T15:44:31Z) - Make One-Shot Video Object Segmentation Efficient Again [7.7415390727490445]
Video object segmentation (VOS) describes the task of segmenting a set of objects in each frame of a video.
e-OSVOS decouples the object detection task and predicts only local segmentation masks by applying a modified version of Mask R-CNN.
e-OSVOS provides state-of-the-art results on DAVIS 2016, DAVIS 2017, and YouTube-VOS for one-shot fine-tuning methods.
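To make the "local masks only" idea concrete, the snippet below crops box regions from frame features with RoIAlign and predicts a mask inside each box, Mask R-CNN style. The feature stride, the box, and the tiny mask head are placeholder assumptions, not e-OSVOS itself.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

# Tiny stand-in for a Mask R-CNN style mask head.
mask_head = nn.Sequential(nn.Conv2d(256, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 1, 1))

feats = torch.randn(1, 256, 50, 50)                    # frame features (assumed stride 16)
boxes = torch.tensor([[0., 100., 120., 300., 340.]])   # (batch_idx, x1, y1, x2, y2) in image coords
roi = roi_align(feats, boxes, output_size=(14, 14), spatial_scale=50 / 800.)
local_mask = torch.sigmoid(mask_head(roi))             # mask predicted only inside the box
```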
arXiv Detail & Related papers (2020-12-03T12:21:23Z) - The Devil is in Classification: A Simple Framework for Long-tail Object
Detection and Instance Segmentation [93.17367076148348]
We investigate the performance drop of the state-of-the-art two-stage instance segmentation model Mask R-CNN on the recent long-tail LVIS dataset.
We unveil that a major cause is the inaccurate classification of object proposals.
We propose a simple calibration framework to more effectively alleviate classification head bias with a bi-level class balanced sampling approach.
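A generic sketch of bi-level class-balanced sampling in the spirit of that calibration framework: first draw a class uniformly, then draw an instance of that class, so tail classes are sampled as often as frequent ones. The function and data layout are illustrative assumptions, not the paper's pipeline.

```python
import random
from collections import defaultdict

def bilevel_class_balanced_sampler(annotations, num_samples):
    """annotations: list of (instance_id, class_id) pairs."""
    by_class = defaultdict(list)
    for inst, cls in annotations:
        by_class[cls].append(inst)
    classes = list(by_class)
    samples = []
    for _ in range(num_samples):
        cls = random.choice(classes)                   # level 1: uniform over classes
        samples.append(random.choice(by_class[cls]))   # level 2: uniform within the class
    return samples

# Toy data: class 0 is frequent, class 1 is a tail class with a single instance.
ann = [(i, 0) for i in range(100)] + [(100, 1)]
print(bilevel_class_balanced_sampler(ann, 10))
```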
arXiv Detail & Related papers (2020-07-23T12:49:07Z) - Learning What to Learn for Video Object Segmentation [157.4154825304324]
We introduce an end-to-end trainable VOS architecture that integrates a differentiable few-shot learning module.
This internal learner is designed to predict a powerful parametric model of the target.
We set a new state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an overall score of 81.5.
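As a stand-in for such an internal learner, the sketch below fits a parametric (linear) target model from first-frame pixel features and labels in closed form via ridge regression; this is a generic illustration of predicting a target model from one annotated frame, not the paper's differentiable few-shot learner, and all names and shapes are assumptions.

```python
import torch

def fit_linear_target_model(feats, labels, lam=1e-2):
    """feats: (N, C) pixel features, labels: (N,) in {0, 1}.
    Returns weights w of shape (C,) via ridge regression."""
    c = feats.shape[1]
    A = feats.t() @ feats + lam * torch.eye(c)
    b = feats.t() @ labels
    return torch.linalg.solve(A, b)

feats0 = torch.randn(500, 64)               # first-frame pixel embeddings
labels0 = (torch.rand(500) > 0.7).float()   # flattened first-frame target mask
w = fit_linear_target_model(feats0, labels0)
scores_t = torch.randn(500, 64) @ w         # coarse target scores on a later frame
```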
arXiv Detail & Related papers (2020-03-25T17:58:43Z)