Curriculum Learning for Recurrent Video Object Segmentation
- URL: http://arxiv.org/abs/2008.06698v1
- Date: Sat, 15 Aug 2020 10:51:22 GMT
- Title: Curriculum Learning for Recurrent Video Object Segmentation
- Authors: Maria Gonzalez-i-Calabuig, Carles Ventura and Xavier Gir\'o-i-Nieto
- Abstract summary: This work explores different schedule sampling and frame skipping variations to significantly improve the performance of a recurrent architecture.
Our results on the car class of the KITTI-MOTS challenge indicate that, surprisingly, an inverse schedule sampling is a better option than a classic forward one.
- Score: 2.3376061255029064
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video object segmentation can be understood as a sequence-to-sequence task
that can benefit from the curriculum learning strategies for better and faster
training of deep neural networks. This work explores different schedule
sampling and frame skipping variations to significantly improve the performance
of a recurrent architecture. Our results on the car class of the KITTI-MOTS
challenge indicate that, surprisingly, an inverse schedule sampling is a better
option than a classic forward one. Also, that a progressive skipping of frames
during training is beneficial, but only when training with the ground truth
masks instead of the predicted ones. Source code and trained models are
available at http://imatge-upc.github.io/rvos-mots/.
Related papers
- Learning from One Continuous Video Stream [70.30084026960819]
We introduce a framework for online learning from a single continuous video stream.
This poses great challenges given the high correlation between consecutive video frames.
We employ pixel-to-pixel modelling as a practical and flexible way to switch between pre-training and single-stream evaluation.
arXiv Detail & Related papers (2023-12-01T14:03:30Z) - TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
Scale [59.01246141215051]
We analyze the factor that leads to degradation from the perspective of language supervision.
We propose a tunable-free pre-training strategy to retain the generalization ability of the text encoder.
We produce a series of models, dubbed TVTSv2, with up to one billion parameters.
arXiv Detail & Related papers (2023-05-23T15:44:56Z) - EfficientTrain: Exploring Generalized Curriculum Learning for Training
Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers)
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z) - Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave the way for a new route for visual recognition tasks.
We present Efficient Video Learning -- an efficient framework for directly training high-quality video recognition models.
arXiv Detail & Related papers (2022-08-06T17:38:25Z) - Training Your Sparse Neural Network Better with Any Mask [106.134361318518]
Pruning large neural networks to create high-quality, independently trainable sparse masks is desirable.
In this paper we demonstrate an alternative opportunity: one can customize the sparse training techniques to deviate from the default dense network training protocols.
Our new sparse training recipe is generally applicable to improving training from scratch with various sparse masks.
arXiv Detail & Related papers (2022-06-26T00:37:33Z) - Class-incremental Learning using a Sequence of Partial Implicitly
Regularized Classifiers [0.0]
In class-incremental learning, the objective is to learn a number of classes sequentially without having access to the whole training data.
Our experiments on CIFAR100 dataset show that the proposed method improves the performance over SOTA by a large margin.
arXiv Detail & Related papers (2021-04-04T10:02:45Z) - Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.