Self-supervised Video Object Segmentation
- URL: http://arxiv.org/abs/2006.12480v1
- Date: Mon, 22 Jun 2020 17:55:59 GMT
- Title: Self-supervised Video Object Segmentation
- Authors: Fangrui Zhu, Li Zhang, Yanwei Fu, Guodong Guo, Weidi Xie
- Abstract summary: The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube-VOS.
- Score: 76.83567326586162
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The objective of this paper is self-supervised representation learning, with
the goal of solving semi-supervised video object segmentation (a.k.a. dense
tracking). We make the following contributions: (i) we propose to improve the
existing self-supervised approach, with a simple, yet more effective memory
mechanism for long-term correspondence matching, which resolves the challenge
caused by the disappearance and reappearance of objects; (ii) by augmenting
the self-supervised approach with an online adaptation module, our method
successfully alleviates tracker drifts caused by spatial-temporal
discontinuity, e.g. occlusions, dis-occlusions, or fast motions; (iii) we
explore the efficiency of self-supervised representation learning for dense
tracking and, surprisingly, show that a powerful tracking model can be trained
with as few as 100 raw video clips (equivalent to a duration of 11 minutes),
indicating that low-level statistics are already effective for tracking
tasks; (iv) we demonstrate state-of-the-art results among the self-supervised
approaches on DAVIS-2017 and YouTube-VOS, as well as surpassing most methods
trained with millions of manual segmentation annotations, further bridging the
gap between self-supervised and supervised learning. Code is released to
foster further research (https://github.com/fangruizhu/self_sup_semiVOS).
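As a rough illustration of the correspondence matching mentioned in the abstract, the sketch below shows the generic idea of propagating labels from a memory of earlier frames to a query frame through feature affinity. It is not the authors' implementation: the function names (`softmax`, `propagate_labels`), the dot-product affinity, the `temperature` parameter, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_labels(query_feats, memory_feats, memory_labels, temperature=1.0):
    """Copy labels from memory pixels to query pixels via feature affinity.

    query_feats:   one feature vector per query pixel
    memory_feats:  feature vectors pooled from all memory frames
    memory_labels: one label distribution per memory pixel
    (hypothetical names; the layout is an assumption of this sketch)
    """
    num_classes = len(memory_labels[0])
    propagated = []
    for q in query_feats:
        # Dot-product similarity between this query pixel and every memory pixel.
        scores = [sum(a * b for a, b in zip(q, m)) for m in memory_feats]
        weights = softmax(scores, temperature)
        # The new label is the affinity-weighted average of the memory labels.
        mixed = [
            sum(w * lab[c] for w, lab in zip(weights, memory_labels))
            for c in range(num_classes)
        ]
        propagated.append(mixed)
    return propagated


# Toy data: three memory pixels (two labelled class 0, one class 1), two query pixels.
memory_feats = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
memory_labels = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
query_feats = [[1.0, 0.0], [0.0, 1.0]]

soft = propagate_labels(query_feats, memory_feats, memory_labels, temperature=0.1)
hard = [max(range(len(p)), key=p.__getitem__) for p in soft]
print(hard)
```

In this toy setup, pooling features from several past frames into `memory_feats` is what lets a query pixel still find a match after an object has been hidden for a few frames; how the paper selects and weights its memory frames is not specified in the abstract.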
Related papers
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z)
- Online Deep Clustering with Video Track Consistency [85.8868194550978]
We propose an unsupervised clustering-based approach to learn visual features from video object tracks.
We show that exploiting an unsupervised, class-agnostic, yet noisy track generator yields better accuracy than relying on costly and precise track annotations.
arXiv Detail & Related papers (2022-06-07T08:11:00Z)
- Self-Regulated Learning for Egocentric Video Activity Anticipation [147.9783215348252]
Self-Regulated Learning (SRL) aims to regulate the intermediate representation consecutively to produce a representation that emphasizes the novel information in the frame of the current time-stamp.
SRL sharply outperforms existing state-of-the-art in most cases on two egocentric video datasets and two third-person video datasets.
arXiv Detail & Related papers (2021-11-23T03:29:18Z)
- Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows learning dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z)
- Semi-TCL: Semi-Supervised Track Contrastive Representation Learning [40.31083437957288]
We design a new instance-to-track matching objective to learn appearance embedding.
It compares a candidate detection to the embedding of the tracks persisted in the tracker.
We implement this learning objective in a unified form following the spirit of contrastive loss.
arXiv Detail & Related papers (2021-07-06T05:23:30Z)
- Hierarchically Decoupled Spatial-Temporal Contrast for Self-supervised Video Representation Learning [6.523119805288132]
We present a novel technique for self-supervised video representation learning by: (a) decoupling the learning objective into two contrastive subtasks respectively emphasizing spatial and temporal features, and (b) performing it hierarchically to encourage multi-scale understanding.
arXiv Detail & Related papers (2020-11-23T08:05:39Z)
- Online Descriptor Enhancement via Self-Labelling Triplets for Visual Data Association [28.03285334702022]
We propose a self-supervised method for incrementally refining visual descriptors to improve performance in the task of object-level visual data association.
Our method optimizes deep descriptor generators online by continuously training a widely available image classification network pre-trained with domain-independent data.
We show that our approach surpasses other visual data-association methods applied to a tracking-by-detection task, and that it provides better performance gains than other methods that attempt to adapt to observed information.
arXiv Detail & Related papers (2020-11-06T17:42:04Z)
- Self-supervised Object Tracking with Cycle-consistent Siamese Networks [55.040249900677225]
We exploit an end-to-end Siamese network in a cycle-consistent self-supervised framework for object tracking.
We propose to integrate a Siamese region proposal and mask regression network in our tracking framework so that a fast and more accurate tracker can be learned without the annotation of each frame.
arXiv Detail & Related papers (2020-08-03T04:10:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.