Refining Pre-Trained Motion Models
- URL: http://arxiv.org/abs/2401.00850v2
- Date: Sat, 17 Feb 2024 03:09:32 GMT
- Title: Refining Pre-Trained Motion Models
- Authors: Xinglong Sun, Adam W. Harley, and Leonidas J. Guibas
- Abstract summary: We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
- Score: 56.18044168821188
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the difficulty of manually annotating motion in video, the current best
motion estimation methods are trained with synthetic data, and therefore
struggle somewhat due to a train/test gap. Self-supervised methods hold the
promise of training directly on real video, but typically perform worse. These
include methods trained with warp error (i.e., color constancy) combined with
smoothness terms, and methods that encourage cycle-consistency in the estimates
(i.e., tracking backwards should yield the reverse of the trajectory obtained by
tracking forwards). In this work, we take on the challenge of improving state-of-the-art
supervised models with self-supervised training. We find that when the
initialization is supervised weights, most existing self-supervision techniques
actually make performance worse instead of better, which suggests that the
benefit of seeing the new data is overshadowed by the noise in the training
signal. Focusing on obtaining a "clean" training signal from real-world
unlabelled video, we propose to separate label-making and training into two
distinct stages. In the first stage, we use the pre-trained model to estimate
motion in a video, and then select the subset of motion estimates which we can
verify with cycle-consistency. This produces a sparse but accurate
pseudo-labelling of the video. In the second stage, we fine-tune the model to
reproduce these outputs, while also applying augmentations on the input. We
complement this bootstrapping method with simple techniques that densify and
re-balance the pseudo-labels, ensuring that we do not merely train on "easy"
tracks. We show that our method yields reliable gains over fully-supervised
methods in real videos, for both short-term (flow-based) and long-range
(multi-frame) pixel tracking.
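The two-stage recipe above lends itself to a compact sketch. The snippet below is a minimal PyTorch-style illustration, not the authors' implementation: the tracker interface (`model(video, queries)` returning `(N, T, 2)` trajectories), the `augment` helper, and the cycle-error threshold `thresh` are hypothetical stand-ins, and the densification and re-balancing of pseudo-labels described in the abstract are omitted.

```python
import torch

def harvest_pseudo_labels(model, video, queries, thresh=2.0):
    """Stage 1 (sketch): run the frozen pre-trained tracker forward and
    backward, and keep only tracks that pass a cycle-consistency check.
    Assumes video is (T, C, H, W), queries are (N, 2) pixel coordinates in
    frame 0, and model(video, points) returns (N, T, 2) trajectories."""
    with torch.no_grad():
        fwd = model(video, queries)                    # forward tracks
        # track backward in time, starting from the forward endpoints
        bwd = model(video.flip(dims=[0]), fwd[:, -1])
        bwd = bwd.flip(dims=[1])                       # re-align to forward time
    # cycle error: distance between where the backward track lands in
    # frame 0 and the original query point
    cycle_err = (bwd[:, 0] - queries).norm(dim=-1)     # (N,)
    keep = cycle_err < thresh                          # sparse but reliable subset
    return queries[keep], fwd[keep]                    # pseudo-labels


def finetune_step(model, optimizer, video, queries, pseudo_tracks, augment):
    """Stage 2 (sketch): fine-tune the model to reproduce the pseudo-labels
    while applying augmentations to the input video and query points."""
    aug_video, aug_queries, aug_tracks = augment(video, queries, pseudo_tracks)
    pred = model(aug_video, aug_queries)
    loss = (pred - aug_tracks).abs().mean()            # L1 to the pseudo-labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```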
Related papers
- CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos [63.90674869153876]
We introduce CoTracker3, comprising a new tracking model and a new semi-supervised training recipe.
This allows real videos without annotations to be used during training by generating pseudo-labels using off-the-shelf teachers.
The model is available in online and offline variants and reliably tracks visible and occluded points.
arXiv Detail & Related papers (2024-10-15T17:56:32Z) - Dense Unsupervised Learning for Video Segmentation [49.46930315961636]
We present a novel approach to unsupervised learning for video object segmentation (VOS).
Unlike previous work, our formulation allows us to learn dense feature representations directly in a fully convolutional regime.
Our approach exceeds the segmentation accuracy of previous work despite using significantly less training data and compute power.
arXiv Detail & Related papers (2021-11-11T15:15:11Z) - Self-Supervised Multi-Object Tracking with Cross-Input Consistency [5.8762433393846045]
We propose a self-supervised learning procedure for training a robust multi-object tracking (MOT) model given only unlabeled video.
We construct two distinct inputs from the same video sequence, compute tracks by applying an RNN model independently to each input, and train the model to produce consistent tracks across the two inputs (see the sketch after this list).
arXiv Detail & Related papers (2021-11-10T21:00:34Z) - LogME: Practical Assessment of Pre-trained Models for Transfer Learning [80.24059713295165]
The Logarithm of Maximum Evidence (LogME) can be used to assess pre-trained models for transfer learning.
Compared to brute-force fine-tuning, LogME brings over a 3000× speedup in wall-clock time.
arXiv Detail & Related papers (2021-02-22T13:58:11Z) - FROST: Faster and more Robust One-shot Semi-supervised Training [0.0]
We present a one-shot semi-supervised learning method that trains up to an order of magnitude faster and is more robust than state-of-the-art methods.
Our experiments demonstrate FROST's capability to perform well when the composition of the unlabeled data is unknown.
arXiv Detail & Related papers (2020-11-18T18:56:03Z) - Unsupervised Deep Representation Learning for Real-Time Tracking [137.69689503237893]
We propose an unsupervised learning method for visual tracking.
Our motivation is that a robust tracker should be effective in bidirectional tracking.
We build our framework on a Siamese correlation filter network, and propose a multi-frame validation scheme and a cost-sensitive loss to facilitate unsupervised learning.
arXiv Detail & Related papers (2020-07-22T08:23:12Z) - AutoTrajectory: Label-free Trajectory Extraction and Prediction from
Videos using Dynamic Points [92.91569287889203]
We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction.
To better capture the moving objects in videos, we introduce dynamic points.
We aggregate dynamic points to instance points, which stand for moving objects such as pedestrians in videos.
arXiv Detail & Related papers (2020-07-11T08:43:34Z) - Improving Semantic Segmentation via Self-Training [75.07114899941095]
We show that we can obtain state-of-the-art results using a semi-supervised approach, specifically a self-training paradigm.
We first train a teacher model on labeled data, and then generate pseudo labels on a large set of unlabeled data.
Our robust training framework can digest human-annotated and pseudo labels jointly and achieve top performances on Cityscapes, CamVid and KITTI datasets.
arXiv Detail & Related papers (2020-04-30T17:09:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.