Semi-TCL: Semi-Supervised Track Contrastive Representation Learning
- URL: http://arxiv.org/abs/2107.02396v1
- Date: Tue, 6 Jul 2021 05:23:30 GMT
- Title: Semi-TCL: Semi-Supervised Track Contrastive Representation Learning
- Authors: Wei Li, Yuanjun Xiong, Shuo Yang, Mingze Xu, Yongxin Wang, Wei Xia
- Abstract summary: We design a new instance-to-track matching objective to learn appearance embedding.
It compares a candidate detection to the embedding of the tracks persisted in the tracker.
We implement this learning objective in a unified form following the spirit of contrastive loss.
- Score: 40.31083437957288
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Online tracking of multiple objects in videos requires a strong capacity for
modeling and matching object appearances. Previous methods for learning
appearance embedding mostly rely on instance-level matching without considering
the temporal continuity provided by videos. We design a new instance-to-track
matching objective to learn appearance embedding that compares a candidate
detection to the embedding of the tracks persisted in the tracker. It enables
us to learn not only from videos labeled with complete tracks, but also from
unlabeled or partially labeled videos. We implement this learning objective in
a unified form following the spirit of contrastive loss. Experiments on
multiple object tracking datasets demonstrate that our method can effectively
learn discriminative appearance embeddings in a semi-supervised fashion and
outperform state-of-the-art methods on representative benchmarks.
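The instance-to-track objective described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: a track is summarized by the mean of its stored instance embeddings, similarity is cosine similarity, and the loss is an InfoNCE-style negative log-softmax over tracks with a hypothetical `temperature` parameter. The function names are invented here and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def track_embedding(instances):
    """Summarize a track as the mean of its instance embeddings.
    Assumption: the paper may aggregate track embeddings differently."""
    n = len(instances)
    return [sum(column) / n for column in zip(*instances)]

def instance_to_track_loss(detection, tracks, positive_id, temperature=0.1):
    """InfoNCE-style loss: negative log-softmax of the detection's
    similarity to its matching track, contrasted against all tracks."""
    logits = {
        track_id: cosine(detection, track_embedding(embs)) / temperature
        for track_id, embs in tracks.items()
    }
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(logits.values())
    log_z = top + math.log(sum(math.exp(x - top) for x in logits.values()))
    return log_z - logits[positive_id]

# Toy data: two tracks held by the tracker, one candidate detection.
tracks = {
    "a": [[1.0, 0.0], [0.9, 0.1]],
    "b": [[0.0, 1.0], [0.1, 0.9]],
}
detection = [0.95, 0.05]
loss_match = instance_to_track_loss(detection, tracks, "a")
loss_mismatch = instance_to_track_loss(detection, tracks, "b")
```

In this toy setup the detection lies close to track "a", so assigning it to "a" yields a lower loss than assigning it to "b". Minimizing such a loss pulls a detection toward its own track and pushes it away from the other tracks.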
Related papers
- QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking [73.52284039530261]
We present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning.
We find that the resulting distinctive feature space admits a simple nearest neighbor search at inference time for object association.
We show that our similarity learning scheme is not limited to video data, but can learn effective instance similarity even from static input.
arXiv Detail & Related papers (2022-10-12T15:47:36Z)
- Few-Shot Action Localization without Knowing Boundaries [9.959844922120523]
We show that it is possible to learn to localize actions in untrimmed videos when only one/few trimmed examples of the target action are available at test time.
We propose a network that learns to estimate Temporal Similarity Matrices (TSMs) that model a fine-grained similarity pattern between pairs of videos.
Our method achieves performance comparable to or better than state-of-the-art fully-supervised, few-shot learning methods.
arXiv Detail & Related papers (2021-06-08T07:32:43Z)
- CoCon: Cooperative-Contrastive Learning [52.342936645996765]
Self-supervised visual representation learning is key for efficient video analysis.
Recent success in learning image representations suggests contrastive learning is a promising framework to tackle this challenge.
We introduce a cooperative variant of contrastive learning to utilize complementary information across views.
arXiv Detail & Related papers (2021-04-30T05:46:02Z)
- Multiview Pseudo-Labeling for Semi-supervised Learning from Video [102.36355560553402]
We present a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video.
Our method capitalizes on multiple views, but it nonetheless trains a model that is shared across appearance and motion input.
On multiple video recognition datasets, our method substantially outperforms its supervised counterpart, and compares favorably to previous work on standard benchmarks in self-supervised video representation learning.
arXiv Detail & Related papers (2021-04-01T17:59:48Z)
- Learning to Track Instances without Video Annotations [85.9865889886669]
We introduce a novel semi-supervised framework by learning instance tracking networks with only a labeled image dataset and unlabeled video sequences.
We show that even when only trained with images, the learned feature representation is robust to instance appearance variations.
In addition, we integrate this module into single-stage instance segmentation and pose estimation frameworks.
arXiv Detail & Related papers (2021-04-01T06:47:41Z)
- Self-supervised Video Object Segmentation [76.83567326586162]
The objective of this paper is self-supervised representation learning, with the goal of solving semi-supervised video object segmentation (a.k.a. dense tracking).
We make the following contributions: (i) we propose to improve the existing self-supervised approach, with a simple, yet more effective memory mechanism for long-term correspondence matching; (ii) by augmenting the self-supervised approach with an online adaptation module, our method successfully alleviates tracker drifts caused by spatial-temporal discontinuity; (iv) we demonstrate state-of-the-art results among the self-supervised approaches on DAVIS-2017 and YouTube
arXiv Detail & Related papers (2020-06-22T17:55:59Z)