Self-Supervised Any-Point Tracking by Contrastive Random Walks
- URL: http://arxiv.org/abs/2409.16288v1
- Date: Tue, 24 Sep 2024 17:59:56 GMT
- Title: Self-Supervised Any-Point Tracking by Contrastive Random Walks
- Authors: Ayush Shrivastava, Andrew Owens
- Abstract summary: We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks.
Our method achieves strong performance on the TapVid benchmarks, outperforming previous self-supervised tracking methods.
- Score: 17.50529887238381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a simple, self-supervised approach to the Tracking Any Point (TAP) problem. We train a global matching transformer to find cycle consistent tracks through video via contrastive random walks, using the transformer's attention-based global matching to define the transition matrices for a random walk on a space-time graph. The ability to perform "all pairs" comparisons between points allows the model to obtain high spatial precision and to obtain a strong contrastive learning signal, while avoiding many of the complexities of recent approaches (such as coarse-to-fine matching). To do this, we propose a number of design decisions that allow global matching architectures to be trained through self-supervision using cycle consistency. For example, we identify that transformer-based methods are sensitive to shortcut solutions, and propose a data augmentation scheme to address them. Our method achieves strong performance on the TapVid benchmarks, outperforming previous self-supervised tracking methods, such as DIFT, and is competitive with several supervised methods.
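The training signal described in the abstract can be illustrated with a minimal NumPy sketch of a contrastive random-walk cycle loss, in the style of the contrastive random walk of Jabri et al. (2020). This is not the authors' implementation. The sketch assumes that each transition matrix is a temperature-scaled row-wise softmax over dot products of L2-normalized per-point features, which stands in for the transformer's attention-based global matching. It also assumes that the walk runs forward through the frames and then back again (a palindrome), so that each point should return to its starting node. The function names `transition_matrix` and `cycle_walk_loss` and the temperature value 0.07 are hypothetical choices made for illustration.

```python
import numpy as np


def transition_matrix(feat_a, feat_b, temperature=0.07):
    """Row-stochastic matrix of stepping from each node in frame a to nodes in frame b.

    feat_a: (N, D) and feat_b: (M, D) per-point features, rows assumed L2-normalized.
    Hypothetical stand-in for attention-based global matching: softmax over all pairs.
    """
    logits = feat_a @ feat_b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def cycle_walk_loss(frames, temperature=0.07):
    """Cycle-consistency loss for a random walk forward through `frames` and back.

    frames: list of (N, D) feature arrays, one per frame, with the same N nodes per frame.
    The round-trip matrix is a product of row-stochastic matrices, so row i is a
    distribution over where node i ends up; the target is node i itself (identity).
    """
    path = frames + frames[-2::-1]  # e.g. [f0, f1, f2] -> [f0, f1, f2, f1, f0]
    walk = np.eye(frames[0].shape[0])
    for a, b in zip(path[:-1], path[1:]):
        walk = walk @ transition_matrix(a, b, temperature)
    # Cross-entropy against the identity: mean negative log return probability.
    return -np.mean(np.log(np.clip(np.diag(walk), 1e-12, None)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    # Toy data: the same 16 points seen in 4 frames with small feature noise,
    # versus 4 frames of unrelated random features.
    base = normalize(rng.normal(size=(16, 32)))
    consistent = [normalize(base + 0.01 * rng.normal(size=base.shape)) for _ in range(4)]
    unrelated = [normalize(rng.normal(size=(16, 32))) for _ in range(4)]
    print("consistent frames:", cycle_walk_loss(consistent))
    print("unrelated frames:", cycle_walk_loss(unrelated))
```

Because every step compares each node against all nodes in the next frame, the loss is contrastive: probability mass that leaks to the wrong point at any step lowers the return probability. In this toy setup the consistent frames should give a loss near zero, while the unrelated frames should give a noticeably larger one.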
Related papers
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
Date: 2024-08-28T05:53:30Z
- Self-Supervised Multi-Object Tracking with Path Consistency [28.923565712817645]
We propose a novel concept of path consistency to learn robust object matching without using manual object identity supervision.
We generate multiple observation paths, each specifying a different set of frames to be skipped, and formulate a Path Consistency Loss that enforces consistent association results across the different observation paths.
Date: 2024-04-08T01:29:10Z
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze DETR-based frameworks for semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
Date: 2023-07-16T16:32:14Z
- SeqCo-DETR: Sequence Consistency Training for Self-Supervised Object Detection with Transformers [18.803007408124156]
We propose SeqCo-DETR, a Sequence Consistency-based self-supervised method for object DEtection with TRansformers.
Our method achieves state-of-the-art results on MS COCO (45.8 AP) and PASCAL VOC (64.1 AP), demonstrating the effectiveness of our approach.
Date: 2023-03-15T09:36:58Z
- End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches toward class-agnostic tracking that also performs well on unknown object classes.
Date: 2022-10-26T10:19:37Z
- Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model [38.722096508198106]
We present a SEMantic Masked recurrent world model (SEM2), which introduces a semantic filter to extract key driving-relevant features and makes decisions based on the filtered features.
Our method outperforms the state-of-the-art approaches in terms of sample efficiency and robustness to input permutations.
Date: 2022-10-08T13:00:08Z
- Transformer-based assignment decision network for multiple object tracking [0.0]
We introduce Transformer-based Assignment Decision Network (TADN) that tackles data association without the need of explicit optimization during inference.
Despite its simplicity as a tracker, our proposed approach outperforms the state of the art on most evaluation metrics.
Date: 2022-08-06T19:47:32Z
- Hybrid Tracker with Pixel and Instance for Video Panoptic Segmentation [50.62685357414904]
Video Panoptic Segmentation (VPS) aims to generate coherent panoptic segmentation and track the identities of all pixels across video frames.
We present HybridTracker, a lightweight joint tracking model that attempts to overcome the limitations of a single tracker.
Comprehensive experiments show that HybridTracker outperforms state-of-the-art methods on the Cityscapes-VPS and VIPER datasets.
Date: 2022-03-02T16:21:55Z
- GAN-Supervised Dense Visual Alignment [95.37027391102684]
We propose GAN-Supervised Learning, a framework for learning discriminative models and their GAN-generated training data jointly end-to-end.
Inspired by the classic Congealing method, our GANgealing algorithm trains a Spatial Transformer to map random samples from a GAN trained on unaligned data to a common, jointly-learned target mode.
Date: 2021-12-09T18:59:58Z
- Contrastive Self-supervised Sequential Recommendation with Robust Augmentation [101.25762166231904]
Sequential recommendation describes a set of techniques that model dynamic user behavior to predict future interactions in sequential user data.
Both old and new issues remain, including data sparsity and noisy data.
We propose Contrastive Self-Supervised Learning for Sequential Recommendation (CoSeRec).
Date: 2021-08-14T07:15:25Z
- Self-Point-Flow: Self-Supervised Scene Flow Estimation from Point Clouds with Optimal Transport and Random Walk [59.87525177207915]
We develop a self-supervised method to establish correspondences between two point clouds to approximate scene flow.
Our method achieves state-of-the-art performance among self-supervised learning methods.
Date: 2021-05-18T03:12:42Z