Self-Supervised Multi-Object Tracking with Cross-Input Consistency
- URL: http://arxiv.org/abs/2111.05943v1
- Date: Wed, 10 Nov 2021 21:00:34 GMT
- Title: Self-Supervised Multi-Object Tracking with Cross-Input Consistency
- Authors: Favyen Bastani, Songtao He, Sam Madden
- Abstract summary: We propose a self-supervised learning procedure for training a robust multi-object tracking (MOT) model given only unlabeled video.
We then compute tracks in that sequence by applying an RNN model independently on each input, and train the model to produce consistent tracks across the two inputs.
- Score: 5.8762433393846045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a self-supervised learning procedure for training a
robust multi-object tracking (MOT) model given only unlabeled video. While
several self-supervisory learning signals have been proposed in prior work on
single-object tracking, such as color propagation and cycle-consistency, these
signals cannot be directly applied for training RNN models, which are needed to
achieve accurate MOT: they yield degenerate models that, for instance, always
match new detections to tracks with the closest initial detections. We propose
a novel self-supervisory signal that we call cross-input consistency: we
construct two distinct inputs for the same sequence of video, by hiding
different information about the sequence in each input. We then compute tracks
in that sequence by applying an RNN model independently on each input, and
train the model to produce consistent tracks across the two inputs. We evaluate
our unsupervised method on MOT17 and KITTI -- remarkably, we find that, despite
training only on unlabeled video, our unsupervised approach outperforms four
supervised methods published in the last 1--2 years, including Tracktor++,
FAMNet, GSM, and mmMOT.
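The cross-input consistency idea in the abstract can be illustrated with a minimal sketch. The abstract does not say what information is hidden in each input or how the loss is defined, so everything below is an assumption made for illustration: one input keeps only box geometry, the other keeps only appearance features, a stateless distance-based matcher stands in for the RNN, and consistency is measured with a symmetric KL divergence. All function names are hypothetical.

```python
import numpy as np

def softmax_rows(scores):
    """Turn each row of a score matrix into a probability distribution."""
    shifted = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def association_probs(track_feats, det_feats):
    """Match probabilities of shape (num_tracks, num_detections).

    Assumption: a stand-in for the RNN tracker. Scores are negative
    squared distances between track and detection features.
    """
    diff = track_feats[:, None, :] - det_feats[None, :, :]
    return softmax_rows(-np.sum(diff ** 2, axis=2))

def consistency_loss(p_a, p_b, eps=1e-9):
    """Symmetric KL divergence between two association matrices,
    averaged over tracks (an assumed choice of consistency measure)."""
    log_ratio = np.log(p_a + eps) - np.log(p_b + eps)
    kl_ab = np.sum(p_a * log_ratio, axis=1)
    kl_ba = np.sum(p_b * -log_ratio, axis=1)
    return float(np.mean(kl_ab + kl_ba) / 2.0)

# Toy data: three objects observed in two consecutive frames.
rng = np.random.default_rng(0)
num_objects = 3
boxes_prev = rng.uniform(0, 100, size=(num_objects, 4))
boxes_next = boxes_prev + rng.normal(0, 1.0, size=(num_objects, 4))
appearance_prev = rng.normal(size=(num_objects, 8))
appearance_next = appearance_prev + rng.normal(0, 0.1, size=(num_objects, 8))

# Input A hides appearance (geometry only); input B hides geometry (appearance only).
p_geometry = association_probs(boxes_prev, boxes_next)
p_appearance = association_probs(appearance_prev, appearance_next)

loss = consistency_loss(p_geometry, p_appearance)
print(f"cross-input consistency loss: {loss:.4f}")
```

In a trainable setting, this loss would be minimized with respect to the tracker's parameters, so that the model is pushed toward the same associations regardless of which information was hidden. The sketch has no learnable parameters and only evaluates the loss.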
Related papers
- Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs [117.67620297750685]
We introduce Walker, the first self-supervised tracker that learns from videos with sparse bounding box annotations, and no tracking labels.
Walker is the first self-supervised tracker to achieve competitive performance on MOT17, DanceTrack, and BDD100K.
arXiv Detail & Related papers (2024-09-25T18:00:00Z)
- ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame.
Existing MOT methods excel at accurately tracking multiple objects in real-time across various scenarios.
We propose ConsistencyTrack, a novel joint detection and tracking (JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
arXiv Detail & Related papers (2024-08-28T05:53:30Z)
- Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training.
We focus on obtaining a "clean" training signal from real-world unlabelled video.
We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
- IDM-Follower: A Model-Informed Deep Learning Method for Long-Sequence Car-Following Trajectory Prediction [24.94160059351764]
Most car-following models are generative and only consider the inputs of the speed, position, and acceleration of the last time step.
We implement a novel structure with two independent encoders and a self-attention decoder that could sequentially predict the following trajectories.
Numerical experiments with multiple settings on simulation and NGSIM datasets show that the IDM-Follower can improve the prediction performance.
arXiv Detail & Related papers (2022-10-20T02:24:27Z)
- MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation.
To design a framework that can take full advantage of multi-modality, each modality provides regularized self-supervisory signals to other modalities.
Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z)
- Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm.
A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT).
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
- Exploring Simple 3D Multi-Object Tracking for Autonomous Driving [10.921208239968827]
3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles.
Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a matching step for the detection association.
We present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds.
arXiv Detail & Related papers (2021-08-23T17:59:22Z)
- Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors.
We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations.
Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z)
- A Novel Anomaly Detection Algorithm for Hybrid Production Systems based on Deep Learning and Timed Automata [73.38551379469533]
DAD:DeepAnomalyDetection is a new approach for automatic model learning and anomaly detection in hybrid production systems.
It combines deep learning and timed automata to create a behavioral model from observations.
The algorithm has been applied to a few data sets, including two from real systems, and has shown promising results.
arXiv Detail & Related papers (2020-10-29T08:27:43Z)
- Multi-object tracking with self-supervised associating network [5.947279761429668]
We propose a novel self-supervised learning method using many short videos that have no human labeling.
Although the re-identification network is trained in a self-supervised manner, it achieves state-of-the-art performance of MOTA 62.0% and IDF1 62.6% on the MOT17 test benchmark.
arXiv Detail & Related papers (2020-10-26T08:48:23Z)