Related papers: Self-Supervised Multi-Object Tracking with Cross-Input Consistency

Self-Supervised Multi-Object Tracking with Cross-Input Consistency

URL: http://arxiv.org/abs/2111.05943v1
Date: Wed, 10 Nov 2021 21:00:34 GMT
Title: Self-Supervised Multi-Object Tracking with Cross-Input Consistency
Authors: Favyen Bastani, Songtao He, Sam Madden
Abstract summary: We propose a self-supervised learning procedure for training a robust multi-object tracking (MOT) model given only unlabeled video. We then compute tracks in that sequence by applying an RNN model independently on each input, and train the model to produce consistent tracks across the two inputs.
Score: 5.8762433393846045
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we propose a self-supervised learning procedure for training a robust multi-object tracking (MOT) model given only unlabeled video. While several self-supervisory learning signals have been proposed in prior work on single-object tracking, such as color propagation and cycle-consistency, these signals cannot be directly applied for training RNN models, which are needed to achieve accurate MOT: they yield degenerate models that, for instance, always match new detections to tracks with the closest initial detections. We propose a novel self-supervisory signal that we call cross-input consistency: we construct two distinct inputs for the same sequence of video, by hiding different information about the sequence in each input. We then compute tracks in that sequence by applying an RNN model independently on each input, and train the model to produce consistent tracks across the two inputs. We evaluate our unsupervised method on MOT17 and KITTI -- remarkably, we find that, despite training only on unlabeled video, our unsupervised approach outperforms four supervised methods published in the last 1--2 years, including Tracktor++, FAMNet, GSM, and mmMOT.

Related papers

Towards Universal Modal Tracking with Online Dense Temporal Token Learning [66.83607018706519]
We propose a universal video-level modality-awareness tracking model with online dense temporal token learning.<n>We expand the model's inputs to a video sequence level, aiming to see a richer video context from a near-global perspective.
arXiv Detail & Related papers (2025-07-27T08:47:42Z)
CAMELTrack: Context-Aware Multi-cue ExpLoitation for Online Multi-Object Tracking [68.24998698508344]
We introduce CAMEL, a novel association module for Context-Aware Multi-Cue ExpLoitation.<n>Unlike end-to-end detection-by-tracking approaches, our method remains lightweight and fast to train while being able to leverage external off-the-shelf models.<n>Our proposed online tracking pipeline, CAMELTrack, achieves state-of-the-art performance on multiple tracking benchmarks.
arXiv Detail & Related papers (2025-05-02T13:26:23Z)
Real-Time Moving Flock Detection in Pedestrian Trajectories Using Sequential Deep Learning Models [1.2289361708127877]
This paper investigates the use of sequential deep learning models, including Recurrent Neural Networks (RNNs), for real-time flock detection in multi-pedestrian trajectories. We validate our method using real-world group movement datasets, demonstrating its robustness across varying sequence lengths and diverse movement patterns. We extend our approach to identify other forms of collective motion, such as convoys and swarms, paving the way for more comprehensive multi-agent behavior analysis.
arXiv Detail & Related papers (2025-02-21T07:04:34Z)
SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking [34.90147791481045]
SynCL is a novel plug-and-play synergistic training strategy designed to co-facilitate multi-task learning for detection and tracking. We show that SynCL consistently delivers improvements when integrated with the training stage of various query-based 3D MOT trackers. Without additional inference costs, SynCL improves the state-of-the-art PF-Track method by $+3.9%$ AMOTA and $+2.0%$ NDS on the nuScenes dataset.
arXiv Detail & Related papers (2024-11-11T08:18:49Z)
Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Appearance Graphs [117.67620297750685]
We introduce Walker, the first self-supervised tracker that learns from videos with sparse bounding box annotations, and no tracking labels. Walker is the first self-supervised tracker to achieve competitive performance on MOT17, DanceTrack, and BDD100K.
arXiv Detail & Related papers (2024-09-25T18:00:00Z)
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model [20.259334882471574]
Multi-object tracking (MOT) is a critical technology in computer vision, designed to detect multiple targets in video sequences and assign each target a unique ID per frame. Existed MOT methods excel at accurately tracking multiple objects in real-time across various scenarios. We propose a novel ConsistencyTrack, joint detection and tracking(JDT) framework that formulates detection and association as a denoising diffusion process on bounding boxes.
arXiv Detail & Related papers (2024-08-28T05:53:30Z)
Refining Pre-Trained Motion Models [56.18044168821188]
We take on the challenge of improving state-of-the-art supervised models with self-supervised training. We focus on obtaining a "clean" training signal from real-world unlabelled video. We show that our method yields reliable gains over fully-supervised methods in real videos.
arXiv Detail & Related papers (2024-01-01T18:59:33Z)
IDM-Follower: A Model-Informed Deep Learning Method for Long-Sequence Car-Following Trajectory Prediction [24.94160059351764]
Most car-following models are generative and only consider the inputs of the speed, position, and acceleration of the last time step. We implement a novel structure with two independent encoders and a self-attention decoder that could sequentially predict the following trajectories. Numerical experiments with multiple settings on simulation and NGSIM datasets show that the IDM-Follower can improve the prediction performance.
arXiv Detail & Related papers (2022-10-20T02:24:27Z)
MM-TTA: Multi-Modal Test-Time Adaptation for 3D Semantic Segmentation [104.48766162008815]
We propose and explore a new multi-modal extension of test-time adaptation for 3D semantic segmentation. To design a framework that can take full advantage of multi-modality, each modality provides regularized self-supervisory signals to other modalities. Our regularized pseudo labels produce stable self-learning signals in numerous multi-modal test-time adaptation scenarios.
arXiv Detail & Related papers (2022-04-27T02:28:12Z)
Unified Transformer Tracker for Object Tracking [58.65901124158068]
We present the Unified Transformer Tracker (UTT) to address tracking problems in different scenarios with one paradigm. A track transformer is developed in our UTT to track the target in both Single Object Tracking (SOT) and Multiple Object Tracking (MOT)
arXiv Detail & Related papers (2022-03-29T01:38:49Z)
Exploring Simple 3D Multi-Object Tracking for Autonomous Driving [10.921208239968827]
3D multi-object tracking in LiDAR point clouds is a key ingredient for self-driving vehicles. Existing methods are predominantly based on the tracking-by-detection pipeline and inevitably require a matching step for the detection association. We present SimTrack to simplify the hand-crafted tracking paradigm by proposing an end-to-end trainable model for joint detection and tracking from raw point clouds.
arXiv Detail & Related papers (2021-08-23T17:59:22Z)
Self-Supervised Person Detection in 2D Range Data using a Calibrated Camera [83.31666463259849]
We propose a method to automatically generate training labels (called pseudo-labels) for 2D LiDAR-based person detectors. We show that self-supervised detectors, trained or fine-tuned with pseudo-labels, outperform detectors trained using manual annotations. Our method is an effective way to improve person detectors during deployment without any additional labeling effort.
arXiv Detail & Related papers (2020-12-16T12:10:04Z)
A Novel Anomaly Detection Algorithm for Hybrid Production Systems based on Deep Learning and Timed Automata [73.38551379469533]
DAD:DeepAnomalyDetection is a new approach for automatic model learning and anomaly detection in hybrid production systems. It combines deep learning and timed automata for creating behavioral model from observations. The algorithm has been applied to few data sets including two from real systems and has shown promising results.
arXiv Detail & Related papers (2020-10-29T08:27:43Z)
Multi-object tracking with self-supervised associating network [5.947279761429668]
We propose a novel self-supervised learning method using a lot of short videos which has no human labeling. Despite the re-identification network is trained in a self-supervised manner, it achieves the state-of-the-art performance of MOTA 62.0% and IDF1 62.6% on the MOT17 test benchmark.
arXiv Detail & Related papers (2020-10-26T08:48:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.