Contrastive Learning for Multi-Object Tracking with Transformers
- URL: http://arxiv.org/abs/2311.08043v1
- Date: Tue, 14 Nov 2023 10:07:52 GMT
- Title: Contrastive Learning for Multi-Object Tracking with Transformers
- Authors: Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool
- Abstract summary: We show how DETR can be turned into a MOT model by employing an instance-level contrastive loss.
Our training scheme learns object appearances while preserving detection capabilities and with little overhead.
Its performance surpasses the previous state-of-the-art by +2.6 mMOTA on the challenging BDD100K dataset.
- Score: 79.61791059432558
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The DEtection TRansformer (DETR) opened new possibilities for object
detection by modeling it as a translation task: converting image features into
object-level representations. Previous works typically add expensive modules to
DETR to perform Multi-Object Tracking (MOT), resulting in more complicated
architectures. We instead show how DETR can be turned into a MOT model by
employing an instance-level contrastive loss, a revised sampling strategy and a
lightweight assignment method. Our training scheme learns object appearances
while preserving detection capabilities and with little overhead. Its
performance surpasses the previous state-of-the-art by +2.6 mMOTA on the
challenging BDD100K dataset and is comparable to existing transformer-based
methods on the MOT17 dataset.
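The abstract describes the approach only at a high level. As a rough, hypothetical sketch (not the authors' released code), the snippet below shows one way an instance-level contrastive (InfoNCE-style) loss over per-object DETR decoder embeddings from two frames of a video could be written, together with a minimal greedy appearance-based assignment step; the function names, the temperature and threshold values, and the assumption that embeddings are already matched to ground-truth identities are illustrative, not taken from the paper.

```python
# Hypothetical sketch, not the paper's implementation: an instance-level
# contrastive loss over DETR decoder embeddings, plus a minimal greedy
# appearance-based assignment for linking detections across frames.
import torch
import torch.nn.functional as F


def instance_contrastive_loss(emb_a, ids_a, emb_b, ids_b, temperature=0.1):
    """emb_*: (N, D) decoder embeddings already matched to ground-truth boxes
    in two frames of the same video; ids_*: (N,) track identities.
    Same-identity pairs across frames are positives; all other cross-frame
    pairs act as negatives (an InfoNCE-style objective)."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature               # (Na, Nb) similarities
    positives = ids_a.unsqueeze(1) == ids_b.unsqueeze(0)   # same-identity mask
    valid = positives.any(dim=1)                           # anchors with a positive
    if not valid.any():
        return logits.new_zeros(())
    log_prob = F.log_softmax(logits[valid], dim=1)
    loss = -(log_prob * positives[valid]).sum(1) / positives[valid].sum(1)
    return loss.mean()


def greedy_assign(track_emb, det_emb, threshold=0.5):
    """Link current detections to existing tracks by highest cosine
    similarity; detections below the threshold start new tracks (-1)."""
    sim = F.normalize(track_emb, dim=-1) @ F.normalize(det_emb, dim=-1).t()
    assignment = torch.full((det_emb.shape[0],), -1, dtype=torch.long)
    for _ in range(min(sim.shape)):
        best = sim.argmax()
        t, d = divmod(best.item(), sim.shape[1])
        if sim[t, d] < threshold:
            break
        assignment[d] = t
        sim[t, :] = -1.0   # remove the matched track and detection
        sim[:, d] = -1.0
    return assignment
```

The design choice illustrated here is that the same decoder embeddings used for detection double as appearance descriptors, so association needs no separate re-identification network; whether the paper uses greedy or bipartite matching is not stated in this summary.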
Related papers
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring these pretrained models to downstream tasks may encounter a task discrepancy, because pretraining is formulated as image classification or object discrimination.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z) - FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z) - ZJU ReLER Submission for EPIC-KITCHEN Challenge 2023: Semi-Supervised
Video Object Segmentation [62.98078087018469]
We introduce MSDeAOT, a variant of the AOT framework that incorporates transformers at multiple feature scales.
MSDeAOT efficiently propagates object masks from previous frames to the current frame using a feature scale with a stride of 16.
We also employ GPM in a more refined feature scale with a stride of 8, leading to improved accuracy in detecting and tracking small objects.
arXiv Detail & Related papers (2023-07-05T03:43:15Z) - End-to-end Tracking with a Multi-query Transformer [96.13468602635082]
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about location, appearance, and identity of the objects in the scene over time.
Our aim in this paper is to move beyond tracking-by-detection approaches, to class-agnostic tracking that performs well also for unknown object classes.
arXiv Detail & Related papers (2022-10-26T10:19:37Z) - Exploring Modulated Detection Transformer as a Tool for Action
Recognition in Videos [0.0]
Modulated Detection Transformer (MDETR) is an end-to-end multi-modal understanding model.
We show that it is possible to use a multi-modal model to tackle a task that it was not designed for.
arXiv Detail & Related papers (2022-09-21T05:19:39Z) - Scaling Novel Object Detection with Weakly Supervised Detection
Transformers [21.219817483091166]
We propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning.
Our experiments show that our approach outperforms previous state-of-the-art models on large-scale novel object detection datasets.
arXiv Detail & Related papers (2022-07-11T21:45:54Z) - An Empirical Study Of Self-supervised Learning Approaches For Object
Detection With Transformers [0.0]
We explore self-supervised methods based on image reconstruction, masked image modeling and jigsaw.
Preliminary experiments on the iSAID dataset demonstrate faster convergence of DETR in the initial epochs in both pretraining and multi-task learning settings.
arXiv Detail & Related papers (2022-05-11T14:39:27Z) - PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered the solution of vision tasks with transformers; it directly translates the image feature map into the object detection result.
The approach also generalizes to recent transformer-based image recognition models such as ViT and shows a consistent efficiency gain.
arXiv Detail & Related papers (2021-09-15T01:10:30Z) - MOTR: End-to-End Multiple-Object Tracking with TRansformer [31.78906135775541]
We present MOTR, the first fully end-to-end multiple object tracking framework.
It learns to model the long-range temporal variation of the objects.
Results show that MOTR achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-05-07T13:27:01Z)