TransCenter: Transformers with Dense Queries for Multiple-Object
Tracking
- URL: http://arxiv.org/abs/2103.15145v1
- Date: Sun, 28 Mar 2021 14:49:36 GMT
- Title: TransCenter: Transformers with Dense Queries for Multiple-Object
Tracking
- Authors: Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus,
Xavier Alameda-Pineda
- Abstract summary: We argue that the standard representation -- bounding boxes -- is not well suited to learning transformers for multiple-object tracking.
We propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets.
- Score: 87.75122600164167
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformer networks have proven extremely powerful for a wide variety of
tasks since their introduction. Computer vision is no exception, and the use of
transformers has become very popular in the vision community in recent years.
Despite this wave, multiple-object tracking (MOT) has so far shown a certain
incompatibility with transformers. We argue that the standard representation --
bounding boxes -- is not well suited to learning transformers for MOT. Inspired
by recent research, we propose TransCenter, the first transformer-based
architecture for tracking the centers of multiple targets. Methodologically, we
propose the use of dense queries in a double-decoder network to robustly infer
the heatmap of targets' centers and associate them through time. TransCenter
outperforms the current state of the art in multiple-object tracking on both
MOT17 and MOT20. Our ablation study demonstrates the advantage of the proposed
architecture over more naive alternatives. The code will be made publicly
available.
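To make the dense-query, double-decoder idea concrete, below is a minimal PyTorch sketch under strong simplifying assumptions: one query per feature-map location is decoded against current-frame features to predict a center heatmap, and against previous-frame features to predict offsets for temporal association. All module names, sizes, and the use of vanilla nn.TransformerDecoder layers are illustrative choices, not the paper's actual implementation.

```python
# Minimal sketch of a dense-query, double-decoder center tracker.
# Illustrative simplification only: TransCenter's actual design is not reproduced.
import torch
import torch.nn as nn

class DenseQueryDoubleDecoder(nn.Module):
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        self.center_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, heads, batch_first=True), layers)
        self.track_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, heads, batch_first=True), layers)
        self.heatmap_head = nn.Linear(dim, 1)  # per-query center confidence
        self.offset_head = nn.Linear(dim, 2)   # per-query (dx, dy) to the previous frame

    def forward(self, queries, feats_t, feats_tm1):
        # queries:   (B, H*W, dim) dense queries, one per feature-map location
        # feats_t:   (B, H*W, dim) encoded features of the current frame
        # feats_tm1: (B, H*W, dim) encoded features of the previous frame
        det = self.center_decoder(queries, feats_t)      # detection branch
        trk = self.track_decoder(queries, feats_tm1)     # association branch
        heatmap = torch.sigmoid(self.heatmap_head(det))  # (B, H*W, 1) center heatmap
        offsets = self.offset_head(trk)                  # (B, H*W, 2) tracking offsets
        return heatmap, offsets

# Dummy forward pass on a 32x32 feature map.
B, H, W, dim = 1, 32, 32, 64
model = DenseQueryDoubleDecoder(dim=dim)
q, f_t, f_tm1 = (torch.randn(B, H * W, dim) for _ in range(3))
heatmap, offsets = model(q, f_t, f_tm1)
print(heatmap.shape, offsets.shape)  # (1, 1024, 1) and (1, 1024, 2)
```

The key contrast with sparse-query detectors such as DETR is that every feature-map location carries its own query, so the output can be reshaped into a dense H x W heatmap rather than a fixed, small set of box predictions.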
Related papers
- The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers [0.0]
The transformer neural network architecture allows for autoregressive sequence-to-sequence modeling.
Transformers have also been applied across a wide variety of pattern recognition tasks, particularly in computer vision.
arXiv Detail & Related papers (2024-06-24T16:45:28Z) - Strong-TransCenter: Improved Multi-Object Tracking based on Transformers
with Dense Representations [1.2891210250935146]
TransCenter is a transformer-based MOT architecture with dense object queries for accurately tracking all the objects.
This paper improves the tracker with a post-processing mechanism based on the track-by-detection paradigm; a generic sketch of such a step follows this entry.
Our new tracker shows significant improvements in the IDF1 and HOTA metrics and comparable results on the MOTA metric.
arXiv Detail & Related papers (2022-10-24T19:47:58Z) - Boosting vision transformers for image retrieval [11.441395750267052]
- Boosting vision transformers for image retrieval [11.441395750267052]
Vision transformers have achieved remarkable progress in vision tasks such as image classification and detection.
However, in instance-level image retrieval, transformers have not yet shown good performance compared to convolutional networks.
We propose a number of improvements that make transformers outperform the state of the art for the first time.
arXiv Detail & Related papers (2022-10-21T12:17:12Z) - Transformers in Remote Sensing: A Survey [76.95730131233424]
We are the first to present a systematic review of advances based on transformers in remote sensing.
Our survey covers more than 60 recent transformer-based methods for different remote sensing problems.
We conclude the survey by discussing different challenges and open issues of transformers in remote sensing.
arXiv Detail & Related papers (2022-09-02T17:57:05Z) - 3D Vision with Transformers: A Survey [114.86385193388439]
The success of the transformer architecture in natural language processing has attracted attention in the computer vision field.
We present a systematic and thorough review of more than 100 transformer-based methods for different 3D vision tasks.
We discuss transformer design in 3D vision, which allows the processing of data with various 3D representations.
arXiv Detail & Related papers (2022-08-08T17:59:11Z) - TransVG++: End-to-End Visual Grounding with Language Conditioned Vision
Transformer [188.00681648113223]
We explore neat yet effective Transformer-based frameworks for visual grounding.
TransVG establishes multi-modal correspondences by Transformers and localizes referred regions by directly regressing box coordinates.
We upgrade our framework to a purely Transformer-based one by leveraging Vision Transformer (ViT) for vision feature encoding.
arXiv Detail & Related papers (2022-06-14T06:27:38Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)