TransCenter: Transformers with Dense Queries for Multiple-Object
Tracking
- URL: http://arxiv.org/abs/2103.15145v1
- Date: Sun, 28 Mar 2021 14:49:36 GMT
- Title: TransCenter: Transformers with Dense Queries for Multiple-Object
Tracking
- Authors: Yihong Xu, Yutong Ban, Guillaume Delorme, Chuang Gan, Daniela Rus,
Xavier Alameda-Pineda
- Abstract summary: We argue that the standard representation -- bounding boxes -- is not well suited to learning transformers for multiple-object tracking.
We propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets.
- Score: 87.75122600164167
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Transformer networks have proven extremely powerful for a wide variety of
tasks since their introduction. Computer vision is no exception, and the use of
transformers has become very popular in the vision community in recent years.
Despite this wave, multiple-object tracking (MOT) has so far shown a certain
incompatibility with transformers. We argue that the standard representation --
bounding boxes -- is not well suited to learning transformers for MOT. Inspired
by recent research, we propose TransCenter, the first transformer-based
architecture for tracking the centers of multiple targets. Methodologically, we
propose the use of dense queries in a double-decoder network to robustly infer
the heatmap of targets' centers and associate them through time. TransCenter
outperforms the current state of the art in multiple-object tracking on both
MOT17 and MOT20. Our ablation study demonstrates the advantage of the proposed
architecture over more naive alternatives. The code will be made publicly
available.
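To make the dense-query, double-decoder idea concrete, below is a minimal PyTorch sketch under strong simplifying assumptions: one query per feature-map location is decoded against current-frame features to predict a center heatmap, and against previous-frame features to predict offsets for temporal association. All module names, sizes, and the use of vanilla nn.TransformerDecoder layers are illustrative choices, not the paper's actual implementation.

```python
# Minimal sketch of a dense-query, double-decoder center tracker.
# Illustrative simplification only: TransCenter's actual design is not reproduced.
import torch
import torch.nn as nn

class DenseQueryDoubleDecoder(nn.Module):
    def __init__(self, dim=64, heads=4, layers=2):
        super().__init__()
        self.center_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, heads, batch_first=True), layers)
        self.track_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(dim, heads, batch_first=True), layers)
        self.heatmap_head = nn.Linear(dim, 1)  # per-query center confidence
        self.offset_head = nn.Linear(dim, 2)   # per-query (dx, dy) to the previous frame

    def forward(self, queries, feats_t, feats_tm1):
        # queries:   (B, H*W, dim) dense queries, one per feature-map location
        # feats_t:   (B, H*W, dim) encoded features of the current frame
        # feats_tm1: (B, H*W, dim) encoded features of the previous frame
        det = self.center_decoder(queries, feats_t)      # detection branch
        trk = self.track_decoder(queries, feats_tm1)     # association branch
        heatmap = torch.sigmoid(self.heatmap_head(det))  # (B, H*W, 1) center heatmap
        offsets = self.offset_head(trk)                  # (B, H*W, 2) tracking offsets
        return heatmap, offsets

# Dummy forward pass on a 32x32 feature map.
B, H, W, dim = 1, 32, 32, 64
model = DenseQueryDoubleDecoder(dim=dim)
q, f_t, f_tm1 = (torch.randn(B, H * W, dim) for _ in range(3))
heatmap, offsets = model(q, f_t, f_tm1)
print(heatmap.shape, offsets.shape)  # (1, 1024, 1) and (1, 1024, 2)
```

The key contrast with sparse-query detectors such as DETR is that every feature-map location carries its own query, so the output can be reshaped into a dense H x W heatmap rather than a fixed, small set of box predictions.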
Related papers
- The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers [0.0]
The transformer neural network architecture allows for autoregressive sequence-to-sequence modeling.
Transformers have also been applied across a wide variety of pattern recognition tasks, particularly in computer vision.
arXiv Detail & Related papers (2024-06-24T16:45:28Z) - Strong-TransCenter: Improved Multi-Object Tracking based on Transformers
with Dense Representations [1.2891210250935146]
TransCenter is a transformer-based MOT architecture with dense object queries for accurately tracking all the objects.
This paper improves the tracker with a post-processing mechanism based on the track-by-detection paradigm; a generic sketch of such a step follows this entry.
Our new tracker shows significant improvements in the IDF1 and HOTA metrics and comparable results on the MOTA metric.
arXiv Detail & Related papers (2022-10-24T19:47:58Z) - Boosting vision transformers for image retrieval [11.441395750267052]
- Boosting vision transformers for image retrieval [11.441395750267052]
Vision transformers have achieved remarkable progress in vision tasks such as image classification and detection.
However, in instance-level image retrieval, transformers have not yet shown good performance compared to convolutional networks.
We propose a number of improvements that make transformers outperform the state of the art for the first time.
arXiv Detail & Related papers (2022-10-21T12:17:12Z) - Transformers in Remote Sensing: A Survey [76.95730131233424]
We are the first to present a systematic review of advances based on transformers in remote sensing.
Our survey covers more than 60 recent transformer-based methods for different remote sensing problems.
We conclude the survey by discussing different challenges and open issues of transformers in remote sensing.
arXiv Detail & Related papers (2022-09-02T17:57:05Z) - 3D Vision with Transformers: A Survey [114.86385193388439]
The success of the transformer architecture in natural language processing has attracted attention in the computer vision field.
We present a systematic and thorough review of more than 100 transformer-based methods for different 3D vision tasks.
We discuss transformer design in 3D vision, which allows the processing of data with various 3D representations.
arXiv Detail & Related papers (2022-08-08T17:59:11Z) - TransVG++: End-to-End Visual Grounding with Language Conditioned Vision
Transformer [188.00681648113223]
We explore neat yet effective Transformer-based frameworks for visual grounding.
TransVG establishes multi-modal correspondences by Transformers and localizes referred regions by directly regressing box coordinates.
We upgrade our framework to a purely Transformer-based one by leveraging Vision Transformer (ViT) for vision feature encoding.
arXiv Detail & Related papers (2022-06-14T06:27:38Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - ViDT: An Efficient and Effective Fully Transformer-based Object Detector [97.71746903042968]
Detection transformers are the first fully end-to-end learning systems for object detection.
Vision transformers are the first fully transformer-based architectures for image classification.
In this paper, we integrate Vision and Detection Transformers (ViDT) to build an effective and efficient object detector.
arXiv Detail & Related papers (2021-10-08T06:32:05Z)