Strong-TransCenter: Improved Multi-Object Tracking based on Transformers
with Dense Representations
- URL: http://arxiv.org/abs/2210.13570v1
- Date: Mon, 24 Oct 2022 19:47:58 GMT
- Title: Strong-TransCenter: Improved Multi-Object Tracking based on Transformers
with Dense Representations
- Authors: Amit Galor, Roy Orfaig, Ben-Zion Bobrovsky
- Abstract summary: TransCenter is a transformer-based MOT architecture with dense object queries for accurately tracking all the objects.
This paper improves this tracker with a post-processing mechanism based on the Track-by-Detection paradigm.
The new tracker shows significant improvements in the IDF1 and HOTA metrics and comparable results on the MOTA metric.
- Score: 1.2891210250935146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer networks have been a focus of research in many fields in
recent years, surpassing state-of-the-art performance in several
computer vision tasks. A few attempts have been made to apply this method to
the task of Multiple Object Tracking (MOT); among these, the state of the art
was TransCenter, a transformer-based MOT architecture with dense object queries
for accurately tracking all the objects while keeping reasonable runtime.
TransCenter is the first center-based transformer framework for MOT, and is
also among the first to show the benefits of using transformer-based
architectures for MOT. In this paper we improve this tracker with a
post-processing mechanism based on the Track-by-Detection paradigm:
motion-model estimation using a Kalman filter and target re-identification using
an embedding network. Our new tracker shows significant improvements in the
IDF1 and HOTA metrics and comparable results on the MOTA metric (70.9%, 59.8%
and 75.8% respectively) on the MOTChallenge MOT17 test dataset and improvement
on all 3 metrics (67.5%, 56.3% and 73.0%) on the MOT20 test dataset. Our
tracker is currently ranked first among transformer-based trackers in these
datasets. The code is publicly available at:
https://github.com/amitgalor18/STC_Tracker
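The two post-processing components named in the abstract can be illustrated with a minimal sketch: a constant-velocity Kalman filter over a box center (the classic SORT-style motion model) and a cosine-similarity score between re-ID embeddings. This is an illustrative assumption about the general technique, not the actual STC_Tracker implementation, whose state vector and association logic may differ.

```python
import numpy as np

class KalmanCenterTracker:
    """Constant-velocity Kalman filter over a box center (cx, cy).

    State vector: [cx, cy, vx, vy]. A generic SORT-style motion model,
    used here only to sketch the idea; not the STC_Tracker code.
    """

    def __init__(self, cx, cy):
        self.x = np.array([cx, cy, 0.0, 0.0])   # state estimate
        self.P = np.eye(4) * 10.0               # state covariance
        self.F = np.eye(4)                      # transition matrix
        self.F[0, 2] = self.F[1, 3] = 1.0       # position += velocity
        self.H = np.eye(2, 4)                   # observe position only
        self.Q = np.eye(4) * 0.01               # process noise
        self.R = np.eye(2)                      # measurement noise

    def predict(self):
        """Advance the state one frame; returns the predicted center."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx, cy):
        """Correct the state with a matched detection center."""
        z = np.array([cx, cy])
        y = z - self.H @ self.x                        # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def reid_similarity(emb_a, emb_b):
    """Cosine similarity between two re-ID appearance embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(a @ b)
```

In a track-by-detection loop, each track would call `predict()` before association, detections would be matched using a cost combining motion distance and `reid_similarity`, and matched tracks would call `update()`.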
Related papers
- Separable Self and Mixed Attention Transformers for Efficient Object
Tracking [3.9160947065896803]
This paper proposes an efficient self and mixed attention transformer-based architecture for lightweight tracking.
With these contributions, the proposed lightweight tracker deploys a transformer-based backbone and head module concurrently for the first time.
Simulations show that our Separable Self and Mixed Attention-based Tracker, SMAT, surpasses the performance of related lightweight trackers on GOT10k, TrackingNet, LaSOT, NfS30, UAV123, and AVisT datasets.
arXiv Detail & Related papers (2023-09-07T19:23:02Z) - MotionTrack: End-to-End Transformer-based Multi-Object Tracking with
LiDAR-Camera Fusion [13.125168307241765]
We propose an end-to-end transformer-based MOT algorithm (MotionTrack) with multi-modality sensor inputs to track objects with multiple classes.
The MotionTrack and its variations achieve better results (AMOTA score at 0.55) on the nuScenes dataset compared with other classical baseline models.
arXiv Detail & Related papers (2023-06-29T15:00:12Z) - Efficient Joint Detection and Multiple Object Tracking with Spatially
Aware Transformer [0.8808021343665321]
We propose a light-weight and highly efficient Joint Detection and Tracking pipeline for the task of Multi-Object Tracking.
It is driven by a transformer-based backbone instead of a CNN, which scales well with the input resolution.
As a result of our modifications, we reduce the overall model size of TransTrack by 58.73% and the complexity by 78.72%.
arXiv Detail & Related papers (2022-11-09T07:19:33Z) - TransFiner: A Full-Scale Refinement Approach for Multiple Object
Tracking [17.784388121222392]
Multiple object tracking (MOT) is a task comprising detection and association.
We propose TransFiner, a transformer-based post-refinement approach for MOT.
arXiv Detail & Related papers (2022-07-26T15:21:42Z) - Joint Spatial-Temporal and Appearance Modeling with Transformer for
Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z) - Global Tracking Transformers [76.58184022651596]
We present a novel transformer-based architecture for global multi-object tracking.
The core component is a global tracking transformer that operates on objects from all frames in the sequence.
Our framework seamlessly integrates into state-of-the-art large-vocabulary detectors to track any objects.
arXiv Detail & Related papers (2022-03-24T17:58:04Z) - Efficient Visual Tracking with Exemplar Transformers [98.62550635320514]
We introduce the Exemplar Transformer, an efficient transformer for real-time visual object tracking.
E.T.Track, our visual tracker that incorporates Exemplar Transformer layers, runs at 47 fps on a CPU.
This is up to 8 times faster than other transformer-based models.
arXiv Detail & Related papers (2021-12-17T18:57:54Z) - TransMOT: Spatial-Temporal Graph Transformer for Multiple Object
Tracking [74.82415271960315]
We propose a solution named TransMOT to efficiently model the spatial and temporal interactions among objects in a video.
TransMOT is not only more computationally efficient than the traditional Transformer, but it also achieves better tracking accuracy.
The proposed method is evaluated on multiple benchmark datasets including MOT15, MOT16, MOT17, and MOT20.
arXiv Detail & Related papers (2021-04-01T01:49:05Z) - TransCenter: Transformers with Dense Queries for Multiple-Object
Tracking [87.75122600164167]
We argue that the standard representation -- bounding boxes -- is not adapted to learning transformers for multiple-object tracking.
We propose TransCenter, the first transformer-based architecture for tracking the centers of multiple targets.
arXiv Detail & Related papers (2021-03-28T14:49:36Z) - Tracking Objects as Points [83.9217787335878]
We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art.
Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame.
CenterTrack is simple, online (no peeking into the future), and real-time.
arXiv Detail & Related papers (2020-04-02T17:58:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.