
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association

URL: http://arxiv.org/abs/2405.08909v1
Date: Tue, 14 May 2024 19:02:33 GMT
Title: ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
Authors: Shuxiao Ding, Lukas Schneider, Marius Cordts, Juergen Gall
Abstract summary: We introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras. We introduce a learnable data association module based on edge-augmented cross-attention. We integrate this association module into the decoder layer of a DETR-based 3D detector.
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Many query-based approaches for 3D Multi-Object Tracking (MOT) adopt the tracking-by-attention paradigm, utilizing track queries for identity-consistent detection and object queries for identity-agnostic track spawning. Tracking-by-attention, however, entangles detection and tracking queries in one embedding for both the detection and tracking tasks, which is sub-optimal. Other approaches resemble the tracking-by-detection paradigm, detecting objects using decoupled track and detection queries followed by a subsequent association. These methods, however, do not leverage synergies between the detection and association tasks. Combining the strengths of both paradigms, we introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras. We introduce a learnable data association module based on edge-augmented cross-attention, leveraging appearance and geometric features. Furthermore, we integrate this association module into the decoder layer of a DETR-based 3D detector, enabling simultaneous DETR-like query-to-image cross-attention for detection and query-to-query cross-attention for data association. By stacking these decoder layers, queries are refined for the detection and association tasks alternately, effectively harnessing the task dependencies. We evaluate our method on the nuScenes dataset and demonstrate the advantage of our approach compared to the two previous paradigms. Code is available at https://github.com/dsx0511/ADA-Track.
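To make the abstract's central idea more concrete (association as query-to-query cross-attention augmented with edge features, interleaved with detection in every decoder layer), here is a minimal NumPy sketch. It is not the ADA-Track implementation; the official code is in the repository linked above. Everything here is an assumption made for illustration: the edge features (3D center offset plus distance), single-head attention, adding projected edges to both keys and values, the order "detect, then associate" inside a layer, the `identity_refine` stub standing in for DETR query-to-image cross-attention, and greedy matching on the last layer's scores. All names (`edge_features`, `EdgeAugmentedCrossAttention`, `alternating_decoder`, `greedy_match`, `identity_refine`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def edge_features(track_centers, det_centers):
    # Hypothetical geometric edge features for every (track, detection) pair:
    # the 3D center offset (dx, dy, dz) plus its Euclidean length -> (T, D, 4).
    diff = track_centers[:, None, :] - det_centers[None, :, :]
    dist = np.linalg.norm(diff, axis=-1, keepdims=True)
    return np.concatenate([diff, dist], axis=-1)


class EdgeAugmentedCrossAttention:
    # Assumed single-head query-to-query cross-attention: track queries attend to
    # detection queries, and keys/values are augmented with projected edge features.

    def __init__(self, dim, edge_dim, rng):
        self.dim = dim
        self.w_q = rng.normal(0.0, dim ** -0.5, (dim, dim))
        self.w_k = rng.normal(0.0, dim ** -0.5, (dim, dim))
        self.w_v = rng.normal(0.0, dim ** -0.5, (dim, dim))
        self.w_e = rng.normal(0.0, edge_dim ** -0.5, (edge_dim, dim))

    def __call__(self, track_q, det_q, edges):
        q = track_q @ self.w_q   # (T, C) appearance-carrying track queries
        k = det_q @ self.w_k     # (D, C)
        v = det_q @ self.w_v     # (D, C)
        e = edges @ self.w_e     # (T, D, C): one embedding per (track, detection) edge
        # Logit for pair (t, d): q_t . (k_d + e_td) / sqrt(C)
        logits = np.einsum("tc,tdc->td", q, k[None] + e) / np.sqrt(self.dim)
        attn = softmax(logits, axis=-1)              # each row sums to 1 over detections
        msg = np.einsum("td,tdc->tc", attn, v[None] + e)
        return track_q + msg, logits                 # residual update, association scores


def alternating_decoder(track_q, det_q, track_centers, det_centers, layers, refine_det):
    # Each stacked layer first refines detections (stand-in for DETR query-to-image
    # cross-attention), then rebuilds edges from the refined centers and associates.
    logits = None
    for assoc in layers:
        det_q, det_centers = refine_det(det_q, det_centers)
        edges = edge_features(track_centers, det_centers)
        track_q, logits = assoc(track_q, det_q, edges)
    return track_q, det_q, logits


def greedy_match(scores, threshold):
    # One-to-one (track, detection) pairs taken in descending score order; pairs
    # scoring below `threshold` stay unmatched (e.g. new or lost objects).
    pairs, used_t, used_d = [], set(), set()
    for flat in np.argsort(-scores, axis=None):
        t, d = map(int, np.unravel_index(flat, scores.shape))
        if scores[t, d] < threshold:
            break
        if t not in used_t and d not in used_d:
            pairs.append((t, d))
            used_t.add(t)
            used_d.add(d)
    return pairs


def identity_refine(det_q, det_centers):
    # Placeholder detection step: returns queries and centers unchanged.
    return det_q, det_centers


rng = np.random.default_rng(0)
T, D, C = 3, 4, 16
layers = [EdgeAugmentedCrossAttention(C, 4, rng) for _ in range(6)]
track_q, det_q, scores = alternating_decoder(
    rng.normal(size=(T, C)), rng.normal(size=(D, C)),
    rng.normal(size=(T, 3)), rng.normal(size=(D, 3)),
    layers, identity_refine)
print(scores.shape)  # (3, 4)
print(greedy_match(scores, threshold=0.0))
```

The weights here are random, so the printed matches demonstrate only shapes and data flow. A trained model would learn them with an association loss. Recomputing the edges in every layer is the point of the alternation: as the detection step refines box centers, the geometric edges seen by the next association step change too. Greedy matching is a simple stand-in. An optimal one-to-one assignment could instead use `scipy.optimize.linear_sum_assignment` on the negated scores.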

Related papers

HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision
In camera-based 3D multi-object tracking (MOT), the prevailing methods follow the tracking-by-query-propagation paradigm. We present HSTrack, a novel plug-and-play method designed to co-facilitate multi-task learning for detection and tracking.
arXiv (2024-11-11)
Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
Multi-object tracking (MOT) endeavors to precisely estimate the identities and positions of multiple objects over time. Modern detectors may occasionally miss some objects in certain frames, causing trackers to cease tracking prematurely. We propose BUSCA, meaning 'to search', a versatile framework compatible with any online tracking-by-detection (TbD) system.
arXiv (2024-07-14)
SeMoLi: What Moves Together Belongs Together
We tackle semi-supervised object detection based on motion cues. Recent results suggest that motion-based clustering methods can be used to pseudo-label instances of moving objects. We rethink this approach and suggest that both object detection and motion-inspired pseudo-labeling can be tackled in a data-driven manner.
arXiv (2024-02-29)
You Only Need Two Detectors to Achieve Multi-Modal 3D Multi-Object Tracking
The proposed framework can achieve robust tracking using only a 2D detector and a 3D detector. It is shown to be more accurate than many state-of-the-art tracking-by-detection (TBD) multi-modal tracking methods.
arXiv (2023-04-18)
DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians. DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all currently available cross-view multi-object tracking datasets. Furthermore, we provide a novel cross-view tracking baseline, CrossMOT, a unified framework for joint detection and cross-view tracking.
arXiv (2023-02-15)
3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds
We propose a method for joint detection and tracking of multiple objects in 3D point clouds. Our model exploits temporal information from multiple frames to detect and track objects in a single network.
arXiv (2022-11-01)
End-to-end Tracking with a Multi-query Transformer
Multiple-object tracking (MOT) is a challenging task that requires simultaneous reasoning about the location, appearance, and identity of the objects in the scene over time. Our aim in this paper is to move beyond tracking-by-detection approaches to class-agnostic tracking that also performs well for unknown object classes.
arXiv (2022-10-26)
Distractor-Aware Fast Tracking via Dynamic Convolutions and MOT Philosophy
A practical long-term tracker typically has three key properties: an efficient model design, an effective global re-detection strategy, and a robust distractor-awareness mechanism. We propose a two-task tracking framework (named DMTrack) to achieve distractor-aware fast tracking via dynamic convolutions (d-convs) and the multiple object tracking (MOT) philosophy. Our tracker achieves state-of-the-art performance on the LaSOT, OxUvA, TLP, VOT2018LT and VOT2019LT benchmarks and runs in real-time (3x faster
arXiv (2021-04-25)
Track to Detect and Segment: An Online Multi-Object Tracker
TraDeS is an online joint detection and tracking model, exploiting tracking clues to assist detection end-to-end. TraDeS infers object tracking offsets from a cost volume, which is used to propagate previous object features.
arXiv (2021-03-16)
DEFT: Detection Embeddings for Tracking
We propose an efficient joint detection and tracking model named DEFT. Our approach relies on an appearance-based object matching network jointly learned with an underlying object detection network. DEFT has accuracy and speed comparable to the top methods on 2D online tracking leaderboards.
arXiv (2021-02-03)
Dense Scene Multiple Object Tracking with Box-Plane Matching
Multiple Object Tracking (MOT) is an important task in computer vision. We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes. Using its three modules, our team achieved 1st place on the Track-1 leaderboard of the ACM MM Grand Challenge HiEve 2020.
arXiv (2020-07-30)
Chained-Tracker: Chaining Paired Attentive Regression Results for End-to-End Joint Multiple-Object Detection and Tracking
We propose a simple online model named Chained-Tracker (CTracker), which naturally integrates all three subtasks into an end-to-end solution. Its two major novelties, a chained structure and paired attentive regression, make CTracker simple, fast, and effective.
arXiv (2020-07-29)
