EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View
- URL: http://arxiv.org/abs/2310.13350v1
- Date: Fri, 20 Oct 2023 08:27:21 GMT
- Title: EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View
- Authors: Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard
Rigoll
- Abstract summary: We show that early-fusion in the Bird's Eye View achieves high accuracy for both detection and tracking.
EarlyBird outperforms the state-of-the-art methods and improves the current state-of-the-art on Wildtrack by +4.6 MOTA and +5.6 IDF1.
- Score: 6.093524345727119
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view aggregation promises to overcome the occlusion and missed
detection challenge in multi-object detection and tracking. Recent approaches
in multi-view detection and 3D object detection made a huge performance leap by
projecting all views to the ground plane and performing the detection in the
Bird's Eye View (BEV). In this paper, we investigate if tracking in the BEV can
also bring the next performance breakthrough in Multi-Target Multi-Camera
(MTMC) tracking. Most current approaches in multi-view tracking perform the
detection and tracking task in each view and use graph-based approaches to
associate pedestrians across views. This spatial
association is already solved by detecting each pedestrian once in the BEV,
leaving only the problem of temporal association. For the temporal association,
we show how to learn strong Re-Identification (re-ID) features for each
detection. The results show that early-fusion in the BEV achieves high accuracy
for both detection and tracking. EarlyBird outperforms the state-of-the-art
methods and improves the current state-of-the-art on Wildtrack by +4.6 MOTA and
+5.6 IDF1.
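The temporal-association step the abstract describes can be illustrated with a minimal sketch: given re-ID embeddings for the BEV detections of consecutive frames, existing tracks are matched to new detections by embedding similarity. The function names, the similarity threshold, and the greedy matching strategy below are illustrative assumptions, not the paper's actual implementation (which learns the re-ID features end-to-end and may use a different assignment scheme).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two sets of re-ID embeddings."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def associate(track_embs: np.ndarray, det_embs: np.ndarray,
              threshold: float = 0.5) -> list[tuple[int, int]]:
    """Greedily match existing tracks to new BEV detections by re-ID similarity.

    Returns (track_idx, det_idx) pairs; unmatched detections would start new
    tracks. The greedy loop stands in for a proper assignment solver
    (e.g. the Hungarian algorithm) to keep the sketch short.
    """
    sim = cosine_similarity(track_embs, det_embs)
    matches: list[tuple[int, int]] = []
    used_tracks: set[int] = set()
    used_dets: set[int] = set()
    # Take the highest-similarity pair first, then the next, and so on.
    for t, d in sorted(np.ndindex(sim.shape), key=lambda td: -sim[td]):
        if t in used_tracks or d in used_dets or sim[t, d] < threshold:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)
    return matches
```

Because the spatial association is already resolved by detecting each pedestrian once in the BEV, a per-frame matching of this kind is all that remains of the tracking problem.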
Related papers
- Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking [15.533652456081374]
Multi-object tracking (MOT) endeavors to precisely estimate identities and positions of multiple objects over time.
Modern detectors may occasionally miss some objects in certain frames, causing trackers to cease tracking prematurely.
We propose BUSCA, meaning 'to search', a versatile framework compatible with any online TbD system.
arXiv Detail & Related papers (2024-07-14T10:45:12Z)
- ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association [15.161640917854363]
We introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras.
We introduce a learnable data association module based on edge-augmented cross-attention.
We integrate this association module into the decoder layer of a DETR-based 3D detector.
arXiv Detail & Related papers (2024-05-14T19:02:33Z)
- Lifting Multi-View Detection and Tracking to the Bird's Eye View [5.679775668038154]
Recent advancements in multi-view detection and 3D object recognition have significantly improved performance.
We compare modern lifting methods, both parameter-free and parameterized, to multi-view aggregation.
We present an architecture that aggregates the features of multiple time steps to learn robust detection.
arXiv Detail & Related papers (2024-03-19T09:33:07Z)
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
- Multi-view Tracking Using Weakly Supervised Human Motion Prediction [60.972708589814125]
We argue that an even more effective approach is to predict people's motion over time and to infer their presence in individual frames from these predictions.
This makes it possible to enforce consistency both over time and across views of a single temporal frame.
We validate our approach on the PETS2009 and WILDTRACK datasets and demonstrate that it outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-10-19T17:58:23Z)
- Minkowski Tracker: A Sparse Spatio-Temporal R-CNN for Joint Object Detection and Tracking [53.64390261936975]
We present Minkowski Tracker, a sparse spatio-temporal R-CNN that jointly solves the object detection and tracking problems.
Inspired by region-based CNN (R-CNN), we propose to track motion as a second stage of the object detector R-CNN.
We show in large-scale experiments that the overall performance gain of our method is due to four factors.
arXiv Detail & Related papers (2022-08-22T04:47:40Z)
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving [92.05963633802979]
We present BEVerse, a unified framework for 3D perception and prediction based on multi-camera systems.
We show that the multi-task BEVerse outperforms single-task methods on 3D object detection, semantic map construction, and motion prediction.
arXiv Detail & Related papers (2022-05-19T17:55:35Z)
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
With the effectiveness of the three modules, our team achieves the 1st place on the Track-1 leaderboard in the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)
- Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking(MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z)
- End-to-End Multi-Object Tracking with Global Response Map [23.755882375664875]
We present a completely end-to-end approach that takes image-sequence/video as input and outputs directly the located and tracked objects of learned types.
Specifically, with our introduced multi-object representation strategy, a global response map can be accurately generated over frames.
Experimental results on the MOT16 and MOT17 benchmarks show that our proposed online tracker achieves state-of-the-art performance on several tracking metrics.
arXiv Detail & Related papers (2020-07-13T12:30:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.