Lifting Multi-View Detection and Tracking to the Bird's Eye View
- URL: http://arxiv.org/abs/2403.12573v1
- Date: Tue, 19 Mar 2024 09:33:07 GMT
- Title: Lifting Multi-View Detection and Tracking to the Bird's Eye View
- Authors: Torben Teepe, Philipp Wolters, Johannes Gilg, Fabian Herzog, Gerhard Rigoll
- Abstract summary: Recent advancements in multi-view detection and 3D object recognition have significantly improved performance.
We compare modern lifting methods, both parameter-free and parameterized, to multi-view aggregation.
We present an architecture that aggregates the features of multiple time steps to learn robust detection.
- Score: 5.679775668038154
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Taking advantage of multi-view aggregation presents a promising solution to tackle challenges such as occlusion and missed detection in multi-object tracking and detection. Recent advancements in multi-view detection and 3D object recognition have significantly improved performance by strategically projecting all views onto the ground plane and conducting detection analysis from a Bird's Eye View. In this paper, we compare modern lifting methods, both parameter-free and parameterized, to multi-view aggregation. Additionally, we present an architecture that aggregates the features of multiple time steps to learn robust detection and combines appearance- and motion-based cues for tracking. Most current tracking approaches focus on either pedestrians or vehicles. In our work, we combine both branches and add new challenges to multi-view detection with cross-scene setups. Our method generalizes to three public datasets across two domains: (1) pedestrian: Wildtrack and MultiviewX, and (2) roadside perception: Synthehicle, achieving state-of-the-art performance in detection and tracking. https://github.com/tteepe/TrackTacular
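The abstract's central step, lifting per-view image features onto a common ground plane before detection, can be illustrated compactly. Below is a minimal PyTorch sketch of parameter-free lifting via ground-plane homographies with mean aggregation across views; the function name `lift_to_bev`, its arguments, and the averaging choice are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

def lift_to_bev(feats, homographies, bev_size):
    """Warp per-view feature maps onto a common ground plane and average them.

    feats:        (V, C, H, W) per-view feature maps.
    homographies: (V, 3, 3) matrices mapping BEV-plane pixel coordinates to
                  feature-map pixel coordinates (assumed given by calibration).
    bev_size:     (Hb, Wb) resolution of the BEV grid.
    """
    V, C, H, W = feats.shape
    Hb, Wb = bev_size
    # Homogeneous pixel grid on the BEV plane: (3, Hb*Wb).
    ys, xs = torch.meshgrid(torch.arange(Hb, dtype=torch.float32),
                            torch.arange(Wb, dtype=torch.float32), indexing="ij")
    grid = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(3, -1)
    warped = []
    for v in range(V):
        uvw = homographies[v] @ grid            # project BEV pixels into view v
        uv = uvw[:2] / uvw[2:].clamp(min=1e-6)  # perspective divide (points assumed in front of camera)
        # Normalize to [-1, 1] for grid_sample; out-of-view pixels sample zeros.
        u_n = uv[0] / (W - 1) * 2 - 1
        v_n = uv[1] / (H - 1) * 2 - 1
        sample = torch.stack([u_n, v_n], dim=-1).reshape(1, Hb, Wb, 2)
        warped.append(F.grid_sample(feats[v:v + 1], sample, align_corners=True))
    # Parameter-free aggregation: average the V warped views.
    return torch.cat(warped, dim=0).mean(dim=0)  # (C, Hb, Wb)
```

The temporal aggregation the abstract mentions could then be as simple as concatenating the BEV maps of several time steps along the channel dimension before the detection head, e.g. `torch.cat([lift_to_bev(f, Hs, size) for f in frame_feats], dim=0)`.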
Related papers
- HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision [34.7347336548199]
In camera-based 3D multi-object tracking (MOT), the prevailing methods follow the tracking-by-query-propagation paradigm.
We present HSTrack, a novel plug-and-play method designed to co-facilitate multi-task learning for detection and tracking.
arXiv Detail & Related papers (2024-11-11T08:18:49Z)
- ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association [15.161640917854363]
We introduce ADA-Track, a novel end-to-end framework for 3D MOT from multi-view cameras.
We introduce a learnable data association module based on edge-augmented cross-attention.
We integrate this association module into the decoder layer of a DETR-based 3D detector.
arXiv Detail & Related papers (2024-05-14T19:02:33Z)
- EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View [6.093524345727119]
We show that early-fusion in the Bird's Eye View achieves high accuracy for both detection and tracking.
EarlyBird outperforms state-of-the-art methods, improving the previous best on Wildtrack by +4.6 MOTA and +5.6 IDF1.
arXiv Detail & Related papers (2023-10-20T08:27:21Z)
- DIVOTrack: A Novel Dataset and Baseline Method for Cross-View Multi-Object Tracking in DIVerse Open Scenes [74.64897845999677]
We introduce a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians.
Our DIVOTrack has fifteen distinct scenarios and 953 cross-view tracks, surpassing all cross-view multi-object tracking datasets currently available.
Furthermore, we provide a novel baseline cross-view tracking method with a unified joint detection and cross-view tracking framework named CrossMOT.
arXiv Detail & Related papers (2023-02-15T14:10:42Z)
- 3DMODT: Attention-Guided Affinities for Joint Detection & Tracking in 3D Point Clouds [95.54285993019843]
We propose a method for joint detection and tracking of multiple objects in 3D point clouds.
Our model exploits temporal information employing multiple frames to detect objects and track them in a single network.
arXiv Detail & Related papers (2022-11-01T20:59:38Z)
- Track to Detect and Segment: An Online Multi-Object Tracker [81.15608245513208]
TraDeS is an online joint detection and tracking model that exploits tracking cues to assist detection end-to-end.
TraDeS infers the object tracking offset from a cost volume, which is used to propagate previous object features (a sketch of this cost-volume offset follows this list).
arXiv Detail & Related papers (2021-03-16T02:34:06Z)
- Probabilistic 3D Multi-Modal, Multi-Object Tracking for Autonomous Driving [22.693895321632507]
We propose a probabilistic, multi-modal, multi-object tracking system consisting of different trainable modules.
We show that our method outperforms the current state-of-the-art on the nuScenes tracking dataset.
arXiv Detail & Related papers (2020-12-26T15:00:54Z)
- Dense Scene Multiple Object Tracking with Box-Plane Matching [73.54369833671772]
Multiple Object Tracking (MOT) is an important task in computer vision.
We propose the Box-Plane Matching (BPM) method to improve MOT performance in dense scenes.
With these three modules, our team achieved 1st place on the Track-1 leaderboard of the ACM MM Grand Challenge HiEve 2020.
arXiv Detail & Related papers (2020-07-30T16:39:22Z)
- Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z)
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z)
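The TraDeS entry above notes that the tracking offset is inferred from a cost volume. A rough, hypothetical PyTorch sketch of that idea follows: correlate current and previous feature maps over a local window and read out the expected displacement with a soft-argmax. The function name, the window radius, and the soft-argmax readout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def tracking_offset(feat_t, feat_prev, radius=4):
    """Per-pixel tracking offset from a local cost volume.

    feat_t, feat_prev: (C, H, W) feature maps (ideally L2-normalized) from the
    current and previous frame. Returns a (2, H, W) map of (dx, dy) offsets.
    """
    C, H, W = feat_t.shape
    d = 2 * radius + 1
    # Gather each pixel's (d x d) neighborhood in the previous frame.
    patches = F.unfold(feat_prev.unsqueeze(0), kernel_size=d, padding=radius)
    patches = patches.reshape(C, d * d, H * W)
    # Cost volume: similarity of every current pixel to its local neighborhood.
    cost = (feat_t.reshape(C, 1, H * W) * patches).sum(dim=0)  # (d*d, H*W)
    prob = cost.softmax(dim=0)
    # Soft-argmax: expected displacement under the matching distribution.
    dy, dx = torch.meshgrid(
        torch.arange(-radius, radius + 1, dtype=torch.float32),
        torch.arange(-radius, radius + 1, dtype=torch.float32), indexing="ij")
    off_x = (prob * dx.reshape(-1, 1)).sum(dim=0)
    off_y = (prob * dy.reshape(-1, 1)).sum(dim=0)
    return torch.stack([off_x, off_y], dim=0).reshape(2, H, W)
```

Such an offset map can then be used to warp or index the previous frame's object features to the current frame before association, which is the propagation role the TraDeS summary describes.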
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.