DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking
- URL: http://arxiv.org/abs/2303.16628v2
- Date: Wed, 19 Apr 2023 01:58:41 GMT
- Authors: Qing Lian, Tai Wang, Dahua Lin, Jiangmiao Pang
- Abstract summary: We propose to model Dynamic Objects in RecurrenT (DORT) to address the localization bias caused by ignoring the motion of moving objects.
DORT extracts object-wise local volumes for motion estimation, which also alleviates the heavy computational burden.
It is flexible and practical, and can be plugged into most camera-based 3D object detectors.
- Score: 67.34803048690428
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent multi-camera 3D object detectors usually leverage temporal information
to construct multi-view stereo, which alleviates the ill-posed depth estimation problem.
However, they typically assume that all objects are static and directly
aggregate features across frames. This work begins with a theoretical and
empirical analysis revealing that ignoring the motion of moving objects can
result in serious localization bias. Therefore, we propose to model Dynamic
Objects in RecurrenT (DORT) to tackle this problem. In contrast to previous
global Bird's-Eye-View (BEV) methods, DORT extracts object-wise local volumes for
motion estimation, which also alleviates the heavy computational burden. By
iteratively refining the estimated object motion and location, the preceding
features can be precisely aggregated to the current frame to mitigate the
aforementioned adverse effects. The simple framework has two appealing
properties. First, it is flexible and practical: it can be plugged into
most camera-based 3D object detectors. Second, because object motion is
predicted in the loop, it can easily track objects across frames according to
their nearest center distances. Without bells and whistles, DORT outperforms
all previous methods on the nuScenes detection and tracking benchmarks with
62.5% NDS and 57.6% AMOTA, respectively. The source code will be released.
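The geometric effect behind the localization bias, and the nearest-center tracking step, can be illustrated with a minimal 2D bird's-eye-view sketch. It assumes planar ego poses (x, y, yaw), point-like object centers, and a known world-frame object displacement between frames. The helper names (`warp_to_current`, `associate_by_center`) and the 2.0 m gating threshold are hypothetical and not taken from DORT, which estimates object motion from object-wise local volumes with a learned network and refines it iteratively. Warping a previous-frame center with ego-motion alone (the static assumption) leaves an error equal to the object's own displacement; adding the object motion removes it. The greedy matcher is one common way to associate by nearest center distance; the abstract does not specify DORT's exact matching rule.

```python
import math


def ego_to_world(p, pose):
    """Map a BEV point from the ego frame defined by pose=(x, y, yaw) to the world frame."""
    x, y, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (x + c * p[0] - s * p[1], y + s * p[0] + c * p[1])


def world_to_ego(p, pose):
    """Inverse of ego_to_world: express a world-frame BEV point in the ego frame."""
    x, y, yaw = pose
    dx, dy = p[0] - x, p[1] - y
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def warp_to_current(center_prev, pose_prev, pose_curr, obj_motion=(0.0, 0.0)):
    """Warp an object center from the previous ego frame into the current ego frame.

    obj_motion is the object's world-frame displacement between the two frames
    (a hypothetical, given input here; DORT estimates it). The default (0, 0)
    reproduces the static-object assumption.
    """
    wx, wy = ego_to_world(center_prev, pose_prev)
    return world_to_ego((wx + obj_motion[0], wy + obj_motion[1]), pose_curr)


def associate_by_center(predicted, detections, max_dist=2.0):
    """Greedy nearest-center association (hypothetical gating threshold max_dist, in meters).

    predicted: {track_id: center already propagated to the current frame}
    detections: list of current-frame centers
    Returns {detection_index: track_id}; unmatched detections would start new tracks.
    """
    # All track/detection pairs, closest first (track ids must be mutually comparable).
    candidates = sorted(
        (math.dist(c, d), tid, j)
        for tid, c in predicted.items()
        for j, d in enumerate(detections)
    )
    used_tracks, matches = set(), {}
    for dist, tid, j in candidates:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if tid in used_tracks or j in matches:
            continue  # each track and each detection is matched at most once
        matches[j] = tid
        used_tracks.add(tid)
    return matches


if __name__ == "__main__":
    pose_prev, pose_curr = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)  # ego drives 1 m forward
    center_prev = (10.0, 2.0)  # object center in the previous ego frame
    motion = (1.5, 0.0)        # object moves 1.5 m in the world frame
    static = warp_to_current(center_prev, pose_prev, pose_curr)
    dynamic = warp_to_current(center_prev, pose_prev, pose_curr, motion)
    print(static, dynamic)  # (9.0, 2.0) (10.5, 2.0): the static warp lags by 1.5 m
    print(associate_by_center({0: dynamic, 1: (5.0, -3.0)},
                              [(5.2, -3.1), (10.4, 2.1), (30.0, 0.0)]))  # {1: 0, 0: 1}
```

In this toy case the static warp places the previous-frame feature 1.5 m behind the object's current position, exactly the object's own displacement. The sketch performs a single warp with a given motion; an iterative scheme would alternate between estimating motion and re-warping.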

Related papers
        - Street Gaussians without 3D Object Tracker [86.62329193275916]
Existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space.
We propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy.
We address inevitable tracking errors by further introducing a motion learning strategy in an implicit feature space that autonomously corrects trajectory errors and recovers missed detections.
arXiv (2024-12-07T05:49:42Z)
- Delving into Motion-Aware Matching for Monocular 3D Object Tracking [81.68608983602581]
We find that the motion cue of objects across different time frames is critical in 3D multi-object tracking.
We propose MoMA-M3T, a framework that mainly consists of three motion-aware components.
We conduct extensive experiments on the nuScenes and KITTI datasets to demonstrate that MoMA-M3T achieves competitive performance against state-of-the-art methods.
arXiv (2023-08-22T17:53:58Z)
- BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo [15.479670314689418]
We introduce an effective temporal stereo method to dynamically select the scale of matching candidates.
We design an iterative algorithm to update more valuable candidates, making it adaptive to moving candidates.
BEVStereo achieves new state-of-the-art performance on the camera-only track of the nuScenes dataset.
arXiv (2022-09-21T10:21:25Z)
- TwistSLAM++: Fusing multiple modalities for accurate dynamic semantic SLAM [0.0]
TwistSLAM++ is a semantic, dynamic SLAM system that fuses stereo images and LiDAR information.
We show on classical benchmarks that this multimodal fusion approach improves the accuracy of object tracking.
arXiv (2022-09-16T12:28:21Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv (2022-07-26T15:48:46Z)
- Objects are Different: Flexible Monocular 3D Object Detection [87.82253067302561]
We propose a flexible framework for monocular 3D object detection that explicitly decouples truncated objects and adaptively combines multiple approaches for object depth estimation.
Experiments demonstrate that our method outperforms the state-of-the-art method by a relative 27% at the moderate level and 30% at the hard level on the KITTI test set.
arXiv (2021-04-06T07:01:28Z)
- Monocular Quasi-Dense 3D Object Tracking [99.51683944057191]
A reliable and accurate 3D tracking framework is essential for predicting future locations of surrounding objects and planning the observer's actions in numerous applications such as autonomous driving.
We propose a framework that can effectively associate moving objects over time and estimate their full 3D bounding box information from a sequence of 2D images captured on a moving platform.
arXiv (2021-03-12T15:30:02Z)
- Detecting Invisible People [58.49425715635312]
We re-purpose tracking benchmarks and propose new metrics for the task of detecting invisible objects.
We demonstrate that current detection and tracking systems perform dramatically worse on this task.
We also build dynamic models that explicitly reason in 3D, making use of observations produced by state-of-the-art monocular depth estimation networks.
arXiv (2020-12-15T16:54:45Z)
- e-TLD: Event-based Framework for Dynamic Object Tracking [23.026432675020683]
This paper presents a long-term object tracking framework with a moving event camera under general tracking conditions.
The framework uses a discriminative representation for the object with online learning, and detects and re-tracks the object when it comes back into the field of view.
arXiv (2020-09-02T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.