BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo
- URL: http://arxiv.org/abs/2304.04185v1
- Date: Sun, 9 Apr 2023 08:04:26 GMT
- Title: BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo
- Authors: Yinhao Li, Jinrong Yang, Jianjian Sun, Han Bao, Zheng Ge, Li Xiao
- Abstract summary: Temporal multi-view stereo (MVS) is a natural tool for tackling the ambiguity of depth perception.
By introducing a dynamic temporal stereo strategy, BEVStereo++ mitigates the harm that temporal stereo brings in challenging scenarios.
BEVStereo++ achieves state-of-the-art (SOTA) results on both the Waymo and nuScenes datasets.
- Score: 6.5401888641091634
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bounded by the inherent ambiguity of depth perception, contemporary
multi-view 3D object detection methods hit a performance bottleneck.
Intuitively, leveraging temporal multi-view stereo (MVS) technology is a
natural way to tackle this ambiguity. However, traditional MVS approaches have
two limitations when applied to 3D object detection scenes: 1) the affinity
measurement among all views incurs expensive computational cost; 2) it is
difficult to handle outdoor scenarios where objects are often mobile. To this
end, we propose BEVStereo++: by introducing a dynamic temporal stereo
strategy, BEVStereo++ is able to mitigate the harm brought by temporal stereo
in those two scenarios. Going one step further, we apply a Motion Compensation
Module and long-sequence Frame Fusion to BEVStereo++, which further boosts
performance and reduces error. Without bells and whistles, BEVStereo++
achieves state-of-the-art (SOTA) results on both the Waymo and nuScenes
datasets.
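As a concrete illustration of the temporal-stereo step the abstract builds on, here is a
minimal PyTorch sketch. It is not the authors' code: the function name, tensor shapes,
and the cosine-similarity scoring are illustrative assumptions, and the warping is the
standard plane-sweep construction that temporal MVS methods share. It warps
previous-frame features onto per-pixel depth candidates in the current frame and scores
each candidate by feature agreement.

```python
# Hedged sketch of temporal-stereo matching (illustrative, not the paper's code).
import torch
import torch.nn.functional as F

def stereo_depth_scores(cur_feat, prev_feat, depth_cands, K, T_cur_to_prev):
    """Score per-pixel depth candidates by warping previous-frame features.

    cur_feat, prev_feat: (C, H, W) feature maps from the two frames.
    depth_cands:         (D, H, W) per-pixel candidate depths (metres).
    K:                   (3, 3) camera intrinsics.
    T_cur_to_prev:       (4, 4) relative camera pose (current -> previous).
    Returns:             (D, H, W) matching scores (higher = better match).
    """
    C, H, W = cur_feat.shape
    # Homogeneous pixel grid, shape (3, H*W).
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float().reshape(3, -1)
    rays = K.inverse() @ pix                           # back-projected pixel rays

    scores = []
    for d in range(depth_cands.shape[0]):
        depth = depth_cands[d].reshape(1, -1)          # (1, H*W)
        pts = torch.cat([rays * depth, torch.ones_like(depth)], 0)
        pts_prev = (T_cur_to_prev @ pts)[:3]           # 3D points in prev frame
        proj = K @ pts_prev
        uv = proj[:2] / proj[2:].clamp(min=1e-6)       # perspective divide
        # Normalise pixel coordinates to [-1, 1] for grid_sample.
        grid = torch.stack([uv[0] / (W - 1) * 2 - 1,
                            uv[1] / (H - 1) * 2 - 1], dim=-1).view(1, H, W, 2)
        warped = F.grid_sample(prev_feat[None], grid, align_corners=True)[0]
        # Agreement between current and warped previous features.
        scores.append(F.cosine_similarity(cur_feat, warped, dim=0))
    return torch.stack(scores, 0)
```

A depth head would normally turn such scores into a per-pixel depth distribution before
lifting image features into BEV; the "dynamic" part of BEVStereo++ concerns how the
candidates themselves are selected and updated rather than this warping step.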
Related papers
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases (see the sketch after this entry).
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
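The motion-bases idea can be made concrete with a short sketch. This is an illustrative
assumption (a linear-blend-skinning style mix over K shared rigid bases; the function
name and the exact blending scheme are mine, and the paper's formulation may differ):

```python
# Hedged sketch: dense scene motion from a compact set of rigid motion bases.
import numpy as np

def blend_motion(points, weights, rotations_t, translations_t):
    """Move N points to time t using K shared rigid motion bases.

    points:         (N, 3) canonical 3D positions.
    weights:        (N, K) per-point blend weights (rows sum to 1).
    rotations_t:    (K, 3, 3) rotation of each basis at time t.
    translations_t: (K, 3)    translation of each basis at time t.
    Returns:        (N, 3) positions at time t.
    """
    # Transform every point by every basis: (K, N, 3).
    moved = np.einsum("kij,nj->kni", rotations_t, points) + translations_t[:, None, :]
    # Blend the K candidate positions with the per-point weights.
    return np.einsum("nk,kni->ni", weights, moved)

# Toy usage: two bases (identity and a unit x-shift), three points at the origin.
pts = np.zeros((3, 3))
w = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
R = np.stack([np.eye(3), np.eye(3)])
t = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(blend_motion(pts, w, R, t))  # x-shifts of 0.0, 0.5, 1.0 respectively
```

The appeal of this representation is that only K trajectories of rigid transforms are
optimised over time while per-point weights stay fixed, which is what keeps full-sequence
motion low-dimensional.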
- Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion [57.232688209606515]
We present HTCL, a novel Hierarchical Temporal Context Learning paradigm for improving camera-based semantic scene completion.
Our method ranks 1st on the SemanticKITTI benchmark and even surpasses LiDAR-based methods in terms of mIoU.
arXiv Detail & Related papers (2024-07-02T09:11:17Z)
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of dynamic objects into object motion and camera motion, regularized by the proposed unsupervised surface-consistency and patch-based multi-view constraints, respectively.
arXiv Detail & Related papers (2023-04-04T11:25:44Z)
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking [67.34803048690428]
We propose to model Dynamic Objects in RecurrenT (DORT) to tackle this problem.
DORT extracts object-wise local volumes for motion estimation, which also alleviates the heavy computational burden.
It is flexible and practical, and can be plugged into most camera-based 3D object detectors.
arXiv Detail & Related papers (2023-03-29T12:33:55Z)
- BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo [15.479670314689418]
We introduce an effective temporal stereo method to dynamically select the scale of matching candidates.
We design an iterative algorithm to update the more valuable candidates, making the method adaptive to moving objects (see the sketch after this entry).
BEVStereo achieves new state-of-the-art performance on the camera-only track of the nuScenes dataset.
arXiv Detail & Related papers (2022-09-21T10:21:25Z)
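A hedged sketch of that iterative candidate update follows. It is illustrative only,
not the released BEVStereo code: refine_depth, score_fn, the candidate count, and the
shrink schedule are assumptions. Each pixel keeps a depth mean mu and a search range
sigma; every iteration scores candidates inside the window, recentres mu on the
score-weighted mean, and narrows sigma.

```python
# Hedged sketch of an iterative per-pixel depth-candidate update.
import torch

def refine_depth(mu, sigma, score_fn, num_cands=8, iters=3, shrink=0.5):
    """mu, sigma: (H, W) depth mean / search range.
    score_fn: maps (D, H, W) candidate depths to (D, H, W) matching scores."""
    for _ in range(iters):
        # Evenly spaced candidates inside [mu - sigma, mu + sigma]: (D, H, W).
        offsets = torch.linspace(-1.0, 1.0, num_cands).view(-1, 1, 1)
        cands = mu[None] + offsets * sigma[None]
        # Turn matching scores into per-pixel weights over the candidates.
        probs = torch.softmax(score_fn(cands), dim=0)
        mu = (probs * cands).sum(dim=0)    # recentre on the weighted mean
        sigma = sigma * shrink             # narrow the search window
    return mu, sigma
```

A score_fn built from feature matching (for instance, the stereo_depth_scores sketch
near the top of this page) closes the loop; the paper's actual update rule and
scheduling differ in detail.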
- ORA3D: Overlap Region Aware Multi-view 3D Object Detection [11.58746596768273]
Current multi-view 3D object detection methods often fail to detect objects in the overlap region properly.
We propose using the following two main modules: (1) Stereo Disparity Estimation for Weak Depth Supervision and (2) Adversarial Overlap Region Discriminator.
Our proposed method outperforms current state-of-the-art models, i.e., DETR3D and BEVDet.
arXiv Detail & Related papers (2022-07-02T15:28:44Z)
- Implicit Motion Handling for Video Camouflaged Object Detection [60.98467179649398]
We propose a new video camouflaged object detection (VCOD) framework.
It can exploit both short-term and long-term temporal consistency to detect camouflaged objects from video frames.
arXiv Detail & Related papers (2022-03-14T17:55:41Z)
- Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding [61.57847727651068]
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
Most previous works focus on learning frame-level features of each whole frame in the entire video, and directly match them with the textual information.
We propose a novel Motion- and Appearance-guided 3D Semantic Reasoning Network (MA3SRN), which incorporates optical-flow-guided motion-aware, detection-based appearance-aware, and 3D-aware object-level features.
arXiv Detail & Related papers (2022-03-06T13:57:09Z)
- Multi-view Monocular Depth and Uncertainty Prediction with Deep SfM in Dynamic Environments [0.2426580753117204]
3D reconstruction of depth and motion from monocular video in dynamic environments is a highly ill-posed problem.
We investigate the performance of the current State-of-the-Art (SotA) deep multi-view systems in such environments.
arXiv Detail & Related papers (2022-01-21T10:42:57Z)