STS: Surround-view Temporal Stereo for Multi-view 3D Detection
- URL: http://arxiv.org/abs/2208.10145v1
- Date: Mon, 22 Aug 2022 08:46:33 GMT
- Title: STS: Surround-view Temporal Stereo for Multi-view 3D Detection
- Authors: Zengran Wang, Chen Min, Zheng Ge, Yinhao Li, Zeming Li, Hongyu Yang,
Di Huang
- Abstract summary: We propose a novel Surround-view Temporal Stereo (STS) technique that leverages the geometry correspondence between frames across time to facilitate accurate depth learning.
Experiments on nuScenes show that STS greatly boosts 3D detection ability, notably for medium and long distance objects.
- Score: 28.137180365082976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning accurate depth is essential to multi-view 3D object detection.
Recent approaches mainly learn depth from monocular images, which confront
inherent difficulties due to the ill-posed nature of monocular depth learning.
Instead of using a sole monocular depth method, in this work, we propose a
novel Surround-view Temporal Stereo (STS) technique that leverages the geometry
correspondence between frames across time to facilitate accurate depth
learning. Specifically, we regard the field of views from all cameras around
the ego vehicle as a unified view, namely surroundview, and conduct temporal
stereo matching on it. The resulting geometrical correspondence between
different frames from STS is utilized and combined with the monocular depth to
yield final depth prediction. Comprehensive experiments on nuScenes show that
STS greatly boosts 3D detection ability, notably for medium and long distance
objects. On BEVDepth with ResNet-50 backbone, STS improves mAP and NDS by 2.6%
and 1.4%, respectively. Consistent improvements are observed when using a
larger backbone and a larger image resolution, demonstrating its effectiveness.
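The fusion the abstract describes — combining the geometric correspondence from surround-view temporal stereo with a monocular depth estimate to yield the final depth prediction — can be sketched as score-level fusion over a shared set of depth candidates. This is an illustrative simplification, not the paper's implementation; the names `fuse_depth`, `mono_logits`, and `stereo_cost` are invented for the sketch:

```python
import numpy as np

def fuse_depth(mono_logits, stereo_cost, depth_bins):
    """Combine monocular depth logits with a temporal-stereo cost volume.

    mono_logits : (D, H, W) per-pixel scores over candidate depths (monocular branch)
    stereo_cost : (D, H, W) matching cost over the same candidates (lower = better)
    depth_bins  : (D,) candidate depth values in metres
    Returns the expected depth map of shape (H, W).
    """
    # Turn matching cost into a score (negate: low cost = high score),
    # then add it to the monocular logits before a shared softmax.
    logits = mono_logits - stereo_cost
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=0, keepdims=True)
    # Expected depth = sum_d p(d) * depth_bins[d]
    return np.tensordot(depth_bins, probs, axes=1)
```

When the stereo cost volume strongly prefers one candidate, the fused distribution concentrates there even if the monocular branch is uncertain, which is the intuition behind the reported gains on medium- and long-distance objects.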
Related papers
- Geometry-aware Temporal Aggregation Network for Monocular 3D Lane Detection [62.27919334393825]
We propose a novel Geometry-aware Temporal Aggregation Network (GTA-Net) for monocular 3D lane detection.
On one hand, we develop the Temporal Geometry Enhancement Module (TGEM), which exploits geometric consistency across successive frames.
On the other hand, we present the Temporal Instance-aware Query Generation (TIQG), which strategically incorporates temporal cues into query generation.
arXiv Detail & Related papers (2025-04-29T08:10:17Z)
- SDGE: Stereo Guided Depth Estimation for 360$^\circ$ Camera Sets [65.64958606221069]
Multi-camera systems are often used in autonomous driving to achieve a 360$^\circ$ perception.
These 360$^\circ$ camera sets often have limited or low-quality overlap regions, making multi-view stereo methods infeasible for the entire image.
We propose the Stereo Guided Depth Estimation (SGDE) method, which enhances depth estimation of the full image by explicitly utilizing multi-view stereo results on the overlap.
arXiv Detail & Related papers (2024-02-19T02:41:37Z)
- DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion [0.4662017507844857]
DepthSSC is an advanced method for semantic scene completion solely based on monocular cameras.
It mitigates spatial misalignment and distortion issues observed in prior methods.
It demonstrates its effectiveness in capturing intricate 3D structural details and achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-11-28T01:47:51Z)
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
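The lifting step DfM relies on — using established geometry to move 2D image features into 3D space — reduces, for a single pixel, to pinhole unprojection with the camera intrinsics. A minimal sketch under that assumption (the `unproject` helper is hypothetical; the actual framework lifts dense feature maps, not single pixels):

```python
import numpy as np

def unproject(u, v, depth, K):
    """Lift pixel (u, v) with estimated metric depth to a 3D camera-frame point.

    K is the 3x3 pinhole intrinsic matrix. The homogeneous pixel is mapped
    through K^-1 to a viewing ray, then scaled by the depth.
    """
    pix = np.array([u, v, 1.0])
    ray = np.linalg.inv(K) @ pix  # direction through the pixel (z = 1)
    return depth * ray            # scale by metric depth
```

For the principal point, the ray is the optical axis, so the lifted point is simply (0, 0, depth) in camera coordinates.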
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo problem (MVPS).
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- Panoramic Depth Estimation via Supervised and Unsupervised Learning in Indoor Scenes [8.48364407942494]
We introduce panoramic images to obtain a larger field of view.
We improve the training process of the neural network adapted to the characteristics of panoramic images.
With a comprehensive variety of experiments, this research demonstrates the effectiveness of our schemes aiming for indoor scene perception.
arXiv Detail & Related papers (2021-08-18T09:58:44Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method without extra data by 2.80% on the moderate test setting.
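The projective relationship behind geometry-guided depth can be illustrated with the classic pinhole cue: an object of metric height H that spans h pixels under focal length f lies at depth z = f * H / h. A toy sketch of that formula only (the `depth_from_height` name is invented here; the paper couples a full projective formulation with learned 2D and 3D predictions):

```python
def depth_from_height(focal_px, height_3d_m, height_2d_px):
    """Pinhole-projection depth cue: z = f * H / h.

    focal_px     : focal length in pixels
    height_3d_m  : object's metric height H in metres
    height_2d_px : object's image height h in pixels
    Returns the implied depth z in metres.
    """
    return focal_px * height_3d_m / height_2d_px
```

For example, a 1.5 m tall car imaged at 50 px under a 700 px focal length implies a depth of 21 m, which is why errors in the 2D box height translate directly into depth errors in monocular detection.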
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- MonoGRNet: A General Framework for Monocular 3D Object Detection [23.59839921644492]
We propose MonoGRNet for amodal 3D object detection from a monocular image via geometric reasoning.
MonoGRNet decomposes the monocular 3D object detection task into four sub-tasks including 2D object detection, instance-level depth estimation, projected 3D center estimation and local corner regression.
Experiments are conducted on KITTI, Cityscapes and MS COCO datasets.
arXiv Detail & Related papers (2021-04-18T10:07:52Z)
- Geometry-aware data augmentation for monocular 3D object detection [18.67567745336633]
This paper focuses on monocular 3D object detection, one of the essential modules in autonomous driving systems.
A key challenge is that the depth recovery problem is ill-posed in monocular data.
We conduct a thorough analysis to reveal how existing methods fail to robustly estimate depth when different geometry shifts occur.
We convert the aforementioned manipulations into four corresponding 3D-aware data augmentation techniques.
arXiv Detail & Related papers (2021-04-12T23:12:48Z)
- M3DSSD: Monocular 3D Single Stage Object Detector [82.25793227026443]
We propose a Monocular 3D Single Stage object Detector (M3DSSD) with feature alignment and asymmetric non-local attention.
The proposed M3DSSD achieves significantly better performance than the monocular 3D object detection methods on the KITTI dataset.
arXiv Detail & Related papers (2021-03-24T13:09:11Z)
- Self-supervised monocular depth estimation from oblique UAV videos [8.876469413317341]
This paper aims to estimate depth from a single UAV aerial image using deep learning.
We propose a novel architecture with two 2D CNN encoders and a 3D CNN decoder for extracting information from consecutive temporal frames.
arXiv Detail & Related papers (2020-12-19T14:53:28Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.