Parametric Depth Based Feature Representation Learning for Object
Detection and Segmentation in Bird's Eye View
- URL: http://arxiv.org/abs/2307.04106v2
- Date: Tue, 11 Jul 2023 23:55:53 GMT
- Title: Parametric Depth Based Feature Representation Learning for Object
Detection and Segmentation in Bird's Eye View
- Authors: Jiayu Yang, Enze Xie, Miaomiao Liu, Jose M. Alvarez
- Abstract summary: This paper focuses on leveraging geometry information, such as depth, to model the transformation of image features into Bird's-Eye-View (BEV) space.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume into the BEV frame based on the 3D space occupancy derived from depth.
- Score: 44.78243406441798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent vision-only perception models for autonomous driving achieved
promising results by encoding multi-view image features into Bird's-Eye-View
(BEV) space. A critical step and the main bottleneck of these methods is
transforming image features into the BEV coordinate frame. This paper focuses
on leveraging geometry information, such as depth, to model such feature
transformation. Existing works either rely on non-parametric depth distribution
modeling, which leads to significant memory consumption, or ignore the geometry
information altogether. In contrast, we propose to use parametric
depth distribution modeling for feature transformation. We first lift the 2D
image features to the 3D space defined for the ego vehicle via a predicted
parametric depth distribution for each pixel in each view. Then, we aggregate
the 3D feature volume into the BEV frame based on the 3D space occupancy
derived from depth. Finally, we use the transformed features for downstream tasks such
as object detection and semantic segmentation. Existing semantic segmentation
methods also suffer from a hallucination problem as they do not take
visibility information into account. This hallucination can be particularly
problematic for subsequent modules such as control and planning. To mitigate
the issue, our method provides depth uncertainty and reliable visibility-aware
estimations. We further leverage our parametric depth modeling to present a
novel visibility-aware evaluation metric that, when taken into account, can
mitigate the hallucination problem. Extensive experiments on object detection
and semantic segmentation on the nuScenes dataset demonstrate that our method
outperforms existing methods on both tasks.
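
To make the pipeline above concrete, the sketch below is a minimal, illustrative NumPy toy of parametric-depth lifting and BEV aggregation, not the authors' implementation: the Gaussian form of the depth distribution, the grid sizes, the single-camera pinhole geometry, and all variable names are assumptions made for this example.

```python
# Toy sketch (illustrative assumptions only): lift per-pixel image features into
# a BEV grid using a parametric depth distribution instead of a dense
# non-parametric one.
import numpy as np

H, W, C = 8, 16, 32                    # image feature map size and channels
D = 48                                 # candidate depths sampled along each ray
X, Y = 64, 64                          # BEV grid resolution (ego x/y)
depth_bins = np.linspace(1.0, 49.0, D, dtype=np.float32)   # depths in meters

feat  = np.random.randn(H, W, C).astype(np.float32)        # 2D image features
mu    = np.random.uniform(5.0, 45.0, (H, W)).astype(np.float32)  # depth mean
sigma = np.random.uniform(0.5, 5.0, (H, W)).astype(np.float32)   # depth std

# Evaluate the (assumed Gaussian) depth distribution at each candidate depth
# and normalise along the ray -> (H, W, D) weight volume. Only two parameters
# per pixel are stored, unlike a per-bin categorical distribution.
z = (depth_bins[None, None, :] - mu[..., None]) / sigma[..., None]
w = np.exp(-0.5 * z**2)
w /= w.sum(axis=-1, keepdims=True)

# Lift: weight the 2D feature of each pixel by its depth probability to get a
# 3D feature volume in the camera frustum, feat3d[h, w, d] = w[h, w, d] * feat[h, w].
feat3d = w[..., None] * feat[:, :, None, :]                 # (H, W, D, C)

# Splat to BEV: assume a single pinhole camera looking along +x of the ego
# frame and bin each frustum point into the BEV grid, weighted by the depth
# probability (a stand-in for the occupancy-based aggregation).
fx, cx = 20.0, W / 2.0
bev  = np.zeros((X, Y, C), dtype=np.float32)
norm = np.zeros((X, Y, 1), dtype=np.float32)
for hh in range(H):
    for ww in range(W):
        for dd in range(D):
            depth = depth_bins[dd]
            x_ego = depth                                   # forward distance
            y_ego = (ww - cx) * depth / fx                  # lateral offset
            xi = int(x_ego / 50.0 * X)
            yi = int((y_ego + 25.0) / 50.0 * Y)
            if 0 <= xi < X and 0 <= yi < Y:
                bev[xi, yi] += feat3d[hh, ww, dd]
                norm[xi, yi] += w[hh, ww, dd]
bev /= np.maximum(norm, 1e-6)
print(bev.shape)   # (64, 64, 32) BEV feature map for detection/segmentation heads
```

In a full model, the distribution parameters would be predicted by a network head from the image features, each camera's calibrated intrinsics and extrinsics would map frustum points into the ego frame, and the occupancy derived from depth would weight the aggregation; this toy only shows the shape of the computation.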
Related papers
- Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries [30.17281824826716]
Existing techniques often neglect the synergistic effects of semantic and depth cues, leading to classification and position estimation errors.
We propose an input-aware Transformer framework that leverages Semantics and Depth as priors.
Our approach involves the use of an S-D Encoder that explicitly models semantic and depth priors, thereby disentangling the learning process of object categorization and position estimation.
arXiv Detail & Related papers (2024-08-13T13:51:34Z) - GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Semantic Validation in Structure from Motion [0.0]
Structure from Motion (SfM) is the process of recovering the 3D structure of a scene from a series of projective measurements.
SfM consists of three main steps: feature detection and matching, camera motion estimation, and recovery of 3D structure.
This project offers a novel method for improved validation of 3D SfM models.
arXiv Detail & Related papers (2023-04-05T12:58:59Z) - Towards Domain Generalization for Multi-view 3D Object Detection in
Bird-Eye-View [11.958753088613637]
We first analyze the causes of the domain gap for the MV3D-Det task.
To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera.
We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
arXiv Detail & Related papers (2023-03-03T02:59:13Z) - Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method by 2.80% on the moderate test setting, without using extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z) - Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images
with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input an RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z) - Geometry-aware data augmentation for monocular 3D object detection [18.67567745336633]
This paper focuses on monocular 3D object detection, one of the essential modules in autonomous driving systems.
A key challenge is that the depth recovery problem is ill-posed in monocular data.
We conduct a thorough analysis to reveal how existing methods fail to robustly estimate depth when different geometry shifts occur.
We convert these geometry manipulations into four corresponding 3D-aware data augmentation techniques.
arXiv Detail & Related papers (2021-04-12T23:12:48Z)