Towards Domain Generalization for Multi-view 3D Object Detection in
Bird-Eye-View
- URL: http://arxiv.org/abs/2303.01686v1
- Date: Fri, 3 Mar 2023 02:59:13 GMT
- Title: Towards Domain Generalization for Multi-view 3D Object Detection in
Bird-Eye-View
- Authors: Shuo Wang, Xinhai Zhao, Hai-Ming Xu, Zehui Chen, Dameng Yu, Jiahao
Chang, Zhen Yang, Feng Zhao
- Abstract summary: We first analyze the causes of the domain gap for the MV3D-Det task.
To acquire a robust depth prediction, we propose to decouple the depth estimation from intrinsic parameters of the camera.
We modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic.
- Score: 11.958753088613637
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-view 3D object detection (MV3D-Det) in Bird-Eye-View (BEV) has drawn
extensive attention due to its low cost and high efficiency. Although new
algorithms for camera-only 3D object detection have been continuously proposed,
most of them may risk drastic performance degradation when the domain of input
images differs from that of training. In this paper, we first analyze the
causes of the domain gap for the MV3D-Det task. Based on the covariate shift
assumption, we find that the gap mainly attributes to the feature distribution
of BEV, which is determined by the quality of both depth estimation and 2D
image's feature representation. To acquire a robust depth prediction, we
propose to decouple the depth estimation from the intrinsic parameters of the
camera (i.e. the focal length) through converting the prediction of metric
depth to that of scale-invariant depth and perform dynamic perspective
augmentation to increase the diversity of the extrinsic parameters (i.e. the
camera poses) by utilizing homography. Moreover, we modify the focal length
values to create multiple pseudo-domains and construct an adversarial training
loss to encourage the feature representation to be more domain-agnostic.
Without bells and whistles, our approach, namely DG-BEV, successfully
alleviates the performance drop on the unseen target domain without impairing
the accuracy of the source domain. Extensive experiments on various public
datasets, including Waymo, nuScenes, and Lyft, demonstrate the generalization
and effectiveness of our approach. To the best of our knowledge, this is the
first systematic study to explore a domain generalization method for MV3D-Det.
Related papers
- Towards Unified 3D Object Detection via Algorithm and Data Unification [70.27631528933482]
We build the first unified multi-modal 3D object detection benchmark MM- Omni3D and extend the aforementioned monocular detector to its multi-modal version.
We name the designed monocular and multi-modal detectors as UniMODE and MM-UniMODE, respectively.
arXiv Detail & Related papers (2024-02-28T18:59:31Z) - Toward Accurate Camera-based 3D Object Detection via Cascade Depth
Estimation and Calibration [20.82054596017465]
Recent camera-based 3D object detection is limited by the precision of transforming from image to 3D feature spaces.
This paper aims to address such a fundamental problem of camera-based 3D object detection: How to effectively learn depth information for accurate feature lifting and object localization.
arXiv Detail & Related papers (2024-02-07T14:21:26Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - Towards Generalizable Multi-Camera 3D Object Detection via Perspective
Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via an instance-level augment.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Unsupervised Domain Adaptation for Monocular 3D Object Detection via
Self-Training [57.25828870799331]
We propose STMono3D, a new self-teaching framework for unsupervised domain adaptation on Mono3D.
We develop a teacher-student paradigm to generate adaptive pseudo labels on the target domain.
STMono3D achieves remarkable performance on all evaluated datasets and even surpasses fully supervised results on the KITTI 3D object detection dataset.
arXiv Detail & Related papers (2022-04-25T12:23:07Z) - Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them, however, the probability of effective samples is relatively small in the 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3d parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.