Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving
- URL: http://arxiv.org/abs/2503.19713v1
- Date: Tue, 25 Mar 2025 14:39:04 GMT
- Title: Semi-SD: Semi-Supervised Metric Depth Estimation via Surrounding Cameras for Autonomous Driving
- Authors: Yusen Xie, Zhengmin Huang, Shaojie Shen, Jun Ma
- Abstract summary: Semi-SD is a novel metric depth estimation framework tailored for surround-view camera setups in autonomous driving. We propose a unified spatial-temporal-semantic fusion module to construct fused visual features. We evaluate our algorithm on the DDAD and nuScenes datasets, and the results demonstrate that our method achieves state-of-the-art performance.
- Score: 20.19617659712535
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we introduce Semi-SD, a novel metric depth estimation framework tailored for surround-view camera setups in autonomous driving. The input consists of adjacent surround-view frames and camera parameters. We propose a unified spatial-temporal-semantic fusion module to construct fused visual features: cross-attention components operating across the surrounding cameras and adjacent frames refine metric scale information and perform temporal feature matching. Building on this, we propose a pose estimation framework that uses the surrounding cameras, their estimated depths, and the extrinsic parameters, which effectively addresses the scale ambiguity of multi-camera setups. Moreover, a semantic world model and a monocular depth estimation world model are integrated to supervise the depth estimation, improving its quality. We evaluate our algorithm on the DDAD and nuScenes datasets, and the results demonstrate that our method achieves state-of-the-art surround-view depth estimation quality. The source code will be available at https://github.com/xieyuser/Semi-SD.
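The listing contains no reference code, so the following is a minimal, hypothetical sketch of the kind of cross-attention fusion the abstract describes: each camera's features attend to the other surrounding cameras (spatial, for metric scale cues) and to the same camera's previous frame (temporal matching). All names (`SpatialTemporalFusion`, `dim`, `heads`) are our own illustration, not the authors' implementation.

```python
# Hypothetical sketch of spatial-temporal cross-attention fusion for
# surround-view depth estimation; not the authors' implementation.
import torch
import torch.nn as nn

class SpatialTemporalFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # Cross-attention over the other surrounding cameras (scale cues).
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Cross-attention over the adjacent frame (temporal matching).
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, feats_t, feats_prev):
        # feats_t, feats_prev: (num_cams, num_tokens, dim) features at the
        # current and previous time steps.
        num_cams = feats_t.shape[0]
        fused = []
        for c in range(num_cams):
            query = feats_t[c:c + 1]                           # (1, N, D)
            # Tokens from all *other* cameras at time t.
            others = torch.cat(
                [feats_t[i:i + 1] for i in range(num_cams) if i != c], dim=1)
            x, _ = self.spatial_attn(query, others, others)
            query = self.norm1(query + x)
            # Tokens from the same camera at the previous frame.
            prev = feats_prev[c:c + 1]
            x, _ = self.temporal_attn(query, prev, prev)
            fused.append(self.norm2(query + x))
        return torch.cat(fused, dim=0)                         # (num_cams, N, D)

# Example: 6 surround cameras, 100 tokens each, 256-dim features.
fusion = SpatialTemporalFusion()
out = fusion(torch.randn(6, 100, 256), torch.randn(6, 100, 256))
print(out.shape)  # torch.Size([6, 100, 256])
```

The per-camera loop keeps the sketch readable; a real implementation would batch the attention calls.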
Related papers
- GVDepth: Zero-Shot Monocular Depth Estimation for Ground Vehicles based on Probabilistic Cue Fusion [7.588468985212172]
Generalizing metric monocular depth estimation presents a significant challenge due to its ill-posed nature. We propose a novel canonical representation that maintains consistency across varied camera setups. We also propose a novel architecture that adaptively and probabilistically fuses depths estimated via object-size and vertical-image-position cues; a toy version of such fusion is sketched below.
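As a purely illustrative aside (not taken from the paper), a standard way to fuse two depth cues probabilistically is inverse-variance weighting, treating each cue as a Gaussian estimate; the function and variable names below are hypothetical.

```python
# Illustrative inverse-variance fusion of two per-pixel depth cues, each
# modeled as a Gaussian (mean, variance). Generic technique, not GVDepth's
# actual architecture.
import numpy as np

def fuse_depth_cues(d_size, var_size, d_vpos, var_vpos):
    """Fuse object-size and vertical-position depth estimates."""
    w_size = 1.0 / var_size
    w_vpos = 1.0 / var_vpos
    fused = (w_size * d_size + w_vpos * d_vpos) / (w_size + w_vpos)
    fused_var = 1.0 / (w_size + w_vpos)   # fused estimate is more certain
    return fused, fused_var

# Example: size cue says 20 m (confident), position cue says 24 m (not).
d, v = fuse_depth_cues(20.0, 0.5, 24.0, 2.0)
print(d, v)  # 20.8 0.4
```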
arXiv Detail & Related papers (2024-12-08T22:04:34Z)
- KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object being reconstructed and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC estimates the view positions as the solution to a lightweight optimization problem that drives the keypoints' back-projections to converge to a single point, as in the sketch below.
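The following toy shows what such an objective can look like: the least-squares 3D point closest to one keypoint's viewing rays, plus the residual that measures convergence. This is generic multi-view geometry rather than KRONC's code; in KRONC a residual of this kind would be minimized over the view poses.

```python
# Toy back-projection convergence objective (generic triangulation,
# not KRONC's implementation).
import numpy as np

def convergence_residual(centers, directions):
    """centers: (V, 3) camera centers; directions: (V, 3) unit rays."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for c, d in zip(centers, directions):
        P = np.eye(3) - np.outer(d, d)   # projector onto the ray's normal plane
        A += P
        b += P @ c
    point = np.linalg.solve(A, b)        # closest point to all rays
    residual = 0.0
    for c, d in zip(centers, directions):
        r = (np.eye(3) - np.outer(d, d)) @ (point - c)
        residual += float(r @ r)         # squared point-to-ray distance
    return point, residual

# Three cameras whose keypoint rays all pass through (0, 0, 5):
target = np.array([0.0, 0.0, 5.0])
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
dirs = target - centers
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
print(convergence_residual(centers, dirs))  # ~(0, 0, 5), residual ~0
```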
arXiv Detail & Related papers (2024-09-09T08:08:05Z)
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module; the decomposition is illustrated in the toy sketch below.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
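Conceptually (our toy sketch, not the paper's modules), the decomposition means the network only has to output a normalized relative depth map plus a single scene-scale value:

```python
# Toy composition of metric depth from scene scale and relative depth;
# our illustration only, not ScaleDepth's actual modules.
import numpy as np

def compose_metric_depth(relative_depth, scene_scale):
    """relative_depth: (H, W) map in [0, 1]; scene_scale: meters."""
    return relative_depth * scene_scale

rel = np.random.rand(4, 4)       # stand-in for a predicted relative map
metric = compose_metric_depth(rel, scene_scale=80.0)  # outdoor-like scale
print(metric.max() <= 80.0)      # True: metric range is set by the scale
```

The same relative map composed with a small scale (say 8 m) would suit an indoor scene, which is what lets one framework cover both domains.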
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- GenDepth: Generalizing Monocular Depth Estimation for Arbitrary Camera Parameters via Ground Plane Embedding [8.289857214449372]
GenDepth is a novel model capable of performing metric depth estimation for arbitrary vehicle-camera setups.
We propose a novel embedding of camera parameters as the ground-plane depth and present a novel architecture that integrates these embeddings with adversarial domain alignment; a toy version of the embedding is sketched below.
We validate GenDepth on several autonomous driving datasets, demonstrating its state-of-the-art generalization capability for different vehicle-camera systems.
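As an illustration of the idea (our own sketch with assumed names, not GenDepth's code), a ground-plane depth embedding can be computed in closed form from the camera height and intrinsics: each pixel below the horizon maps to the depth at which its ray meets a flat road, assuming a level camera.

```python
# Illustrative ground-plane depth embedding from camera geometry
# (assumes a level, forward-facing camera over a flat road).
import numpy as np

def ground_plane_depth(h_img, w_img, fy, cy, cam_height):
    """Depth (m) at which each pixel's ray hits the ground plane.

    Rows at or above the horizon (v <= cy) never hit the ground
    and are set to +inf.
    """
    v = np.arange(h_img, dtype=np.float64)[:, None]   # pixel row indices
    denom = np.maximum(v - cy, 1e-9)                  # avoid divide-by-zero
    depth_rows = np.where(v > cy, fy * cam_height / denom, np.inf)
    return np.broadcast_to(depth_rows, (h_img, w_img))

# Example: 384x640 image, fy = 500 px, horizon at row 192, camera 1.5 m up.
emb = ground_plane_depth(384, 640, fy=500.0, cy=192.0, cam_height=1.5)
print(emb[300, 0])  # ~6.94 m at row 300
```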
arXiv Detail & Related papers (2023-12-10T22:28:34Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras; a generic cross-camera consistency check of this kind is sketched below.
Experimental results on the DDAD and nuScenes datasets demonstrate the superior performance of our method.
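A generic cross-camera consistency check (our sketch, with assumed names; the paper's structure-consistency formulation may differ) lifts a pixel with its predicted depth in camera A, transforms it into camera B, and compares the reprojected depth with B's prediction at the landing pixel:

```python
# Illustrative cross-camera depth-consistency check via reprojection.
import numpy as np

def reprojected_depth(uv, depth_a, K_a, K_b, T_b_from_a):
    """Return (pixel in B, depth seen from B) for one pixel of camera A."""
    u, v = uv
    ray = np.linalg.inv(K_a) @ np.array([u, v, 1.0])
    p_a = ray * depth_a                                 # 3D point, A's frame
    p_b = T_b_from_a[:3, :3] @ p_a + T_b_from_a[:3, 3]  # into B's frame
    uv_b = (K_b @ p_b)[:2] / p_b[2]
    return uv_b, p_b[2]    # compare p_b[2] with B's predicted depth at uv_b

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4); T[0, 3] = -0.5          # camera B sits 0.5 m right of A
uv_b, d_b = reprojected_depth((400.0, 260.0), 10.0, K, K, T)
print(uv_b, d_b)  # [375. 260.] 10.0
```

A consistency loss would penalize the difference between `d_b` and camera B's own prediction at `uv_b` wherever the point lands inside B's image.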
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse the predictions of monocular and stereo depth estimation networks for spike cameras; a toy uncertainty gate is sketched below.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
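The simplest form of uncertainty-guided fusion is a per-pixel hard gate (our toy illustration only; the paper's fusion framework is more elaborate):

```python
# Toy per-pixel uncertainty gate between monocular and stereo depth.
import numpy as np

def select_by_uncertainty(d_mono, u_mono, d_stereo, u_stereo):
    """Keep whichever branch reports the lower predictive uncertainty."""
    return np.where(u_stereo < u_mono, d_stereo, d_mono)

d = select_by_uncertainty(np.array([30.0, 8.0]), np.array([1.0, 2.0]),
                          np.array([28.0, 8.5]), np.array([3.0, 0.4]))
print(d)  # [30.   8.5] -- stereo wins only where it is more certain
```

This is consistent with the stated motivation: stereo would be trusted (low uncertainty) at close range, monocular elsewhere.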
arXiv Detail & Related papers (2022-08-26T13:04:01Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Multi-Camera Sensor Fusion for Visual Odometry using Deep Uncertainty Estimation [34.8860186009308]
We propose a deep sensor fusion framework that estimates vehicle motion using both pose and uncertainty estimates from multiple on-board cameras; a toy inverse-variance fusion is sketched below.
We evaluate our approach on nuScenes, a publicly available, large-scale autonomous driving dataset.
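One simple way to realize such fusion (our sketch with assumed names; the paper's learned deep fusion may differ) is to weight each camera's motion estimate by its predicted inverse variance:

```python
# Illustrative inverse-variance fusion of per-camera motion estimates.
import numpy as np

def fuse_motion(twists, sigmas):
    """twists: (C, 6) per-camera motion (3 rotation + 3 translation);
    sigmas: (C, 6) predicted standard deviations per component.
    Componentwise averaging is only valid for small motions."""
    w = 1.0 / np.square(sigmas)        # confident cameras weigh more
    return (w * twists).sum(axis=0) / w.sum(axis=0)

twists = np.array([[0.01, 0.0, 0.0, 1.0, 0.0, 0.0],
                   [0.03, 0.0, 0.0, 1.2, 0.0, 0.0]])
sigmas = np.array([[0.1, 0.1, 0.1, 0.2, 0.2, 0.2],
                   [0.3, 0.3, 0.3, 0.6, 0.6, 0.6]])
print(fuse_motion(twists, sigmas))     # dominated by the confident camera
```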
arXiv Detail & Related papers (2021-12-23T19:44:45Z)
- Video Depth Estimation by Fusing Flow-to-Depth Proposals [65.24533384679657]
We present an approach with a differentiable flow-to-depth layer for video depth estimation.
The model consists of a flow-to-depth layer, a camera pose refinement module, and a depth fusion network; a toy two-view flow-to-depth solve is sketched below.
Our approach outperforms state-of-the-art depth estimation methods and has reasonable cross-dataset generalization capability.
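The core two-view relation behind a flow-to-depth layer has a closed form (our toy derivation, not the paper's differentiable layer): given a pixel, its flow target, and the relative pose, the depth at the first frame is the least-squares solution of the epipolar equation.

```python
# Toy flow-to-depth solve: standard two-view geometry, not the paper's layer.
import numpy as np

def depth_from_flow(x1, x2, K, R, t):
    """x1 -> x2 pixel correspondence from optical flow; returns depth at x1.

    R, t map points from frame-1 coordinates to frame-2 coordinates, so
    z2 * r2 = z1 * R @ r1 + t for normalized rays r1, r2.
    """
    Kinv = np.linalg.inv(K)
    r1 = Kinv @ np.array([*x1, 1.0])     # normalized ray, frame 1
    r2 = Kinv @ np.array([*x2, 1.0])     # normalized ray, frame 2
    a = np.cross(r2, R @ r1)             # crossing with r2 eliminates z2
    b = np.cross(r2, t)
    return -float(a @ b) / float(a @ a)  # least-squares depth z1

K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([-0.3, 0.0, 0.0])   # camera moved 0.3 m right
# A point at depth 6 m projects to (370, 240), then flows to (345, 240).
print(depth_from_flow((370.0, 240.0), (345.0, 240.0), K, R, t))  # ~6.0
```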
arXiv Detail & Related papers (2019-12-30T10:45:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.