EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera
Depth Estimation
- URL: http://arxiv.org/abs/2304.03369v1
- Date: Thu, 6 Apr 2023 20:50:28 GMT
- Title: EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera
Depth Estimation
- Authors: Yunxiao Shi, Hong Cai, Amin Ansari, Fatih Porikli
- Abstract summary: We propose a novel guided attention architecture, EGA-Depth, which can improve the efficiency and accuracy of self-supervised multi-camera depth estimation.
For each camera, we use its perspective view as the query to cross-reference its neighboring views to derive informative features for this camera view.
- Score: 45.59727643007449
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ubiquitous multi-camera setup on modern autonomous vehicles provides an
opportunity to construct surround-view depth. Existing methods, however, either
perform independent monocular depth estimation on each camera or rely on
computationally heavy self-attention mechanisms. In this paper, we propose a
novel guided attention architecture, EGA-Depth, which can improve both the
efficiency and accuracy of self-supervised multi-camera depth estimation. More
specifically, for each camera, we use its perspective view as the query to
cross-reference its neighboring views to derive informative features for this
camera view. This allows the model to perform attention only across views with
considerable overlaps and avoid the costly computations of standard
self-attention. Given its efficiency, EGA-Depth enables us to exploit
higher-resolution visual features, leading to improved accuracy. Furthermore,
EGA-Depth can incorporate more frames from previous time steps as it scales
linearly w.r.t. the number of views and frames. Extensive experiments on two
challenging autonomous driving benchmarks nuScenes and DDAD demonstrate the
efficacy of our proposed EGA-Depth and show that it achieves the new
state-of-the-art in self-supervised multi-camera depth estimation.
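To make the guided-attention idea concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released implementation): each camera's feature tokens act as queries, while only its overlapping neighbor views supply keys and values, so the per-camera attention cost is bounded by a small, fixed neighbor set and total cost scales linearly with the number of views (and, analogously, past frames). The module name GuidedViewAttention, the token shapes, and the two-neighbor ring topology are illustrative assumptions.
```python
# Minimal sketch of guided cross-view attention in the spirit of EGA-Depth.
# Hypothetical shapes/names; not the authors' implementation.
import torch
import torch.nn as nn

class GuidedViewAttention(nn.Module):
    """One camera's features are the queries; only its overlapping
    neighbor views supply keys/values, so cost grows linearly with the
    number of views instead of quadratically as in full self-attention."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_feats, neighbor_feats):
        # query_feats:    (B, N_q, C)  tokens from the current camera view
        # neighbor_feats: (B, N_kv, C) tokens from its neighboring views
        attended, _ = self.attn(query_feats, neighbor_feats, neighbor_feats)
        return self.norm(query_feats + attended)  # residual + norm

# Toy usage: 6 surround cameras, each attending only to its 2 ring neighbors.
B, C, H, W = 2, 64, 28, 48
feats = [torch.randn(B, H * W, C) for _ in range(6)]
ega = GuidedViewAttention(dim=C)
fused = []
for i in range(6):
    neighbors = torch.cat([feats[(i - 1) % 6], feats[(i + 1) % 6]], dim=1)
    fused.append(ega(feats[i], neighbors))
```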
Related papers
- Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers [39.14931758754381]
We introduce a novel fusion method that bypasses monocular depth estimation altogether.
We show that our model can modulate its use of camera features based on the availability of lidar features.
arXiv Detail & Related papers (2023-12-22T18:51:50Z)
- Instance-aware Multi-Camera 3D Object Detection with Structural Priors Mining and Self-Boosting Learning [93.71280187657831]
The camera-based bird's-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z)
- Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection [66.74183705987276]
We introduce a framework to improve the camera-only apprentice model, including an apprentice-friendly multi-modal expert and temporal-fusion-friendly distillation supervision.
With these improvements, our camera-only apprentice VCD-A sets a new state of the art on nuScenes with a score of 63.1% NDS.
arXiv Detail & Related papers (2023-10-24T09:29:26Z)
- Robust Self-Supervised Extrinsic Self-Calibration [25.727912226753247]
Multi-camera self-supervised monocular depth estimation from videos is a promising way to reason about the environment.
We introduce a novel method for extrinsic calibration that builds upon the principles of self-supervised monocular depth and ego-motion learning.
arXiv Detail & Related papers (2023-08-04T06:20:20Z)
- A Simple Baseline for Supervised Surround-view Depth Estimation [25.81521612343612]
We propose S3Depth, a Simple Baseline for Supervised Surround-view Depth Estimation.
We employ a global-to-local feature extraction module which combines CNN with transformer layers for enriched representations.
Our method achieves superior performance over existing state-of-the-art methods on both the DDAD and nuScenes datasets.
arXiv Detail & Related papers (2023-03-14T10:06:19Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on the DDAD and nuScenes datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2022-10-05T03:44:34Z)
- BiFuse++: Self-supervised and Efficient Bi-projection Fusion for 360 Depth Estimation [59.11106101006008]
We propose BiFuse++ to explore the combination of bi-projection fusion and the self-training scenario.
We propose a new fusion module and Contrast-Aware Photometric Loss to improve the performance of BiFuse.
arXiv Detail & Related papers (2022-09-07T06:24:21Z)
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose SurroundDepth, a method that incorporates information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves state-of-the-art performance on challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Full Surround Monodepth from Multiple Cameras [31.145598985137468]
We extend self-supervised monocular depth and ego-motion estimation to large-baseline multi-camera rigs.
We learn a single network that generates dense, consistent, and scale-aware point clouds covering the same full surround 360-degree field of view as a typical LiDAR scanner.
arXiv Detail & Related papers (2021-03-31T22:52:04Z)