Depth Field Networks for Generalizable Multi-view Scene Representation
- URL: http://arxiv.org/abs/2207.14287v1
- Date: Thu, 28 Jul 2022 17:59:31 GMT
- Title: Depth Field Networks for Generalizable Multi-view Scene Representation
- Authors: Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Greg
Shakhnarovich, Matthew Walter, Adrien Gaidon
- Abstract summary: We learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity.
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide margin.
- Score: 31.090289865520475
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern 3D computer vision leverages learning to boost geometric reasoning,
mapping image data to classical structures such as cost volumes or epipolar
constraints to improve matching. These architectures are specialized according
to the particular problem, and thus require significant task-specific tuning,
often leading to poor domain generalization performance. Recently, generalist
Transformer architectures have achieved impressive results in tasks such as
optical flow and depth estimation by encoding geometric priors as inputs rather
than as enforced constraints. In this paper, we extend this idea and propose to
learn an implicit, multi-view consistent scene representation, introducing a
series of 3D data augmentation techniques as a geometric inductive prior to
increase view diversity. We also show that introducing view synthesis as an
auxiliary task further improves depth estimation. Our Depth Field Networks
(DeFiNe) achieve state-of-the-art results in stereo and video depth estimation
without explicit geometric constraints, and improve on zero-shot domain
generalization by a wide margin.
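To make the augmentation idea concrete, below is a minimal PyTorch sketch of virtual-view jittering in the spirit of the abstract; the function names, noise model, and magnitudes are illustrative assumptions rather than the paper's actual implementation.

```python
import torch

def jitter_pose(T, trans_std=0.05, rot_std=0.01):
    """Perturb a 4x4 camera-to-world pose to create a virtual viewpoint."""
    w = rot_std * torch.randn(3)             # small axis-angle rotation noise
    wx = torch.zeros(4, 4)                   # embed the skew-symmetric [w]_x
    wx[0, 1], wx[0, 2] = -w[2], w[1]
    wx[1, 0], wx[1, 2] = w[2], -w[0]
    wx[2, 0], wx[2, 1] = -w[1], w[0]
    noise = torch.linalg.matrix_exp(wx)      # rotation block, identity elsewhere
    noise[:3, 3] = trans_std * torch.randn(3)
    return noise @ T

def pixel_rays(K, H, W):
    """Per-pixel viewing directions in the camera frame (3x3 pinhole K)."""
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    return pix @ torch.inverse(K).T          # unnormalized rays through pixels
```

Rays from such jittered cameras can then be encoded (e.g., with positional encodings) and fed to the network as geometric inputs, increasing view diversity without requiring new images.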
Related papers
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
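The summary suggests per-pixel attention across views in the decoder; the following is a generic sketch of that pattern, with shapes, names, and the head count chosen for illustration (the paper's actual decoder layout is not specified here).

```python
import torch
import torch.nn as nn

class CrossViewAttention(nn.Module):
    """Self-attention across the V views of a (B, V, C, H, W) feature map."""
    def __init__(self, channels, heads=4):   # channels must be divisible by heads
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):                    # x: (B, V, C, H, W)
        B, V, C, H, W = x.shape
        # fold spatial positions into the batch; views become the sequence
        t = x.permute(0, 3, 4, 1, 2).reshape(B * H * W, V, C)
        t, _ = self.attn(t, t, t)            # each pixel attends across views
        return t.reshape(B, H, W, V, C).permute(0, 3, 4, 1, 2)
```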
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
- DoubleTake: Geometry Guided Depth Estimation [17.464549832122714]
Estimating depth from a sequence of posed RGB images is a fundamental computer vision task.
We introduce a reconstruction approach that combines volume features with a hint of the prior geometry, rendered as a depth map from the current camera location.
We demonstrate that our method can run at interactive speeds and produces state-of-the-art estimates of depth and 3D scene reconstruction in both offline and incremental evaluation scenarios.
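A minimal sketch of how prior geometry could be rendered into a per-pixel depth hint for the current camera; the point-cloud input, scatter-based z-buffer, and all names are assumptions for illustration, not the paper's renderer.

```python
import torch

def render_depth_hint(points_w, K, T_cw, H, W):
    """Project a prior world-space point cloud (N, 3) into the current camera
    and keep the nearest depth per pixel: a crude z-buffered depth hint.
    T_cw is the 4x4 world-to-camera transform, K the 3x3 intrinsics."""
    pts_h = torch.cat([points_w, torch.ones(len(points_w), 1)], dim=1)
    pts_c = (pts_h @ T_cw.T)[:, :3]                  # world -> camera frame
    front = pts_c[:, 2] > 1e-3                       # keep points in front
    pts_c = pts_c[front]
    z = pts_c[:, 2]
    uv = (pts_c / z[:, None]) @ K.T                  # pinhole projection
    u, v = uv[:, 0].long(), uv[:, 1].long()
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth = torch.full((H * W,), float("inf"))
    depth.scatter_reduce_(0, v[ok] * W + u[ok], z[ok], reduce="amin")
    depth = depth.view(H, W)
    return torch.where(torch.isinf(depth), torch.zeros_like(depth), depth)
```

Such a hint could then be concatenated with the volume features as an extra input channel, letting the network trust or override the prior geometry.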
arXiv Detail & Related papers (2024-06-26T14:29:05Z)
- Multiple Prior Representation Learning for Self-Supervised Monocular Depth Estimation via Hybrid Transformer [12.486504395099022]
Self-supervised monocular depth estimation aims to infer depth information without relying on labeled data.
The lack of labeled information poses a significant challenge to the model's representation learning, limiting its ability to capture the intricate details of a scene accurately.
We introduce a novel self-supervised monocular depth estimation model that leverages multiple priors to bolster representation capabilities across spatial, context, and semantic dimensions.
arXiv Detail & Related papers (2024-06-13T08:51:57Z)
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the lightest image backbone among competing methods.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- GTA: A Geometry-Aware Attention Mechanism for Multi-View Transformers [63.41460219156508]
We argue that existing positional encoding schemes are suboptimal for 3D vision tasks.
We propose a geometry-aware attention mechanism that encodes the geometric structure of tokens as relative transformations.
We show that our attention, called Geometric Transform Attention (GTA), improves learning efficiency and performance of state-of-the-art transformer-based NVS models.
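A toy, rotation-only illustration of the idea follows; GTA itself applies representations of full relative transformations inside multi-head attention, so treat this strictly as a simplified sketch with assumed names and shapes.

```python
import torch

def geometry_aware_scores(q, k, R):
    """Toy geometry-aware attention scores.

    q, k: (N, C) token features with C divisible by 3; R: (N, 3, 3) per-token
    rotations (e.g., from camera poses). Rotating queries and keys by their
    own pose makes q_i . k_j depend on the relative rotation R_i^T R_j.
    """
    N, C = q.shape
    qr = (R @ q.view(N, C // 3, 3).transpose(-1, -2)).transpose(-1, -2).reshape(N, C)
    kr = (R @ k.view(N, C // 3, 3).transpose(-1, -2)).transpose(-1, -2).reshape(N, C)
    return qr @ kr.T / C ** 0.5              # (N, N) attention logits
```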
arXiv Detail & Related papers (2023-10-16T13:16:09Z)
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously infeasible geometric augmentations for unsupervised depth completion and estimation.
This is achieved by reversing, or "undoing," the geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame.
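In sketch form, with a horizontal flip standing in for the paper's broader family of geometric augmentations and with `model` and `loss_fn` as hypothetical placeholders:

```python
import torch

def augundo_step(model, image, loss_fn):
    """One AugUndo-style step with a horizontal flip as the augmentation.

    The network sees the augmented image, but the predicted depth is warped
    back ("undone") into the original frame, so an unsupervised loss tied to
    the original camera geometry remains valid.
    """
    flipped = torch.flip(image, dims=[-1])   # augment the input
    depth = model(flipped)                   # predict in the augmented frame
    depth = torch.flip(depth, dims=[-1])     # undo the flip on the output
    return loss_fn(depth, image)             # loss in the original frame
```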
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
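For reference, the textbook closed-form scale-and-shift alignment is shown below; the paper's contribution is loss functions that let the model recover such coefficients autonomously, so this snippet is a baseline illustration, not their method.

```python
import torch

def recover_scale_shift(pred, target, mask):
    """Closed-form least squares for scale s and shift t minimizing
    || s * pred + t - target ||^2 over the valid pixels in mask."""
    p, y = pred[mask], target[mask]
    A = torch.stack([p, torch.ones_like(p)], dim=1)       # (N, 2) design matrix
    sol = torch.linalg.lstsq(A, y.unsqueeze(1)).solution  # solve A [s, t]^T = y
    return sol[0, 0], sol[1, 0]
```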
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- DeLiRa: Self-Supervised Depth, Light, and Radiance Fields [32.350984950639656]
Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis.
Standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity.
In this work, we propose to use the multi-view photometric objective as a geometric regularizer for volumetric rendering.
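A rough sketch of how such a regularizer might be combined with the rendering objective; the loss forms, weighting, and names are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def combined_loss(rendered_rgb, target_rgb, warped_src, w_photo=0.1):
    """Volume-rendering RGB loss plus a multi-view photometric regularizer.

    warped_src is a source image warped into the target view using the
    rendered depth; the weight w_photo is an arbitrary illustrative choice.
    """
    render_loss = F.mse_loss(rendered_rgb, target_rgb)
    photo_loss = (warped_src - target_rgb).abs().mean()   # L1 photometric term
    return render_loss + w_photo * photo_loss
```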
arXiv Detail & Related papers (2023-04-06T00:16:25Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
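A bare-bones sketch of affinity-weighted propagation over a pixel graph; the learned graph-convolutional machinery and dynamic neighborhood construction are omitted, and all names and the update rule are illustrative.

```python
import torch

def propagate(depth, neighbors, affinity, steps=3):
    """Affinity-weighted propagation over a pixel graph.

    depth:     (N,) initial per-pixel depths
    neighbors: (N, K) indices of each pixel's K graph neighbors
    affinity:  (N, K) non-negative weights with rows summing to at most 1
    Each step blends a pixel's depth with a weighted average of its
    neighbors' depths, in the style of spatial propagation networks.
    """
    for _ in range(steps):
        gathered = depth[neighbors]                    # (N, K) neighbor depths
        self_w = 1.0 - affinity.sum(dim=1)             # residual self-weight
        depth = self_w * depth + (affinity * gathered).sum(dim=1)
    return depth
```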
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
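One simple way to factor a global scale out of the network estimate, loosely in the spirit of the summary: a robust median alignment against a scale-consistent reference. This is an illustrative stand-in, not the paper's system.

```python
import torch

def align_depth_scale(depth_pred, depth_tri, mask):
    """Align up-to-scale network depth to a metrically consistent reference
    (e.g., depth triangulated from two-view geometry) via a robust median
    ratio, removing the arbitrary global scale before computing a loss."""
    ratio = depth_tri[mask] / depth_pred[mask].clamp(min=1e-6)
    return torch.median(ratio) * depth_pred
```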
arXiv Detail & Related papers (2020-04-03T00:28:09Z)