Learning Continuous Depth Representation via Geometric Spatial
Aggregator
- URL: http://arxiv.org/abs/2212.03499v1
- Date: Wed, 7 Dec 2022 07:48:23 GMT
- Title: Learning Continuous Depth Representation via Geometric Spatial
Aggregator
- Authors: Xiaohang Wang, Xuanhong Chen, Bingbing Ni, Zhengyan Tong, Hang Wang
- Abstract summary: We propose a novel continuous depth representation for depth map super-resolution (DSR).
The heart of this representation is our proposed Geometric Spatial Aggregator (GSA), which exploits a distance field modulated by arbitrarily upsampled target gridding.
We also present a transformer-style backbone named GeoDSR, which possesses a principled way to construct the functional mapping between local coordinates and the high-resolution output.
- Score: 47.1698365486215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth map super-resolution (DSR) has been a fundamental task for 3D computer
vision. While arbitrary scale DSR is a more realistic setting in this scenario,
previous approaches predominantly suffer from the issue of inefficient
real-numbered scale upsampling. To explicitly address this issue, we propose a
novel continuous depth representation for DSR. The heart of this representation
is our proposed Geometric Spatial Aggregator (GSA), which exploits a distance
field modulated by arbitrarily upsampled target gridding, through which the
geometric information is explicitly introduced into feature aggregation and
target generation. Furthermore, building on GSA, we present a
transformer-style backbone named GeoDSR, which provides a principled way to
construct the functional mapping between local coordinates and the
high-resolution output, equipping our model with arbitrary shape
transformation to serve diverse zooming demands. Extensive
experimental results on standard depth map benchmarks, e.g., NYU v2, have
demonstrated that the proposed framework achieves significant restoration gain
in arbitrary scale depth map super-resolution compared with the prior art. Our
codes are available at https://github.com/nana01219/GeoDSR.
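The abstract describes GSA as weighting feature aggregation by a distance field defined on the arbitrarily upsampled target grid. The paper's actual operator is learned; as a rough illustration of the underlying idea only, here is a toy NumPy sketch in which each real-valued query coordinate aggregates its four nearest low-resolution samples with Gaussian distance weights (the Gaussian kernel and the `sigma` parameter are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def arbitrary_scale_upsample(lr, scale, sigma=0.5):
    """Toy sketch of coordinate-based arbitrary-scale upsampling: each
    high-resolution query point aggregates its 4 nearest low-resolution
    neighbors, weighted by a Gaussian of the continuous distance field."""
    h, w = lr.shape
    out_h, out_w = int(round(h * scale)), int(round(w * scale))
    # continuous coordinates of HR pixel centers, expressed in LR space
    ys = (np.arange(out_h) + 0.5) / scale - 0.5
    xs = (np.arange(out_w) + 0.5) / scale - 0.5
    out = np.zeros((out_h, out_w))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            acc = wsum = 0.0
            for dy in (0, 1):
                for dx in (0, 1):
                    # clamp neighbor indices at the border
                    yy = min(max(y0 + dy, 0), h - 1)
                    xx = min(max(x0 + dx, 0), w - 1)
                    # squared continuous distance to the neighbor
                    d2 = (y - (y0 + dy)) ** 2 + (x - (x0 + dx)) ** 2
                    wgt = np.exp(-d2 / (2 * sigma ** 2))
                    acc += wgt * lr[yy, xx]
                    wsum += wgt
            out[i, j] = acc / wsum
    return out
```

Because the query grid is defined by continuous coordinates rather than an integer stride, the same routine handles any real-valued scale factor, which is the setting the paper targets.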
Related papers
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z)
- Hi-Map: Hierarchical Factorized Radiance Field for High-Fidelity Monocular Dense Mapping [51.739466714312805]
We introduce Hi-Map, a novel monocular dense mapping approach based on Neural Radiance Field (NeRF).
Hi-Map is exceptional in its capacity to achieve efficient and high-fidelity mapping using only posed RGB inputs.
arXiv Detail & Related papers (2024-01-06T12:32:25Z) - DSR-Diff: Depth Map Super-Resolution with Diffusion Model [38.68563026759223]
We present a novel CDSR paradigm that utilizes a diffusion model within the latent space to generate guidance for depth map super-resolution.
Our proposed method has shown superior performance in extensive experiments when compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-11-16T14:18:10Z) - RGB-based Category-level Object Pose Estimation via Decoupled Metric
Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z) - BridgeNet: A Joint Learning Network of Depth Map Super-Resolution and
Monocular Depth Estimation [60.34562823470874]
We propose a joint learning network of depth map super-resolution (DSR) and monocular depth estimation (MDE) without introducing additional supervision labels.
Two bridges connect the tasks. One is the high-frequency attention bridge (HABdg), designed for the feature encoding process, which learns high-frequency information from the MDE task to guide the DSR task.
The other is the content guidance bridge (CGBdg), designed for the depth map reconstruction process, which provides content guidance learned from the DSR task to the MDE task.
arXiv Detail & Related papers (2021-07-27T01:28:23Z) - Discrete Cosine Transform Network for Guided Depth Map Super-Resolution [19.86463937632802]
The goal is to use high-resolution (HR) RGB images to provide extra information on edges and object contours, so that low-resolution depth maps can be upsampled to HR ones.
We propose an advanced Discrete Cosine Transform Network (DCTNet), which is composed of four components.
We show that our method can generate accurate and HR depth maps, surpassing state-of-the-art methods.
arXiv Detail & Related papers (2021-04-14T17:01:03Z) - High-resolution Depth Maps Imaging via Attention-based Hierarchical
Multi-modal Fusion [84.24973877109181]
We propose a novel attention-based hierarchical multi-modal fusion network for guided DSR.
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
arXiv Detail & Related papers (2021-04-04T03:28:33Z)
- Exploring intermediate representation for monocular vehicle pose estimation [38.85309013717312]
We present a new learning-based framework to recover vehicle pose in SO(3) from a single RGB image.
In contrast to previous works that map from local appearance to observation angles, we explore a progressive approach by extracting meaningful Intermediate Geometrical Representations (IGRs).
This approach features a deep model that transforms perceived intensities to IGRs, which are mapped to a 3D representation encoding object orientation in the camera coordinate system.
arXiv Detail & Related papers (2020-11-17T06:30:51Z)
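Several of the related entries above (DCTNet, the attention-based fusion network) address guided DSR, where a high-resolution RGB image steers the upsampling of a low-resolution depth map. None of their architectures is reproduced here; as a point of reference, the classical joint bilateral upsampling baseline for this setting can be sketched in NumPy (the window size and the `sigma_s`/`sigma_r` parameters are illustrative choices, not from any of the listed papers):

```python
import numpy as np

def joint_bilateral_upsample(depth_lr, guide_hr, sigma_s=1.0, sigma_r=0.1):
    """Classical guided-upsampling baseline: each HR depth value is a
    weighted average of nearby LR depth samples, with weights combining
    spatial distance (in LR grid units) and similarity of the HR guide."""
    h, w = depth_lr.shape
    H, W = guide_hr.shape
    sy, sx = H / h, W / w
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            y, x = i / sy, j / sx          # HR pixel position in LR coordinates
            y0, x0 = int(y), int(x)
            acc = wsum = 0.0
            for dy in (-1, 0, 1, 2):       # 4x4 LR neighborhood
                for dx in (-1, 0, 1, 2):
                    yy = min(max(y0 + dy, 0), h - 1)
                    xx = min(max(x0 + dx, 0), w - 1)
                    # spatial term, measured on the LR grid
                    ws = np.exp(-((y - yy) ** 2 + (x - xx) ** 2)
                                / (2 * sigma_s ** 2))
                    # range term: guide similarity at the neighbor's HR position
                    gy = min(int(round(yy * sy)), H - 1)
                    gx = min(int(round(xx * sx)), W - 1)
                    wr = np.exp(-(guide_hr[i, j] - guide_hr[gy, gx]) ** 2
                                / (2 * sigma_r ** 2))
                    acc += ws * wr * depth_lr[yy, xx]
                    wsum += ws * wr
            out[i, j] = acc / wsum
    return out
```

The range term suppresses LR samples whose guide values differ from the query pixel's, which is how RGB edges keep depth discontinuities sharp; the learned networks above replace these hand-set weights with trained feature-space attention.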
This list is automatically generated from the titles and abstracts of the papers in this site.