End-to-End Multi-View Structure-from-Motion with Hypercorrelation Volumes
- URL: http://arxiv.org/abs/2209.06926v1
- Date: Wed, 14 Sep 2022 20:58:44 GMT
- Title: End-to-End Multi-View Structure-from-Motion with Hypercorrelation Volumes
- Authors: Qiao Chen, Charalambos Poullis
- Abstract summary: Deep learning techniques have been proposed to tackle image-based 3D reconstruction.
We improve on the state-of-the-art two-view structure-from-motion (SfM) approach by incorporating a 4D correlation volume.
We extend it to the general multi-view case and evaluate it on the complex benchmark dataset DTU.
- Score: 7.99536002595393
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Image-based 3D reconstruction is one of the most important tasks in Computer
Vision with many solutions proposed over the last few decades. The objective is
to extract metric information, i.e., the geometry of scene objects, directly from
images. This information can then be used in a wide range of applications such as film,
games, virtual reality, etc. Recently, deep learning techniques have been
proposed to tackle this problem. They rely on training on vast amounts of data
to learn to associate features between images through deep convolutional neural
networks and have been shown to outperform traditional procedural techniques.
In this paper, we improve on the state-of-the-art two-view
structure-from-motion (SfM) approach of [11] by incorporating a 4D correlation
volume for more accurate feature matching and reconstruction. Furthermore, we
extend it to the general multi-view case and evaluate it on the complex
benchmark dataset DTU [4]. Quantitative evaluations and comparisons with
state-of-the-art multi-view 3D reconstruction methods demonstrate its
superiority in terms of the accuracy of reconstructions.
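
To make the "4D correlation volume" mentioned in the abstract concrete, here is a minimal NumPy sketch of a generic all-pairs correlation between two feature maps. It is an illustration under stated assumptions, not the authors' implementation: the function names `correlation_volume_4d` and `best_match`, the (C, H, W) feature layout, and the 1/sqrt(C) scaling are all choices made for this example and do not come from the paper.

```python
import numpy as np


def correlation_volume_4d(feat1, feat2):
    """All-pairs correlation between two feature maps (hypothetical helper).

    feat1, feat2: arrays of shape (C, H, W).
    Returns an array of shape (H, W, H, W) whose entry [i, j, k, l] is the
    dot product of feat1[:, i, j] and feat2[:, k, l], scaled by 1/sqrt(C).
    """
    if feat1.shape != feat2.shape:
        raise ValueError("feature maps must have the same shape")
    channels = feat1.shape[0]
    # Contract over the channel axis for every pair of pixel locations.
    corr = np.einsum("cij,ckl->ijkl", feat1, feat2)
    return corr / np.sqrt(channels)


def best_match(corr, i, j):
    """Return the (row, col) in image 2 that correlates most with pixel (i, j)."""
    width2 = corr.shape[3]
    flat_index = int(np.argmax(corr[i, j]))
    return divmod(flat_index, width2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumption: random unit-norm features stand in for learned CNN features.
    f1 = rng.standard_normal((16, 5, 6))
    f1 /= np.linalg.norm(f1, axis=0, keepdims=True)
    # Image 2 is image 1 shifted by one row and two columns (with wrap-around).
    f2 = np.roll(f1, shift=(1, 2), axis=(1, 2))
    corr = correlation_volume_4d(f1, f2)
    print(corr.shape, best_match(corr, 2, 3))
```

The volume stores one similarity score for every pair of pixels across the two images, so its memory cost grows with (H·W)², which is why such volumes are usually built on downsampled feature maps.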
Related papers
- Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z)
- Learning-based Multi-View Stereo: A Survey [55.3096230732874]
Multi-View Stereo (MVS) algorithms synthesize a comprehensive 3D representation, enabling precise reconstruction in complex environments.
With the success of deep learning, many learning-based MVS methods have been proposed, achieving impressive performance against traditional methods.
arXiv Detail & Related papers (2024-08-27T17:53:18Z)
- MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References [49.71130133080821]
MaRINeR is a refinement method that leverages information of a nearby mapping image to improve the rendering of a target viewpoint.
We show improved renderings in quantitative metrics and qualitative examples from both explicit and implicit scene representations.
arXiv Detail & Related papers (2024-07-18T17:50:03Z)
- MVSBoost: An Efficient Point Cloud-based 3D Reconstruction [4.282795945742752]
Efficient and accurate 3D reconstruction is crucial for various applications, including augmented and virtual reality, medical imaging, and cinematic special effects.
Traditional Multi-View Stereo (MVS) systems have been fundamental in these applications, but implicit 3D scene modeling has introduced new possibilities for handling complex topologies and continuous surfaces.
arXiv Detail & Related papers (2024-06-19T13:02:17Z)
- Implicit Shape and Appearance Priors for Few-Shot Full Head Reconstruction [17.254539604491303]
In this paper, we address the problem of few-shot full 3D head reconstruction.
We accomplish this by incorporating a probabilistic shape and appearance prior into coordinate-based representations.
We extend the H3DS dataset, which now comprises 60 high-resolution 3D full head scans and their corresponding posed images and masks.
arXiv Detail & Related papers (2023-10-12T07:35:30Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Multi-view 3D Reconstruction with Transformer [34.756336770583154]
We reformulate the multi-view 3D reconstruction as a sequence-to-sequence prediction problem.
We propose a new framework named 3D Volume Transformer (VolT) for such a task.
Our method achieves a new state-of-the-art accuracy in multi-view reconstruction with fewer parameters.
arXiv Detail & Related papers (2021-03-24T03:14:49Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.