A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-view
Stereo Reconstruction from An Open Aerial Dataset
- URL: http://arxiv.org/abs/2003.00637v3
- Date: Mon, 16 Mar 2020 04:27:33 GMT
- Title: A Novel Recurrent Encoder-Decoder Structure for Large-Scale Multi-view
Stereo Reconstruction from An Open Aerial Dataset
- Authors: Jin Liu and Shunping Ji
- Abstract summary: We present a synthetic aerial dataset, called the WHU dataset, which is the first large-scale multi-view aerial dataset.
We also introduce in this paper a novel network, called RED-Net, for wide-range depth inference.
Our experiments confirmed that our method not only reduced the mean absolute error (MAE) of the current state-of-the-art MVS methods by more than 50%, but also did so with less memory and lower computational cost.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A great deal of research has demonstrated recently that multi-view stereo
(MVS) matching can be solved with deep learning methods. However, these efforts
were focused on close-range objects and only a very few of the deep
learning-based methods were specifically designed for large-scale 3D urban
reconstruction due to the lack of multi-view aerial image benchmarks. In this
paper, we present a synthetic aerial dataset, called the WHU dataset, which we
created for MVS tasks and which, to our knowledge, is the first large-scale
multi-view aerial dataset. It was generated from a highly accurate 3D digital
surface model produced from thousands of real aerial images with precise camera
parameters. We also introduce in this paper a novel network, called RED-Net,
for wide-range depth inference, which combines a recurrent encoder-decoder
structure that regularizes cost maps across depths with a 2D fully
convolutional network as its framework. RED-Net's low memory requirements and high
performance make it suitable for large-scale and highly accurate 3D Earth
surface reconstruction. Our experiments confirmed that our method not only
reduced the mean absolute error (MAE) of the current state-of-the-art MVS
methods by more than 50%, but also did so with less memory and lower
computational cost. It outperformed one of the best commercial software
programs based on conventional methods, improving efficiency by a factor of
16. Moreover, we
proved that our RED-Net model pre-trained on the synthetic WHU dataset can be
efficiently transferred to very different multi-view aerial image datasets
without any fine-tuning. The dataset is available at http://gpcv.whu.edu.cn/data.
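The abstract's central architectural idea, regularizing the per-depth cost maps with a recurrent unit carried across depth planes instead of a memory-heavy 3D convolution, can be sketched as follows. This is a minimal PyTorch illustration under our own assumptions (single-level encoder/decoder, a ConvGRU recurrence, illustrative channel widths); it is not the authors' published RED-Net implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvGRUCell(nn.Module):
    """Convolutional GRU cell carrying a spatial hidden state across depth planes."""

    def __init__(self, in_ch, hid_ch):
        super().__init__()
        self.reset = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)
        self.update = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)
        self.cand = nn.Conv2d(in_ch + hid_ch, hid_ch, 3, padding=1)

    def forward(self, x, h):
        r = torch.sigmoid(self.reset(torch.cat([x, h], dim=1)))
        z = torch.sigmoid(self.update(torch.cat([x, h], dim=1)))
        n = torch.tanh(self.cand(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * n


class REDSketch(nn.Module):
    """Toy recurrent encoder-decoder: a 2D encoder compresses each per-depth
    cost map, a ConvGRU regularizes features sequentially across depths, and
    a 2D decoder restores full resolution. Depth planes are processed one at
    a time, so memory scales with a single 2D map rather than the full 3D
    cost volume. Layer counts and widths are illustrative only."""

    def __init__(self, in_ch=8, base=16):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, base, 3, stride=2, padding=1)       # H -> H/2
        self.gru = ConvGRUCell(base, base)
        self.dec = nn.ConvTranspose2d(base, 1, 4, stride=2, padding=1)  # H/2 -> H

    def forward(self, cost_volume):
        # cost_volume: (B, D, C, H, W) -- one C-channel cost map per depth plane
        b, d, c, hgt, wid = cost_volume.shape
        h = cost_volume.new_zeros(b, self.enc.out_channels, hgt // 2, wid // 2)
        regularized = []
        for i in range(d):                       # recur over depths, not 3D conv
            feat = F.relu(self.enc(cost_volume[:, i]))
            h = self.gru(feat, h)
            regularized.append(self.dec(h).squeeze(1))
        return torch.stack(regularized, dim=1)   # (B, D, H, W) regularized costs
```

A depth map would then follow from a softmax/argmin over the regularized depth dimension; the hidden state is what lets evidence propagate between neighboring depth planes.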
Related papers
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time fully convolutional hand localization and a high-fidelity transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z) - Sparse Multi-baseline SAR Cross-modal 3D Reconstruction of Vehicle Targets [5.6680936716261705]
We propose a Cross-Modal Reconstruction Network (CMR-Net), which integrates differentiable rendering and cross-modal supervision with optical images.
CMR-Net, trained solely on simulated data, demonstrates high-resolution reconstruction capabilities on both publicly available simulation datasets and real measured datasets.
arXiv Detail & Related papers (2024-06-06T15:18:59Z) - Efficient Physics-Based Learned Reconstruction Methods for Real-Time 3D
Near-Field MIMO Radar Imaging [0.0]
Near-field multiple-input multiple-output (MIMO) radar imaging systems have recently gained significant attention.
In this paper, we develop novel non-iterative deep learning-based reconstruction methods for real-time near-field imaging.
The goal is to achieve high image quality with low computational cost.
arXiv Detail & Related papers (2023-12-28T11:05:36Z) - Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets [5.391764618878545]
In this paper, we aim to scale Neural Radiance Fields (NeRF) to large-scale aerial datasets.
Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption.
We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with three photogrammetric MVS pipelines.
arXiv Detail & Related papers (2023-10-01T00:21:01Z) - Neural Progressive Meshes [54.52990060976026]
We propose a method to transmit 3D meshes with a shared learned generative space.
We learn this space using a subdivision-based encoder-decoder architecture trained in advance on a large collection of surfaces.
We evaluate our method on a diverse set of complex 3D shapes and demonstrate that it outperforms baselines in terms of compression ratio and reconstruction quality.
arXiv Detail & Related papers (2023-08-10T17:58:02Z) - UAVStereo: A Multiple Resolution Dataset for Stereo Matching in UAV
Scenarios [0.6524460254566905]
This paper constructs a multi-resolution UAV scenario dataset, called UAVStereo, with over 34k stereo image pairs covering 3 typical scenes.
In this paper, we evaluate traditional and state-of-the-art deep learning methods, highlighting their limitations in addressing challenges in UAV scenarios.
arXiv Detail & Related papers (2023-02-20T16:45:27Z) - Simple and Effective Synthesis of Indoor 3D Scenes [78.95697556834536]
We study the problem of synthesizing immersive 3D indoor scenes from one or more images.
Our aim is to generate high-resolution images and videos from novel viewpoints.
We propose an image-to-image GAN that maps directly from reprojections of incomplete point clouds to full high-resolution RGB-D images.
arXiv Detail & Related papers (2022-04-06T17:54:46Z) - 3DVNet: Multi-View Depth Prediction and Volumetric Refinement [68.68537312256144]
3DVNet is a novel multi-view stereo (MVS) depth-prediction method.
Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions.
We show that our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics.
arXiv Detail & Related papers (2021-12-01T00:52:42Z) - Ground material classification for UAV-based photogrammetric 3D data:
A 2D-3D Hybrid Approach [1.3359609092684614]
In recent years, photogrammetry has been widely used in many areas to create 3D virtual data representing the physical environment.
These cutting-edge technologies have caught the US Army and Navy's attention for the purpose of rapid 3D battlefield reconstruction, virtual training, and simulations.
arXiv Detail & Related papers (2021-09-24T22:29:26Z) - Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.