Depth-Enhanced Feature Pyramid Network for Occlusion-Aware Verification
of Buildings from Oblique Images
- URL: http://arxiv.org/abs/2011.13226v2
- Date: Sat, 23 Jan 2021 08:46:50 GMT
- Title: Depth-Enhanced Feature Pyramid Network for Occlusion-Aware Verification
of Buildings from Oblique Images
- Authors: Qing Zhu and Shengzhi Huang and Han Hu and Haifeng Li and Min Chen and
Ruofei Zhong
- Abstract summary: This paper proposes a fused feature pyramid network to detect changes in buildings in urban environments.
It uses both color and depth data for the 3D verification of existing buildings' 2D footprints from oblique images.
We demonstrate that the proposed method can successfully detect all changed buildings.
- Score: 15.466320414614971
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Detecting the changes of buildings in urban environments is essential.
Existing methods that use only nadir images suffer from severe problems of
ambiguous features and occlusions between buildings and other regions.
Furthermore, buildings in urban environments vary significantly in scale, which
leads to performance issues when using single-scale features. To solve these
issues, this paper proposes a fused feature pyramid network, which utilizes
both color and depth data for the 3D verification of existing buildings' 2D
footprints from oblique images. First, the color data of oblique images are
enriched with the depth information rendered from 3D mesh models. Second,
multiscale features are fused in the feature pyramid network to convolve both
the color and depth data. Finally, multi-view information from both the nadir
and oblique images is used in a robust voting procedure to label changes in
existing buildings. Experimental evaluations using both the ISPRS benchmark
datasets and Shenzhen datasets reveal that the proposed method outperforms the
ResNet and EfficientNet networks by 5% and 2%, respectively, in terms of
recall rate and precision. We demonstrate that the proposed method can
successfully detect all changed buildings; therefore, only those marked as
changed need to be manually checked during the pipeline updating procedure;
this significantly reduces the manual quality control requirements. Moreover,
ablation studies indicate that using depth data, feature pyramid modules, and
multi-view voting strategies can lead to clear and progressive improvements.
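The abstract's final stage aggregates per-view predictions into a single label per footprint. A minimal sketch of such a multi-view voting step is shown below; the function name, label strings, and threshold-based tie handling are illustrative assumptions, not the authors' exact procedure:

```python
# Hypothetical sketch of the robust multi-view voting step described in the
# abstract: each building footprint is classified independently in every
# nadir and oblique view, and the per-view labels are aggregated by vote.
from collections import Counter

def vote_building_label(per_view_labels, changed_threshold=0.5):
    """Aggregate per-view change predictions for one building footprint.

    per_view_labels: list of 'changed' / 'unchanged' strings, one per view.
    The footprint is flagged as changed when at least `changed_threshold`
    of the views agree; a low threshold favors recall, so that no changed
    building is missed and only flagged footprints need manual checking.
    """
    if not per_view_labels:
        raise ValueError("need at least one view")
    counts = Counter(per_view_labels)
    changed_ratio = counts["changed"] / len(per_view_labels)
    return "changed" if changed_ratio >= changed_threshold else "unchanged"

# Example: 3 of 5 views see a change, so the footprint is flagged.
print(vote_building_label(
    ["changed", "unchanged", "changed", "changed", "unchanged"]))
```

Lowering `changed_threshold` trades precision for recall, which matches the paper's goal of guaranteeing that every truly changed building reaches the manual quality-control stage.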
Related papers
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously-infeasible geometric augmentations for unsupervised depth computation and estimation.
This is achieved by reversing, or "undo"-ing, geometric transformations to the coordinates of the output depth, warping the depth map back to the original reference frame.
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
- Self-Supervised Monocular Depth Estimation by Direction-aware Cumulative Convolution Network [80.19054069988559]
We find that self-supervised monocular depth estimation shows a direction sensitivity and environmental dependency.
We propose a new Direction-aware Cumulative Convolution Network (DaCCN), which improves the depth representation in two aspects.
Experiments show that our method achieves significant improvements on three widely used benchmarks.
arXiv Detail & Related papers (2023-08-10T14:32:18Z)
- Pyramid Deep Fusion Network for Two-Hand Reconstruction from RGB-D Images [11.100398985633754]
We propose an end-to-end framework for recovering dense meshes for both hands.
Our framework employs ResNet50 and PointNet++ to derive features from RGB and point cloud.
We also introduce a novel pyramid deep fusion network (PDFNet) to aggregate features at different scales.
arXiv Detail & Related papers (2023-07-12T09:33:21Z)
- BS3D: Building-scale 3D Reconstruction from RGB-D Images [25.604775584883413]
We propose an easy-to-use framework for acquiring building-scale 3D reconstruction using a consumer depth camera.
Unlike complex and expensive acquisition setups, our system enables crowd-sourcing, which can greatly benefit data-hungry algorithms.
arXiv Detail & Related papers (2023-01-03T11:46:14Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- Efficient and Accurate Hyperspectral Pansharpening Using 3D VolumeNet and 2.5D Texture Transfer [13.854539265252201]
We propose a novel multi-spectral image fusion method using a combination of the previously proposed 3D CNN model VolumeNet and 2.5D texture transfer method.
The experimental results show that the proposed method outperforms the existing methods in terms of objective accuracy assessment, method efficiency, and visual subjective evaluation.
arXiv Detail & Related papers (2022-03-08T09:24:12Z)
- Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use internet 3D indoor scenes and manually tune their illuminations to render photo-realistic RGB photos and their corresponding depth and BRDF maps.
We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations.
Our method is evaluated by comparing it with current state-of-the-art methods on Vari dataset and a significant improvement is observed in experiments.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
- Total Scale: Face-to-Body Detail Reconstruction from Sparse RGBD Sensors [52.38220261632204]
Flat facial surfaces frequently occur in the PIFu-based reconstruction results.
We propose a two-scale PIFu representation to enhance the quality of the reconstructed facial details.
Experiments demonstrate the effectiveness of our approach in vivid facial details and deforming body shapes.
arXiv Detail & Related papers (2021-12-03T18:46:49Z)
- City-scale Scene Change Detection using Point Clouds [71.73273007900717]
We propose a method for detecting structural changes in a city using images captured from mounted cameras over two different times.
A direct comparison of the two point clouds for change detection is not ideal due to inaccurate geo-location information.
To circumvent this problem, we propose a deep learning-based non-rigid registration on the point clouds.
Experiments show that our method is able to detect scene changes effectively, even in the presence of viewpoint and illumination differences.
arXiv Detail & Related papers (2021-03-26T08:04:13Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Height estimation from single aerial images using a deep ordinal regression network [12.991266182762597]
We deal with the ambiguous and unsolved problem of height estimation from a single aerial image.
Driven by the success of deep learning, especially deep convolutional neural networks (CNNs), some researchers have proposed to estimate height information from a single aerial image.
In this paper, we propose to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem.
arXiv Detail & Related papers (2020-06-04T12:03:51Z)
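The last entry above converts height regression into ordinal classification over "spacing-increasing intervals", i.e. bins whose edges grow so that small heights are resolved more finely. A minimal sketch of such a discretization, with log-spaced edges (the SID scheme popularized for depth by DORN), is given below; function and parameter names are assumptions for illustration:

```python
# Illustrative sketch of spacing-increasing interval discretization for
# ordinal regression: bin edges are uniform in log space, so intervals
# widen as the height grows. Requires h_min > 0; in practice heights are
# shifted by a small offset before taking logarithms.
import math

def spacing_increasing_edges(h_min, h_max, num_bins):
    """Return num_bins + 1 bin edges spaced uniformly in log space."""
    log_lo, log_hi = math.log(h_min), math.log(h_max)
    step = (log_hi - log_lo) / num_bins
    return [math.exp(log_lo + i * step) for i in range(num_bins + 1)]

def height_to_ordinal(h, edges):
    """Map a height to the index of the interval that contains it."""
    for i in range(len(edges) - 1):
        if h < edges[i + 1]:
            return i
    return len(edges) - 2  # clamp values at or above the last edge

edges = spacing_increasing_edges(1.0, 100.0, 10)
print(height_to_ordinal(5.0, edges))
```

The network then predicts the ordinal bin index instead of a continuous value, which turns an ill-posed regression into a better-conditioned classification problem.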
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences.