Related papers: Height estimation from single aerial images using a deep ordinal regression network

URL: http://arxiv.org/abs/2006.02801v1
Date: Thu, 4 Jun 2020 12:03:51 GMT
Title: Height estimation from single aerial images using a deep ordinal regression network
Authors: Xiang Li, Mingyang Wang, Yi Fang
Abstract summary: We deal with the ambiguous and unsolved problem of height estimation from a single aerial image. Driven by the success of deep learning, especially deep convolutional neural networks (CNNs), some studies have proposed to estimate height information from a single aerial image. In this paper, we propose to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem.
Score: 12.991266182762597
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding the 3D geometric structure of the Earth's surface has been an active research topic in the photogrammetry and remote sensing community for decades, serving as an essential building block for various applications such as 3D digital city modeling, change detection, and city management. Previous studies have extensively examined the problem of height estimation from aerial images based on stereo or multi-view image matching. These methods require two or more images taken from different perspectives, together with the camera information, to reconstruct 3D coordinates. In this paper, we deal with the ambiguous and unsolved problem of height estimation from a single aerial image. Driven by the great success of deep learning, especially deep convolutional neural networks (CNNs), some studies have proposed to estimate height information from a single aerial image by training a deep CNN model on large-scale annotated datasets. These methods treat height estimation as a regression problem and directly use an encoder-decoder network to regress the height values. In this paper, we propose to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem, using an ordinal loss for network training. To enable multi-scale feature extraction, we further incorporate an Atrous Spatial Pyramid Pooling (ASPP) module to extract features from multiple dilated convolution layers. After that, a post-processing technique is designed to merge the predicted height maps of the individual patches into a seamless height map. Finally, we conduct extensive experiments on the ISPRS Vaihingen and Potsdam datasets. Experimental results show that our method performs significantly better than state-of-the-art methods.
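The abstract's core idea, discretizing heights into spacing-increasing intervals and training with an ordinal loss, can be sketched per pixel in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the log-space threshold formula, the shift that maps the lowest height to 1 (the formula is undefined at 0), the K-1 binary "label greater than k" classifiers, the summed binary cross-entropy loss, and midpoint decoding all follow a DORN-style formulation and are assumed here. All function names are hypothetical.

```python
import math
from bisect import bisect_right


def sid_thresholds(h_min, h_max, num_bins):
    """Spacing-increasing discretization (assumed DORN-style): K+1 thresholds
    whose gaps grow geometrically. Heights are shifted so the lower bound
    becomes exactly 1.0, because the formula works in log space."""
    shift = 1.0 - h_min
    upper = h_max + shift  # shifted upper bound; shifted lower bound is 1.0
    return [math.exp(i * math.log(upper) / num_bins) - shift
            for i in range(num_bins + 1)]


def height_to_label(h, thresholds):
    """Index k of the interval [t_k, t_{k+1}) containing h, clipped to [0, K-1]."""
    k = bisect_right(thresholds, h) - 1
    return min(max(k, 0), len(thresholds) - 2)


def ordinal_targets(label, num_bins):
    """K-1 binary targets for ordinal regression: target k is 1 iff label > k."""
    return [1 if label > k else 0 for k in range(num_bins - 1)]


def ordinal_loss(probs, label, eps=1e-7):
    """Summed binary cross-entropy over the ordinal classifiers of one pixel
    (assumed loss form). probs[k] is the predicted P(label > k)."""
    loss = 0.0
    for k, p in enumerate(probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss -= math.log(p) if label > k else math.log(1.0 - p)
    return loss


def decode_height(probs, thresholds):
    """Label = number of classifiers voting 'greater than k'; the height
    estimate is the midpoint of that interval (assumed decoding rule)."""
    label = sum(1 for p in probs if p > 0.5)
    return 0.5 * (thresholds[label] + thresholds[label + 1])
```

For example, `sid_thresholds(0.0, 63.0, 6)` gives thresholds of approximately 0, 1, 3, 7, 15, 31 and 63, so the interval widths double from 1 to 32 and low heights are resolved more finely than large ones. A height of 5.0 falls in [3, 7), which is label 2 with ordinal targets `[1, 1, 0, 0, 0]`.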

Related papers

Pseudo Depth Meets Gaussian: A Feed-forward RGB SLAM Baseline [64.42938561167402]
We propose an online 3D reconstruction method using 3D Gaussian-based SLAM, combined with a feed-forward recurrent prediction module. This approach replaces slow test-time optimization with fast network inference, significantly improving tracking speed. Our method achieves performance on par with the state-of-the-art SplaTAM, while reducing tracking time by more than 90%.
arXiv (2025-08-06T16:16:58Z)
Deep Supervised LSTM for 3D morphology estimation from Multi-View RGB Images of Wheat Spikes [0.0]
Estimating morphological traits from two-dimensional RGB images presents inherent challenges. We propose a neural network approach for volume estimation in 2D images. Our deep supervised model achieves a mean absolute percentage error (MAPE) of 6.46% on six-view indoor images.
arXiv (2025-06-22T15:02:18Z)
MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps [51.44887282336391]
The key challenge of multi-view indoor 3D object detection is to infer accurate geometry information from images for precise 3D detection. A previous method relies on NeRF for geometry reasoning. We propose MVSDet, which utilizes plane sweep for geometry-aware 3D object detection.
arXiv (2024-10-28T21:58:41Z)
GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory. Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images. GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv (2024-06-21T17:49:31Z)
AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously infeasible geometric augmentations for unsupervised depth completion and estimation. This is achieved by reversing, or "undo"-ing, geometric transformations applied to the coordinates of the output depth, warping the depth map back to the original reference frame.
arXiv (2023-10-15T05:15:45Z)
HeightFormer: A Multilevel Interaction and Image-adaptive Classification-regression Network for Monocular Height Estimation with Aerial Images [10.716933766055755]
This paper presents a comprehensive solution for monocular height estimation in remote sensing. It features the Multilevel Interaction Backbone (MIB) and the Image-adaptive Classification-regression Height Generator (ICG). The ICG dynamically generates a height partition for each image and reframes the traditional regression task.
arXiv (2023-10-12T02:49:00Z)
Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets [5.391764618878545]
In this paper, we aim to scale Neural Radiance Fields (NeRF) to large-scale aerial datasets. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with three photogrammetric MVS pipelines.
arXiv (2023-10-01T00:21:01Z)
GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion. In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning. Our method achieves state-of-the-art performance, especially when only a few propagation steps are used.
arXiv (2022-10-19T17:56:03Z)
Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection. We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via instance-level augmentation. Our method, called DGMono3D, achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv (2022-05-23T23:05:07Z)
GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps. A graph convolutional network (GCN) can handle convolution on non-Euclidean data and can be applied to irregular image regions within a topological structure. Our method provides comparable and promising results, with a high prediction accuracy of 89% on the publicly available KITTI and Make3D datasets.
arXiv (2021-12-13T16:46:25Z)
Large-scale Building Height Retrieval from Single SAR Imagery based on Bounding Box Regression Networks [21.788338971571736]
Building height retrieval from synthetic aperture radar (SAR) imagery is of great importance for urban applications. This paper addresses the issue of building height retrieval in large-scale urban areas from a single TerraSAR-X spotlight or stripmap image.
arXiv (2021-11-18T00:39:48Z)
Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video. Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer. To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv (2020-11-26T04:04:21Z)
Attention Aware Cost Volume Pyramid Based Multi-view Stereo Network for 3D Reconstruction [12.728154351588053]
We present an efficient multi-view stereo (MVS) network for 3D reconstruction from multi-view images. We introduce a coarse-to-fine depth inference strategy to achieve high-resolution depth.
arXiv (2020-11-25T13:34:11Z)
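The HeightFormer entry above describes an image-adaptive height partition combined with classification-regression. One common way to realize that idea, in the style of AdaBins, is sketched below; it is an assumption for illustration, not HeightFormer's published design. A network would predict per-image bin-width logits and per-pixel bin logits; here both are plain lists, the function names are hypothetical, and a softmax is used to normalize the widths for simplicity.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def adaptive_bin_centers(width_logits, h_min, h_max):
    """Per-image bin centers (hypothetical helper): widths are normalized to
    sum to 1 (softmax, for simplicity), scaled to [h_min, h_max], and each
    center sits halfway across its bin."""
    widths = softmax(width_logits)
    centers, edge = [], h_min
    for w in widths:
        bin_w = (h_max - h_min) * w
        centers.append(edge + 0.5 * bin_w)
        edge += bin_w
    return centers


def classification_regression(prob_logits, centers):
    """Height as the probability-weighted sum of the bin centers."""
    probs = softmax(prob_logits)
    return sum(p * c for p, c in zip(probs, centers))
```

With equal width logits over [0, 40], the four centers are 5, 15, 25 and 35, and uniform bin logits give their mean, 20. Because the output is a weighted sum rather than an interval midpoint, it varies continuously with the predicted probabilities.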

This list is automatically generated from the titles and abstracts of the papers on this site.