HTC-DC Net: Monocular Height Estimation from Single Remote Sensing Images
- URL: http://arxiv.org/abs/2309.16486v1
- Date: Thu, 28 Sep 2023 14:50:32 GMT
- Title: HTC-DC Net: Monocular Height Estimation from Single Remote Sensing Images
- Authors: Sining Chen, Yilei Shi, Zhitong Xiong, Xiao Xiang Zhu
- Abstract summary: We propose a method for monocular height estimation from optical imagery.
As an ill-posed problem, monocular height estimation requires well-designed networks for enhanced representations.
We propose HTC-DC Net following the classification-regression paradigm, with the head-tail cut (HTC) and the distribution-based constraints (DCs) as the main contributions.
- Score: 24.65766848068617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D geo-information is of great significance for understanding the living
environment; however, 3D perception from remote sensing data, especially on a
large scale, is restricted. To tackle this problem, we propose a method for
monocular height estimation from optical imagery, which is currently one of the
richest sources of remote sensing data. As an ill-posed problem, monocular
height estimation requires well-designed networks for enhanced representations
to improve performance. Moreover, the distribution of height values is
long-tailed with the low-height pixels, e.g., the background, as the head, and
thus trained networks are usually biased and tend to underestimate building
heights. To solve these problems, instead of formulating the task as direct
regression, we propose HTC-DC Net following the classification-regression
paradigm, with the head-tail cut (HTC) and the distribution-based constraints
(DCs) as the main contributions. HTC-DC Net is composed of the backbone network
as the feature extractor, the HTC-AdaBins module, and the hybrid regression
process. The HTC-AdaBins module serves as the classification phase to determine
bins adaptive to each input image. It is equipped with a vision transformer
encoder to incorporate local context with holistic information and involves an
HTC to address the long-tailed problem in monocular height estimation for
balancing the performances of foreground and background pixels. The hybrid
regression process does the regression via the smoothing of bins from the
classification phase, which is trained via DCs. The proposed network is tested
on three datasets of different resolutions, namely ISPRS Vaihingen (0.09 m),
DFC19 (1.3 m) and GBH (3 m). Experimental results show the superiority of the
proposed network over existing methods by large margins. Extensive ablation
studies demonstrate the effectiveness of each design component.
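As a rough illustration of the classification-regression paradigm described in the abstract, the sketch below converts per-image bin logits and per-pixel bin probabilities into a continuous height map via a probability-weighted sum of adaptive bin centers. This is a minimal AdaBins-style sketch under assumed tensor shapes and an assumed max_height scale; it does not reproduce the head-tail cut, the distribution-based constraints, or the authors' actual implementation.

```python
import torch

def hybrid_height_regression(bin_widths_logits, bin_probs, max_height=100.0):
    """Sketch of adaptive-bins hybrid regression (not the authors' code).

    bin_widths_logits: (B, N) per-image logits defining adaptive bin widths.
    bin_probs:         (B, N, H, W) per-pixel softmax scores over the N bins.
    Returns:           (B, 1, H, W) continuous height map.
    """
    # Normalise widths so they span [0, max_height] for each image.
    widths = torch.softmax(bin_widths_logits, dim=1) * max_height      # (B, N)
    edges = torch.cumsum(widths, dim=1)                                # right bin edges
    edges = torch.cat([torch.zeros_like(edges[:, :1]), edges], dim=1)  # prepend 0
    centers = 0.5 * (edges[:, :-1] + edges[:, 1:])                     # (B, N)

    # Smooth regression: probability-weighted sum of bin centers per pixel.
    height = torch.einsum("bnhw,bn->bhw", bin_probs, centers)
    return height.unsqueeze(1)

# Example shapes: batch of 2 images, 64 adaptive bins, 128x128 pixels.
logits = torch.randn(2, 64)
probs = torch.softmax(torch.randn(2, 64, 128, 128), dim=1)
print(hybrid_height_regression(logits, probs).shape)  # torch.Size([2, 1, 128, 128])
```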
Related papers
- PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network [24.54269823691119]
We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives.
To compensate for the lack of HRSOD dataset, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD.
All images are finely annotated at the pixel level, far exceeding previous low-resolution SOD datasets.
arXiv Detail & Related papers (2024-08-02T09:31:21Z) - NeRF-Det++: Incorporating Semantic Cues and Perspective-aware Depth
Supervision for Indoor Multi-View 3D Detection [72.0098999512727]
NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by utilizing NeRF to enhance representation learning.
We present three corresponding solutions, including semantic enhancement, perspective-aware sampling, and ordinal depth supervision.
The resulting algorithm, NeRF-Det++, has exhibited appealing performance on the ScanNetV2 and ARKitScenes datasets.
arXiv Detail & Related papers (2024-02-22T11:48:06Z) - HeightFormer: A Multilevel Interaction and Image-adaptive
Classification-regression Network for Monocular Height Estimation with Aerial
Images [10.716933766055755]
This paper presents a comprehensive solution for monocular height estimation in remote sensing.
It features the Multilevel Interaction Backbone (MIB) and the Image-adaptive Classification-regression Height Generator (ICG).
The ICG dynamically generates a height partition for each image and reframes the traditional regression task.
arXiv Detail & Related papers (2023-10-12T02:49:00Z) - V2X-AHD:Vehicle-to-Everything Cooperation Perception via Asymmetric
Heterogenous Distillation Network [13.248981195106069]
We propose a multi-view vehicle-road cooperative perception system, vehicle-to-everything cooperative perception (V2X-AHD).
V2X-AHD effectively improves the accuracy of 3D object detection and reduces the number of network parameters.
arXiv Detail & Related papers (2023-10-10T13:12:03Z) - Instant Multi-View Head Capture through Learnable Registration [62.70443641907766]
Existing methods for capturing datasets of 3D heads in dense semantic correspondence are slow.
We introduce TEMPEH to directly infer 3D heads in dense correspondence from calibrated multi-view images.
Predicting one head takes about 0.3 seconds with a median reconstruction error of 0.26 mm, 64% lower than the current state-of-the-art.
arXiv Detail & Related papers (2023-06-12T21:45:18Z) - Towards Model Generalization for Monocular 3D Object Detection [57.25828870799331]
We present an effective unified camera-generalized paradigm (CGP) for Mono3D object detection.
We also propose the 2D-3D geometry-consistent object scaling strategy (GCOS) to bridge the gap via instance-level augmentation.
Our method called DGMono3D achieves remarkable performance on all evaluated datasets and surpasses the SoTA unsupervised domain adaptation scheme.
arXiv Detail & Related papers (2022-05-23T23:05:07Z) - Pyramid Grafting Network for One-Stage High Resolution Saliency
Detection [29.013012579688347]
We propose a one-stage framework called Pyramid Grafting Network (PGNet) to extract features from different resolution images independently.
An attention-based Cross-Model Grafting Module (CMGM) is proposed to enable the CNN branch to combine broken detailed information more holistically.
We contribute a new Ultra-High-Resolution Saliency Detection dataset UHRSD, containing 5,920 images at 4K-8K resolutions.
arXiv Detail & Related papers (2022-04-11T12:22:21Z) - Progressive Coordinate Transforms for Monocular 3D Object Detection [52.00071336733109]
We propose a novel and lightweight approach, dubbed Progressive Coordinate Transforms (PCT), to facilitate learning coordinate representations.
arXiv Detail & Related papers (2021-08-12T15:22:33Z) - Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of effective samples is relatively small in 3D space.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires designing a policy which gets a reward after several steps, and thus we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z) - Height estimation from single aerial images using a deep ordinal
regression network [12.991266182762597]
We deal with the ambiguous and unsolved problem of height estimation from a single aerial image.
Driven by the success of deep learning, especially deep convolutional neural networks (CNNs), some studies have proposed to estimate height information from a single aerial image.
In this paper, we propose to divide height values into spacing-increasing intervals and transform the regression problem into an ordinal regression problem; a minimal sketch of this discretization appears after this list.
arXiv Detail & Related papers (2020-06-04T12:03:51Z) - ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object
Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straightforward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
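As referenced in the ordinal-regression entry above, the sketch below illustrates spacing-increasing discretization of height values for ordinal regression. The log-uniform threshold formula and the alpha/beta/num_bins values are assumptions modeled on DORN-style ordinal regression, not that paper's exact settings.

```python
import numpy as np

def spacing_increasing_thresholds(alpha, beta, num_bins):
    """Spacing-increasing discretization (SID) thresholds: edges are uniform in
    log space, so bins widen with height. Constants here are illustrative."""
    i = np.arange(num_bins + 1)
    return np.exp(np.log(alpha) + (np.log(beta) - np.log(alpha)) * i / num_bins)

def ordinal_label(height, thresholds):
    """Encode a continuous height as the number of inner thresholds it exceeds."""
    return int(np.sum(height >= thresholds[1:-1]))

# Assumed height range of 1 m to 100 m split into 10 geometrically growing bins.
edges = spacing_increasing_thresholds(alpha=1.0, beta=100.0, num_bins=10)
print(np.round(edges, 2))          # bin edges growing geometrically from 1 to 100
print(ordinal_label(25.0, edges))  # ordinal class index for a 25 m building
```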