Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets
- URL: http://arxiv.org/abs/2007.11256v1
- Date: Wed, 22 Jul 2020 08:21:02 GMT
- Title: Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets
- Authors: Tian Chen, Shijie An, Yuan Zhang, Chongyang Ma, Huayan Wang, Xiaoyan Guo, and Wen Zheng
- Abstract summary: First, we propose a structure-aware neural network with spatial attention blocks to exploit the spatial relationship of visual features.
Second, we introduce a global focal relative loss for uniform point pairs to enhance spatial constraint in the prediction.
Third, based on analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes.
- Score: 21.703238902823937
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Monocular depth estimation plays a crucial role in 3D recognition and
understanding. One key limitation of existing approaches is that they do not
fully exploit structural information, which leads to inaccurate spatial layouts,
discontinuous surfaces, and ambiguous boundaries. In this paper, we tackle this
problem in three ways. First, to exploit the spatial relationship of visual
features, we propose a structure-aware neural network with spatial attention
blocks. These blocks guide the network's attention to global structures or local
details across different feature layers. Second, we introduce a global focal
relative loss for uniform point pairs to enhance spatial constraint in the
prediction, and explicitly increase the penalty on errors in depth-wise
discontinuous regions, which helps preserve the sharpness of estimation
results. Finally, based on analysis of failure cases for prior methods, we
collect a new Hard Case (HC) Depth dataset of challenging scenes, such as
special lighting conditions, dynamic objects, and tilted camera angles. The new
dataset is leveraged by an informed learning curriculum that mixes training
examples incrementally to handle diverse data distributions. Experimental
results show that our method outperforms state-of-the-art approaches by a large
margin in terms of both prediction accuracy on the NYUDv2 dataset and
generalization performance on unseen datasets.
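The abstract does not say how the spatial attention blocks are built. As a rough illustration of what a spatial attention block does (one gate per pixel that rescales all channels there), here is a minimal NumPy sketch of the well-known CBAM-style design: channel-wise mean and max maps, a small convolution, and a sigmoid. This is a swapped-in stand-in, not the authors' architecture; the function names, kernel shape, and NumPy setting are assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def spatial_attention(features: np.ndarray, kernel: np.ndarray):
    """CBAM-style spatial attention, used here as a stand-in (assumption).

    features: (C, H, W) feature map from one layer.
    kernel:   (2, k, k) weights of a one-output convolution over the stacked
              [channel-mean, channel-max] maps; k must be odd.
    Returns (reweighted features, (H, W) attention map).
    """
    k = kernel.shape[-1]
    pad = k // 2
    # Two per-pixel summaries of all channels.
    desc = np.stack([features.mean(axis=0), features.max(axis=0)])  # (2, H, W)
    # Zero-pad so the attention map keeps the input's H x W size.
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # (2, H, W, k, k)
    logits = np.einsum("chwij,cij->hw", windows, kernel)
    attn = sigmoid(logits)  # one gate in (0, 1) per spatial location
    # The same map rescales every channel at that location.
    return features * attn[None, :, :], attn


rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 6, 5))
out, attn = spatial_attention(feats, rng.standard_normal((2, 3, 3)))
print(out.shape, attn.shape)  # (8, 6, 5) (6, 5)
```

Because the gate is a single map shared by all channels, it can only decide *where* the layer looks, which is the kind of global-structure versus local-detail emphasis the abstract describes.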
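The global focal relative loss is only named in the abstract. A minimal sketch, assuming it builds on the common pairwise ranking loss for relative depth (a logistic loss on ordered pairs, a squared loss on roughly equal pairs) with a focal-loss factor (1 - q)^gamma that down-weights pairs already ordered correctly, could look as follows. Pairs are drawn uniformly over the image, and an optional per-pixel weight map raises the penalty near depth discontinuities. The names `sample_uniform_pairs`, `discontinuity_weights`, `focal_relative_loss` and the parameters `gamma`, `tau`, `thresh`, `boost` are hypothetical.

```python
import numpy as np


def sample_uniform_pairs(h, w, n_pairs, rng):
    """Draw point pairs uniformly over an h x w image, as flat pixel indices."""
    return rng.integers(0, h * w, size=n_pairs), rng.integers(0, h * w, size=n_pairs)


def discontinuity_weights(gt, thresh=0.1, boost=2.0):
    """Hypothetical edge map: weight `boost` where the GT depth gradient is large."""
    gy, gx = np.gradient(gt)
    return np.where(np.hypot(gx, gy) > thresh, boost, 1.0)


def focal_relative_loss(pred, gt, idx_a, idx_b, gamma=2.0, tau=0.02, weights=None):
    """Pairwise relative-depth loss with a focal factor (an assumed form).

    gt must be strictly positive depth; pred is any per-pixel score that
    should increase with depth.
    """
    p, g = pred.ravel(), gt.ravel()
    ratio = g[idx_a] / g[idx_b]
    # Ordinal label: +1 if a is farther, -1 if nearer, 0 if within tau.
    r = np.where(ratio > 1 + tau, 1.0, np.where(ratio < 1 / (1 + tau), -1.0, 0.0))
    diff = p[idx_a] - p[idx_b]
    # -log q with q = sigmoid(r * diff), the probability of the correct order.
    nll = np.logaddexp(0.0, -r * diff)
    q = np.exp(-nll)
    ordered = (1.0 - q) ** gamma * nll  # focal factor down-weights easy pairs
    equal = diff ** 2  # pull near-equal pairs together
    per_pair = np.where(r != 0, ordered, equal)
    if weights is not None:
        # Larger penalty if either point lies in a depth-discontinuous region.
        w = weights.ravel()
        per_pair = per_pair * np.maximum(w[idx_a], w[idx_b])
    return float(per_pair.mean())


rng = np.random.default_rng(0)
gt = rng.uniform(1.0, 5.0, size=(16, 16))
a, b = sample_uniform_pairs(16, 16, 1000, rng)
good = focal_relative_loss(gt, gt, a, b)   # correctly ordered prediction
bad = focal_relative_loss(-gt, gt, a, b)   # every ordering reversed
```

Setting `gamma=0` recovers the plain pairwise ranking loss; larger values shift the weight toward misordered pairs, which is the usual motivation for a focal term.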
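The abstract describes the learning curriculum only as mixing training examples incrementally. One simple way to realize that idea, assumed here rather than taken from the paper, is to introduce one dataset per stage and sample uniformly among the datasets introduced so far (for example, a base dataset first, with harder data such as HC Depth joining later). `mixing_weights`, `build_epoch`, and `epochs_per_stage` are hypothetical names.

```python
import random


def mixing_weights(epoch, n_datasets, epochs_per_stage):
    """Sampling probability of each dataset at a given epoch (assumed schedule).

    Dataset 0 is used from the start; dataset k joins at stage k, and every
    dataset introduced so far is then sampled with equal probability.
    """
    active = min(n_datasets, epoch // epochs_per_stage + 1)
    return [1.0 / active if i < active else 0.0 for i in range(n_datasets)]


def build_epoch(datasets, epoch, epochs_per_stage, n_samples, rng):
    """Draw n_samples examples for one epoch according to mixing_weights."""
    weights = mixing_weights(epoch, len(datasets), epochs_per_stage)
    picks = rng.choices(range(len(datasets)), weights=weights, k=n_samples)
    return [rng.choice(datasets[d]) for d in picks]


for epoch in (0, 5, 10):
    print(epoch, mixing_weights(epoch, 3, epochs_per_stage=5))
# 0 [1.0, 0.0, 0.0]
# 5 [0.5, 0.5, 0.0]
# 10 [0.3333333333333333, 0.3333333333333333, 0.3333333333333333]
```

A smoother variant would ramp each new dataset's share up over several epochs instead of switching it on at a stage boundary; which schedule the authors used is not stated in the abstract.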
Related papers
- Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms.
PointACL is an attention-driven contrastive learning framework designed to address these limitations.
Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
Date: 2024-11-22T05:41:00Z
- Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry [4.659427498118277]
We present a novel approach, named EpiS, that incorporates Epipolar information into the reconstruction process.
Our method aggregates coarse information from the cost volume into Epipolar features extracted from multiple source views.
To address the information gaps in sparse conditions, we integrate depth information from monocular depth estimation using global and local regularization techniques.
Date: 2024-06-06T17:47:48Z
- DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation [17.99904937160487]
DCPI-Depth is a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams.
It achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming existing prior art.
Date: 2024-05-27T08:55:17Z
- 2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network.
IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, reaching up to 98% of the performance of fully supervised training with only 8% of points labeled.
Date: 2023-11-27T07:57:29Z
- Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
Date: 2023-09-18T12:36:39Z
- X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction [9.215384107659665]
X-PDNet is a framework for the multitask learning of plane instance segmentation and depth estimation.
We highlight the current limitations of using the ground-truth boundary to develop a boundary regression loss.
We propose a novel method that exploits depth information to support precise boundary region segmentation.
Date: 2023-09-15T14:27:54Z
- Semi-Supervised Building Footprint Generation with Feature and Output Consistency Training [17.6179873429447]
State-of-the-art semi-supervised semantic segmentation networks with consistency training can help to deal with this issue.
We propose to integrate the consistency of both features and outputs into end-to-end network training on unlabeled samples.
Experimental results show that the proposed approach extracts more complete building structures.
Date: 2022-05-17T14:55:13Z
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
Date: 2021-07-29T16:30:33Z
- Occlusion-aware Unsupervised Learning of Depth from 4-D Light Fields [50.435129905215284]
We present an unsupervised learning-based depth estimation method for 4-D light field processing and analysis.
Based on the unique geometric structure of light field data, we explore the angular coherence among subsets of the light field views to estimate depth maps.
Our method significantly narrows the performance gap between previous unsupervised methods and supervised ones, and produces depth maps with accuracy comparable to traditional methods at a markedly reduced computational cost.
Date: 2021-06-06T06:19:50Z
- Seismic horizon detection with neural networks [62.997667081978825]
This paper presents open-source research applying a binary segmentation approach to horizon detection on multiple real seismic cubes, with a focus on inter-cube generalization of the predictive model.
Date: 2020-01-10T11:30:50Z