Related papers: Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets

Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets

URL: http://arxiv.org/abs/2007.11256v1
Date: Wed, 22 Jul 2020 08:21:02 GMT
Title: Improving Monocular Depth Estimation by Leveraging Structural Awareness and Complementary Datasets
Authors: Tian Chen, Shijie An, Yuan Zhang, Chongyang Ma, Huayan Wang, Xiaoyan Guo, and Wen Zheng
Abstract summary: We propose a structure-aware neural network with spatial attention blocks to exploit the spatial relationship of visual features. Second, we introduce a global focal relative loss for uniform point pairs to enhance spatial constraint in the prediction. Third, based on analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes.
Score: 21.703238902823937
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Monocular depth estimation plays a crucial role in 3D recognition and understanding. One key limitation of existing approaches lies in their lack of structural information exploitation, which leads to inaccurate spatial layout, discontinuous surface, and ambiguous boundaries. In this paper, we tackle this problem in three aspects. First, to exploit the spatial relationship of visual features, we propose a structure-aware neural network with spatial attention blocks. These blocks guide the network attention to global structures or local details across different feature layers. Second, we introduce a global focal relative loss for uniform point pairs to enhance spatial constraint in the prediction, and explicitly increase the penalty on errors in depth-wise discontinuous regions, which helps preserve the sharpness of estimation results. Finally, based on analysis of failure cases for prior methods, we collect a new Hard Case (HC) Depth dataset of challenging scenes, such as special lighting conditions, dynamic objects, and tilted camera angles. The new dataset is leveraged by an informed learning curriculum that mixes training examples incrementally to handle diverse data distributions. Experimental results show that our method outperforms state-of-the-art approaches by a large margin in terms of both prediction accuracy on NYUDv2 dataset and generalization performance on unseen datasets.

Related papers

Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning [64.32618490065117]
A core problem of Embodied AI is to learn object manipulation from observation, as humans do.<n>We propose a novel approach that learns an affordance-aware 3D representation and employs a stage-wise inference strategy.<n> Experiments demonstrate the effectiveness of our method, showing improved performance in both affordance grounding and classification.
arXiv Detail & Related papers (2025-08-02T04:14:18Z)
Intern-GS: Vision Model Guided Sparse-View 3D Gaussian Splatting [95.61137026932062]
Intern-GS is a novel approach to enhance the process of sparse-view Gaussian splatting.<n>We show that Intern-GS achieves state-of-the-art rendering quality across diverse datasets.
arXiv Detail & Related papers (2025-05-27T05:17:49Z)
Point Cloud Understanding via Attention-Driven Contrastive Learning [64.65145700121442]
Transformer-based models have advanced point cloud understanding by leveraging self-attention mechanisms. PointACL is an attention-driven contrastive learning framework designed to address these limitations. Our method employs an attention-driven dynamic masking strategy that guides the model to focus on under-attended regions.
arXiv Detail & Related papers (2024-11-22T05:41:00Z)
Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry [4.659427498118277]
We present a novel approach, named EpiS, that incorporates Epipolar information into the reconstruction process. Our method aggregates coarse information from the cost volume into Epipolar features extracted from multiple source views. To address the information gaps in sparse conditions, we integrate depth information from monocular depth estimation using global and local regularization techniques.
arXiv Detail & Related papers (2024-06-06T17:47:48Z)
DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation [17.99904937160487]
DCPI-Depth is a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams. It achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all existing prior arts.
arXiv Detail & Related papers (2024-05-27T08:55:17Z)
2D Feature Distillation for Weakly- and Semi-Supervised 3D Semantic Segmentation [92.17700318483745]
We propose an image-guidance network (IGNet) which builds upon the idea of distilling high level feature information from a domain adapted synthetically trained 2D semantic segmentation network. IGNet achieves state-of-the-art results for weakly-supervised LiDAR semantic segmentation on ScribbleKITTI, boasting up to 98% relative performance to fully supervised training with only 8% labeled points.
arXiv Detail & Related papers (2023-11-27T07:57:29Z)
Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations. Comprehensive experiments underscore our framework's superior generalization capabilities. Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
X-PDNet: Accurate Joint Plane Instance Segmentation and Monocular Depth Estimation with Cross-Task Distillation and Boundary Correction [9.215384107659665]
X-PDNet is a framework for the multitask learning of plane instance segmentation and depth estimation. We highlight the current limitations of using the ground truth boundary to develop boundary regression loss. We propose a novel method that exploits depth information to support precise boundary region segmentation.
arXiv Detail & Related papers (2023-09-15T14:27:54Z)
Semi-Supervised Building Footprint Generation with Feature and Output Consistency Training [17.6179873429447]
State-of-the-art semi-supervised semantic segmentation networks with consistency training can help to deal with this issue. We propose to integrate the consistency of both features and outputs in the end-to-end network training of unlabeled samples. Experimental results show that the proposed approach can well extract more complete building structures.
arXiv Detail & Related papers (2022-05-17T14:55:13Z)
Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems. Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results. This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
Occlusion-aware Unsupervised Learning of Depth from 4-D Light Fields [50.435129905215284]
We present an unsupervised learning-based depth estimation method for 4-D light field processing and analysis. Based on the basic knowledge of the unique geometry structure of light field data, we explore the angular coherence among subsets of the light field views to estimate depth maps. Our method can significantly shrink the performance gap between the previous unsupervised method and supervised ones, and produce depth maps with comparable accuracy to traditional methods with obviously reduced computational cost.
arXiv Detail & Related papers (2021-06-06T06:19:50Z)
Seismic horizon detection with neural networks [62.997667081978825]
This paper is an open-sourced research of applying binary segmentation approach to the task of horizon detection on multiple real seismic cubes with a focus on inter-cube generalization of the predictive model. The main contribution of this paper is an open-sourced research of applying binary segmentation approach to the task of horizon detection on multiple real seismic cubes with a focus on inter-cube generalization of the predictive model.
arXiv Detail & Related papers (2020-01-10T11:30:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.