Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation
- URL: http://arxiv.org/abs/2410.06982v1
- Date: Wed, 9 Oct 2024 15:20:29 GMT
- Title: Structure-Centric Robust Monocular Depth Estimation via Knowledge Distillation
- Authors: Runze Chen, Haiyong Luo, Fang Zhao, Jingze Yu, Yupeng Jia, Juan Wang, Xuepeng Ma,
- Abstract summary: Monocular depth estimation is a key technique for 3D perception in computer vision.
It faces significant challenges in real-world scenarios, which encompass adverse weather variations, motion blur, as well as scenes with poor lighting conditions at night.
We devise a novel approach to reduce over-reliance on local textures, enhancing robustness against missing or interfering patterns.
- Score: 9.032563775151074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation, enabled by self-supervised learning, is a key technique for 3D perception in computer vision. However, it faces significant challenges in real-world scenarios, which encompass adverse weather variations, motion blur, as well as scenes with poor lighting conditions at night. Our research reveals that we can divide monocular depth estimation into three sub-problems: depth structure consistency, local texture disambiguation, and semantic-structural correlation. Our approach tackles the non-robustness of existing self-supervised monocular depth estimation models to interference textures by adopting a structure-centered perspective and utilizing the scene structure characteristics demonstrated by semantics and illumination. We devise a novel approach to reduce over-reliance on local textures, enhancing robustness against missing or interfering patterns. Additionally, we incorporate a semantic expert model as the teacher and construct inter-model feature dependencies via learnable isomorphic graphs to enable aggregation of semantic structural knowledge. Our approach achieves state-of-the-art out-of-distribution monocular depth estimation performance across a range of public adverse scenario datasets. It demonstrates notable scalability and compatibility, without necessitating extensive model engineering. This showcases the potential for customizing models for diverse industrial applications.
Related papers
- Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions [30.148969711689773]
We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task.
We systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information.
This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control.
arXiv Detail & Related papers (2024-07-23T17:59:59Z) - Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [21.939618694037108]
Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth.
We employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation.
This model significantly enriches the model's capacity for learning and interpreting depth distribution.
arXiv Detail & Related papers (2024-06-14T07:31:20Z) - Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z) - RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering
Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z) - Robust Geometry-Preserving Depth Estimation Using Differentiable
Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks, however, that is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Fine-grained Semantics-aware Representation Enhancement for
Self-supervised Monocular Depth Estimation [16.092527463250708]
We propose novel ideas to improve self-supervised monocular depth estimation.
We focus on incorporating implicit semantic knowledge into geometric representation enhancement.
We evaluate our methods on the KITTI dataset and demonstrate that our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-19T17:50:51Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z) - Self-Supervised Joint Learning Framework of Depth Estimation via
Implicit Cues [24.743099160992937]
We propose a novel self-supervised joint learning framework for depth estimation.
The proposed framework outperforms the state-of-the-art(SOTA) on KITTI and Make3D datasets.
arXiv Detail & Related papers (2020-06-17T13:56:59Z) - DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised
Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.