Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions

URL: http://arxiv.org/abs/2407.16698v1
Date: Tue, 23 Jul 2024 17:59:59 GMT
Title: Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Authors: Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi
Abstract summary: We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task. We systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information. This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control.
Score: 30.148969711689773
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel approach designed to address the complexities posed by challenging, out-of-distribution data in the single-image depth estimation task. Starting with images that facilitate depth prediction due to the absence of unfavorable factors, we systematically generate new, user-defined scenes with a comprehensive set of challenges and associated depth information. This is achieved by leveraging cutting-edge text-to-image diffusion models with depth-aware control, known for synthesizing high-quality image content from textual prompts while preserving the coherence of 3D structure between generated and source imagery. Subsequent fine-tuning of any monocular depth network is carried out through a self-distillation protocol that takes into account images generated using our strategy and its own depth predictions on simple, unchallenging scenes. Experiments on benchmarks tailored for our purposes demonstrate the effectiveness and versatility of our proposal.
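The data flow described in the abstract can be sketched as follows. This is only a toy illustration under stated assumptions, not the authors' implementation: `build_distillation_pairs`, `toy_teacher`, `toy_generator`, and `l1_loss` are hypothetical names, images are flat lists of intensities, the depth-conditioned diffusion model is replaced by a simple darkening function, and the L1 loss is an assumed placeholder for whatever objective the paper uses.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[float]  # toy stand-in: flat list of pixel intensities
Depth = List[float]  # toy stand-in: one depth value per pixel


def build_distillation_pairs(
    easy_images: Sequence[Image],
    teacher: Callable[[Image], Depth],
    generate_challenging: Callable[[Image, Depth, str], Image],
    prompts: Sequence[str],
) -> List[Tuple[Image, Depth]]:
    """Pair each generated hard image with the depth predicted on its easy source."""
    pairs = []
    for image in easy_images:
        # The pseudo-label is computed once, on the easy image only.
        pseudo_depth = teacher(image)
        for prompt in prompts:
            # Hypothetical generator: conditioned on depth and a text prompt,
            # it changes appearance while the scene structure stays the same.
            hard = generate_challenging(image, pseudo_depth, prompt)
            pairs.append((hard, pseudo_depth))
    return pairs


def l1_loss(pred: Depth, target: Depth) -> float:
    """Mean absolute error (an assumed loss, chosen only for illustration)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def toy_teacher(image: Image) -> Depth:
    """Stand-in depth network: brighter pixels are treated as closer."""
    return [1.0 / (0.1 + p) for p in image]


def toy_generator(image: Image, depth: Depth, prompt: str) -> Image:
    """Stand-in for depth-conditioned generation: only the appearance changes."""
    scale = 0.3 if "night" in prompt else 0.8
    return [p * scale for p in image]


easy_images = [[0.2, 0.4, 0.9]]
pairs = build_distillation_pairs(
    easy_images, toy_teacher, toy_generator, ["night", "rain"]
)

# Gap between the network's prediction on a hard image and its own prediction
# on the easy source; fine-tuning on `pairs` would aim to shrink this value.
gap = l1_loss(toy_teacher(pairs[0][0]), pairs[0][1])
print(len(pairs), gap)
```

In a real setting the stand-ins would be replaced by a pretrained monocular depth network and a depth-conditioned text-to-image diffusion model, and the pairs would feed a standard fine-tuning loop.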

Related papers

Training Self-Supervised Depth Completion Using Sparse Measurements and a Single Image [2.3874115898130865]
We propose a novel self-supervised depth completion paradigm that requires only sparse depth measurements and their corresponding image for training. By leveraging the characteristics of depth distribution, we design novel loss functions that effectively propagate depth information from observed points to unobserved regions.
arXiv Detail & Related papers (2025-07-20T07:24:09Z)
Leveraging Stable Diffusion for Monocular Depth Estimation via Image Semantic Encoding [1.0445560141983634]
We propose a novel image-based semantic embedding that extracts contextual information directly from visual features. Our method achieves performance comparable to state-of-the-art models while addressing the shortcomings of CLIP embeddings in handling outdoor scenes.
arXiv Detail & Related papers (2025-02-01T15:37:22Z)
Pixel-Aligned Multi-View Generation with Depth Guided Decoder [86.1813201212539]
We propose a novel method for pixel-level image-to-multi-view generation. Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model. Our model enables better pixel alignment across multi-view images.
arXiv Detail & Related papers (2024-08-26T04:56:41Z)
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [50.0535198082903]
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image. We showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish generic image composition.
arXiv Detail & Related papers (2024-07-06T03:35:43Z)
Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion [21.939618694037108]
Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. We employ a well-converging diffusion model among generative networks for unsupervised monocular depth estimation. This significantly enriches the model's capacity for learning and interpreting depth distribution.
arXiv Detail & Related papers (2024-06-14T07:31:20Z)
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS). A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework. We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas. With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z)
Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios. We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture. Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z)
Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework. We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
RigNet++: Semantic Assisted Repetitive Image Guided Network for Depth Completion [31.70022495622075]
We explore a repetitive design in our image guided network to gradually and sufficiently recover depth values. In the former branch, we design a dense repetitive hourglass network (DRHN) to extract discriminative image features of complex environments. In the latter branch, we present a repetitive guidance (RG) module based on dynamic convolution, in which an efficient convolution factorization is proposed to reduce the complexity. In addition, we propose a region-aware spatial propagation network (RASPN) for further depth refinement based on the semantic prior constraint.
arXiv Detail & Related papers (2023-09-01T09:11:20Z)
RigNet: Repetitive Image Guided Network for Depth Completion [20.66405067066299]
Recent approaches mainly focus on image guided learning to predict dense results. However, blurry image guidance and object structures in depth still impede the performance of image guided frameworks. We explore a repetitive design in our image guided network to sufficiently and gradually recover depth values. Our method achieves state-of-the-art results on the NYUv2 dataset and ranks 1st on the KITTI benchmark at the time of submission.
arXiv Detail & Related papers (2021-07-29T08:00:33Z)
Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values. We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z)
Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection. We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.