Understanding Depth Map Progressively: Adaptive Distance Interval Separation for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2306.10921v1
- Date: Mon, 19 Jun 2023 13:32:53 GMT
- Title: Understanding Depth Map Progressively: Adaptive Distance Interval Separation for Monocular 3D Object Detection
- Authors: Xianhui Cheng, Shoumeng Qiu, Zhikang Zou, Jian Pu and Xiangyang Xue
- Abstract summary: Several monocular 3D detection techniques rely on auxiliary depth maps from the depth estimation task.
We propose a framework named the Adaptive Distance Interval Separation Network (ADISN) that adopts a novel perspective on understanding depth maps.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular 3D object detection aims to locate objects in different scenes with
just a single image. Due to the absence of depth information, several monocular
3D detection techniques have emerged that rely on auxiliary depth maps from the
depth estimation task. There are multiple approaches to understanding the
representation of depth maps, including treating them as pseudo-LiDAR point
clouds, leveraging implicit end-to-end learning of depth information, or
considering them as an image input. However, these methods have certain
drawbacks, such as their reliance on the accuracy of estimated depth maps and
suboptimal utilization of depth maps due to their image-based nature. Although
LiDAR-based methods can be applied to pseudo point clouds and convolutional
neural networks (CNNs) to depth maps, these remain two alternatives rather
than a unified treatment. In this paper, we propose a framework named the Adaptive Distance
Interval Separation Network (ADISN) that adopts a novel perspective on
understanding depth maps, as a form that lies between LiDAR and images. We
utilize an adaptive separation approach that partitions the depth map into
various subgraphs based on distance and treats each of these subgraphs as an
individual image for feature extraction. After adaptive separation, each
subgraph contains only pixels within a learned interval range. If there is a
truncated object within this range, an evident curved edge will appear, which
we can leverage for texture extraction using CNNs to obtain rich per-pixel
depth information. Meanwhile, to mitigate the inaccuracy of depth estimation,
we design an uncertainty module. To take advantage of both images and depth
maps, we use separate branches to learn the localization task and the
appearance task.
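
To make the separation step concrete, here is a minimal PyTorch sketch reconstructed from the abstract alone. The module name AdaptiveIntervalSeparation, the softmax parameterization of interval widths, the sigmoid soft masks with temperature tau, the choice of K = 4 intervals over an 80 m range, and the Kendall-and-Gal-style uncertainty_weighted_l1 loss are all illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch reconstructed from the abstract, not the authors'
# released code. A depth map is soft-partitioned into K learnable distance
# intervals; each masked sub-map is then an ordinary single-channel image.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveIntervalSeparation(nn.Module):
    """Splits a depth map into K sub-maps with learnable interval bounds."""

    def __init__(self, num_intervals: int = 4, max_depth: float = 80.0,
                 tau: float = 1.0):
        super().__init__()
        self.max_depth = max_depth  # assumed depth range in meters
        self.tau = tau              # mask softness; an assumption, not from the paper
        # Unnormalized widths; softmax keeps them positive and summing to
        # max_depth, so the interval boundaries stay ordered while trained.
        self.raw_widths = nn.Parameter(torch.zeros(num_intervals))

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (B, 1, H, W) metric depth from an off-the-shelf estimator
        widths = F.softmax(self.raw_widths, dim=0) * self.max_depth  # (K,)
        upper = torch.cumsum(widths, dim=0)                          # (K,)
        lower = upper - widths                                       # (K,)
        d = depth.unsqueeze(1)                    # (B, 1, 1, H, W)
        lo = lower.view(1, -1, 1, 1, 1)
        hi = upper.view(1, -1, 1, 1, 1)
        # Soft interval membership: ~1 inside [lo, hi), decaying smoothly
        # outside, which keeps the learned boundaries differentiable.
        mask = torch.sigmoid((d - lo) / self.tau) * torch.sigmoid((hi - d) / self.tau)
        # Each sub-map keeps only pixels in its interval; an object truncated
        # at a boundary leaves the curved edge that CNNs can exploit.
        return (mask * d).squeeze(2)              # (B, K, H, W)


def uncertainty_weighted_l1(pred: torch.Tensor, target: torch.Tensor,
                            log_var: torch.Tensor) -> torch.Tensor:
    # One plausible training signal for an uncertainty module, in the
    # heteroscedastic-regression style of Kendall & Gal: pixels with high
    # predicted variance are down-weighted. An assumption, not the paper's loss.
    return (torch.exp(-log_var) * (pred - target).abs() + log_var).mean()


if __name__ == "__main__":
    sep = AdaptiveIntervalSeparation(num_intervals=4)
    depth = torch.rand(2, 1, 96, 320) * 80.0       # dummy depth maps
    submaps = sep(depth)                           # (2, 4, 96, 320)
    # Treat each of the K sub-maps as an independent image for a shared stem.
    stem = nn.Conv2d(1, 16, kernel_size=3, padding=1)
    feats = stem(submaps.reshape(-1, 1, 96, 320))  # (2 * 4, 16, 96, 320)
    print(feats.shape)
```

A hard mask such as `(d >= lo) & (d < hi)` would match "each subgraph solely contains pixels within a learned interval" more literally, but it passes no gradient to the boundaries; the sigmoid relaxation is one standard way to keep the separation end-to-end trainable.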
Related papers
- Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering (arXiv, 2024-10-04)
We present a novel approach to generate view consistent and detailed depth maps from a number of posed images.
We leverage advances in monocular depth estimation, which generate topologically complete, but metrically inaccurate depth maps.
Our method generates dense, detailed, high-quality depth maps, even in challenging indoor scenarios, and outperforms state-of-the-art depth reconstruction approaches.
- Pixel-Aligned Multi-View Generation with Depth Guided Decoder (arXiv, 2024-08-26)
We propose a novel method for pixel-level image-to-multi-view generation.
Unlike prior work, we incorporate attention layers across multi-view images in the VAE decoder of a latent video diffusion model.
Our model enables better pixel alignment across multi-view images.
- Depth-guided Texture Diffusion for Image Semantic Segmentation (arXiv, 2024-08-17)
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
- MonoCD: Monocular 3D Object Detection with Complementary Depths (arXiv, 2024-04-04)
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
- Learning a Geometric Representation for Data-Efficient Depth Estimation via Gradient Field and Contrastive Loss (arXiv, 2020-11-06)
We propose a gradient-based self-supervised learning algorithm with momentum contrastive loss to help ConvNets extract the geometric information with unlabeled images.
Our method outperforms the previous state-of-the-art self-supervised learning algorithms and improves the efficiency of labeled data roughly threefold.
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints (arXiv, 2020-04-02)
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
- Depth Edge Guided CNNs for Sparse Depth Upsampling (arXiv, 2020-03-23)
Guided sparse depth upsampling aims to upsample an irregularly sampled sparse depth map when an aligned high-resolution color image is given as guidance.
We propose a guided convolutional layer to recover dense depth from a sparse and irregular depth image, with a depth edge image as guidance.
We conduct comprehensive experiments to verify our method on real-world indoor and synthetic outdoor datasets.
- Depth Completion Using a View-constrained Deep Prior (arXiv, 2020-01-21)
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images: given color images and noisy, incomplete target depth maps, we restore a depth map using the CNN network structure itself as a prior.
- Single Image Depth Estimation Trained via Depth from Defocus Cues (arXiv, 2020-01-14)
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-defocus cues instead of multiple views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.