Understanding Depth Map Progressively: Adaptive Distance Interval
Separation for Monocular 3D Object Detection
- URL: http://arxiv.org/abs/2306.10921v1
- Date: Mon, 19 Jun 2023 13:32:53 GMT
- Title: Understanding Depth Map Progressively: Adaptive Distance Interval
Separation for Monocular 3D Object Detection
- Authors: Xianhui Cheng, Shoumeng Qiu, Zhikang Zou, Jian Pu and Xiangyang Xue
- Abstract summary: Several monocular 3D detection techniques rely on auxiliary depth maps from the depth estimation task.
We propose a framework named the Adaptive Distance Interval Separation Network (ADISN) that adopts a novel perspective on understanding depth maps.
- Score: 38.96129204108353
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular 3D object detection aims to locate objects in different scenes with
just a single image. Due to the absence of depth information, several monocular
3D detection techniques have emerged that rely on auxiliary depth maps from the
depth estimation task. There are multiple approaches to understanding the
representation of depth maps, including treating them as pseudo-LiDAR point
clouds, leveraging implicit end-to-end learning of depth information, or
considering them as an image input. However, these methods have certain
drawbacks, such as their reliance on the accuracy of estimated depth maps and
suboptimal utilization of depth maps due to their image-based nature. While
LiDAR-based methods can be applied to pseudo point clouds and convolutional
neural networks (CNNs) to depth maps, choosing between the two has remained
an either-or alternative. In this paper, we propose a framework named the Adaptive Distance
Interval Separation Network (ADISN) that adopts a novel perspective on
understanding depth maps, as a form that lies between LiDAR and images. We
utilize an adaptive separation approach that partitions the depth map into
various subgraphs based on distance and treats each of these subgraphs as an
individual image for feature extraction. After adaptive separations, each
subgraph solely contains pixels within a learned interval range. If there is a
truncated object within this range, an evident curved edge will appear, which
we can leverage for texture extraction using CNNs to obtain rich depth
information in pixels. Meanwhile, to mitigate the inaccuracy of depth
estimation, we design an uncertainty module. To take advantage of both images
and depth maps, we use separate branches to learn the localization and
appearance tasks.
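The adaptive separation step described above can be sketched as follows. This is a minimal NumPy illustration only: in ADISN the interval boundaries are learned end-to-end, whereas here they are fixed, and the function and variable names are ours, not the paper's.

```python
import numpy as np

def separate_by_intervals(depth, boundaries):
    """Split a depth map into per-interval sub-maps.

    depth: (H, W) array of estimated depths.
    boundaries: sorted 1-D array of K+1 interval edges
        (fixed here for illustration; learned in ADISN).
    Returns a list of K (H, W) maps, each keeping only the
    pixels whose depth falls inside one interval; all other
    pixels are zeroed out.
    """
    submaps = []
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        mask = (depth >= lo) & (depth < hi)
        # Each sub-map acts as a separate "image" for a CNN:
        # a truncated object leaves a curved edge at the mask
        # boundary that the network can exploit as texture.
        submaps.append(np.where(mask, depth, 0.0))
    return submaps

# Toy 2x3 depth map split into two distance intervals.
depth = np.array([[1.0, 5.0, 9.0],
                  [2.0, 6.0, 8.0]])
subs = separate_by_intervals(depth, np.array([0.0, 4.0, 10.0]))
```

Each returned sub-map would then be fed to an independent feature-extraction branch, as the abstract describes.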
Related papers
- MonoCD: Monocular 3D Object Detection with Complementary Depths [9.186673054867866]
Depth estimation is an essential but challenging subtask of monocular 3D object detection.
We propose to increase the complementarity of depths with two novel designs.
Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data.
arXiv Detail & Related papers (2024-04-04T03:30:49Z)
- Facial Depth and Normal Estimation using Single Dual-Pixel Camera [81.02680586859105]
We introduce a DP-oriented Depth/Normal network that reconstructs the 3D facial geometry.
The accompanying dataset contains the corresponding ground-truth 3D models, including depth maps and surface normals in metric scale.
It achieves state-of-the-art performances over recent DP-based depth/normal estimation methods.
arXiv Detail & Related papers (2021-11-25T05:59:27Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Predicting Relative Depth between Objects from Semantic Features [2.127049691404299]
The 3D depth of objects depicted in 2D images is one such feature.
The state of the art in this area are complex Neural Network models trained on stereo image data to predict depth per pixel.
An overall increase of 14% in relative-depth accuracy is achieved over relative depth computed from the monodepth model's results.
arXiv Detail & Related papers (2021-01-12T17:28:23Z)
- Learning a Geometric Representation for Data-Efficient Depth Estimation via Gradient Field and Contrastive Loss [29.798579906253696]
We propose a gradient-based self-supervised learning algorithm with momentum contrastive loss to help ConvNets extract the geometric information with unlabeled images.
Our method outperforms previous state-of-the-art self-supervised learning algorithms and roughly triples the efficiency with which labeled data are used.
arXiv Detail & Related papers (2020-11-06T06:47:19Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- Occlusion-Aware Depth Estimation with Adaptive Normal Constraints [85.44842683936471]
We present a new learning-based method for multi-frame depth estimation from a color video.
Our method outperforms the state-of-the-art in terms of depth estimation accuracy.
arXiv Detail & Related papers (2020-04-02T07:10:45Z)
- Depth Edge Guided CNNs for Sparse Depth Upsampling [18.659087667114274]
Guided sparse depth upsampling aims to upsample an irregularly sampled sparse depth map when an aligned high-resolution color image is given as guidance.
We propose a guided convolutional layer to recover dense depth from a sparse and irregular depth image, with a depth edge image as guidance.
We conduct comprehensive experiments to verify our method on real-world indoor and synthetic outdoor datasets.
arXiv Detail & Related papers (2020-03-23T08:56:32Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we reconstruct a depth map restored by virtue of using the CNN network structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.