Monocular Depth Estimation Using Cues Inspired by Biological Vision Systems
- URL: http://arxiv.org/abs/2204.10384v1
- Date: Thu, 21 Apr 2022 19:42:36 GMT
- Title: Monocular Depth Estimation Using Cues Inspired by Biological Vision Systems
- Authors: Dylan Auty, Krystian Mikolajczyk
- Abstract summary: Monocular depth estimation (MDE) aims to transform an RGB image of a scene into a pixelwise depth map from the same camera view.
Part of the MDE task is to learn which visual cues in the image can be used for depth estimation, and how.
We demonstrate that explicitly injecting visual cue information into the model is beneficial for depth estimation.
- Score: 22.539300644593936
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Monocular depth estimation (MDE) aims to transform an RGB image of a scene
into a pixelwise depth map from the same camera view. It is fundamentally
ill-posed due to missing information: any single image could have been taken
from many possible 3D scenes. Part of the MDE task is, therefore, to learn which
visual cues in the image can be used for depth estimation, and how. With
training data limited by cost of annotation or network capacity limited by
computational power, this is challenging. In this work we demonstrate that
explicitly injecting visual cue information into the model is beneficial for
depth estimation. Following research into biological vision systems, we focus
on semantic information and prior knowledge of object sizes and their
relations, to emulate the biological cues of relative size, familiar size, and
absolute size. We use state-of-the-art semantic and instance segmentation
models to provide external information, and exploit language embeddings to
encode relational information between classes. We also provide a prior on the
average real-world size of objects. This external information overcomes the
limitation in data availability, and ensures that the limited capacity of a
given network is focused on known-helpful cues, therefore improving
performance. We experimentally validate our hypothesis and evaluate the
proposed model on the widely used NYUD2 indoor depth estimation benchmark. The
results show improvements in depth prediction when the semantic information,
size prior and instance size are explicitly provided along with the RGB images,
and our method can be easily adapted to any depth estimation system.
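As a minimal sketch of the cue-injection idea (hypothetical names and layout, not the authors' implementation), the external information could be supplied as extra per-pixel input channels alongside RGB: a one-hot semantic map, plus a per-class average real-world size prior to emulate the familiar-size cue.

```python
import numpy as np

def inject_cues(rgb, seg_labels, size_prior_m):
    """Concatenate external cue channels with an RGB image.

    rgb:          (H, W, 3) float array in [0, 1]
    seg_labels:   (H, W) int array of semantic class ids
    size_prior_m: dict mapping class id -> average real-world size (metres)
    Returns an (H, W, 3 + num_classes + 1) array a depth network could consume.
    """
    num_classes = int(seg_labels.max()) + 1

    # One-hot encode the semantic map: one channel per class.
    one_hot = np.eye(num_classes, dtype=rgb.dtype)[seg_labels]   # (H, W, C)

    # Per-pixel size prior: average real-world size of the class at each pixel.
    size_map = np.vectorize(lambda c: size_prior_m.get(c, 0.0))(seg_labels)
    size_map = size_map[..., None].astype(rgb.dtype)             # (H, W, 1)

    return np.concatenate([rgb, one_hot, size_map], axis=-1)

# Toy usage: a 4x4 image with two classes (0 = wall, 1 = chair).
rgb = np.random.rand(4, 4, 3)
seg = np.zeros((4, 4), dtype=int)
seg[2:, 2:] = 1
x = inject_cues(rgb, seg, {0: 2.5, 1: 0.9})
print(x.shape)  # (4, 4, 6): 3 RGB + 2 one-hot + 1 size-prior channel
```

The depth network itself is unchanged; only its first layer needs to accept the wider input, which is what makes this style of cue injection easy to adapt to any depth estimation system.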
Related papers
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
- Depth Insight -- Contribution of Different Features to Indoor Single-image Depth Estimation [8.712751056826283]
We quantify the relative contributions of the known cues of depth in a monocular depth estimation setting.
Our work uses feature extraction techniques to relate the single features of shape, texture, colour and saturation, taken in isolation, to predict depth.
arXiv Detail & Related papers (2023-11-16T17:38:21Z) - Self-Supervised Learning based Depth Estimation from Monocular Images [0.0]
The goal of Monocular Depth Estimation is to predict the depth map, given a 2D monocular RGB image as input.
We plan to incorporate intrinsic camera parameters during training and apply weather augmentations to further generalize our model.
arXiv Detail & Related papers (2023-04-14T07:14:08Z) - Visual Attention-based Self-supervised Absolute Depth Estimation using
Geometric Priors in Autonomous Driving [8.045833295463094]
We introduce a fully Visual Attention-based Depth (VADepth) network, where spatial attention and channel attention are applied to all stages.
By continuously extracting the dependencies of features along the spatial and channel dimensions over a long distance, VADepth network can effectively preserve important details.
Experimental results on the KITTI dataset show that this architecture achieves the state-of-the-art performance.
arXiv Detail & Related papers (2022-05-18T08:01:38Z)
- Self-Supervised Monocular Depth Estimation with Internal Feature Fusion [12.874712571149725]
Self-supervised learning for depth estimation uses geometry in image sequences for supervision.
We propose a novel depth estimation network, DIFFNet, which makes use of semantic information in its downsampling and upsampling procedures.
arXiv Detail & Related papers (2021-10-18T17:31:11Z)
- Towards Interpretable Deep Networks for Monocular Depth Estimation [78.84690613778739]
We quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.
We propose a method to train interpretable MDE deep networks without changing their original architectures.
Experimental results demonstrate that our method is able to enhance the interpretability of deep MDE networks.
arXiv Detail & Related papers (2021-08-11T16:43:45Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Predicting Relative Depth between Objects from Semantic Features [2.127049691404299]
The 3D depth of objects depicted in 2D images is one such feature.
The state of the art in this area are complex Neural Network models trained on stereo image data to predict depth per pixel.
An overall increase of 14% in relative depth accuracy is achieved over relative depth computed from the monodepth model's results.
arXiv Detail & Related papers (2021-01-12T17:28:23Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
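The depth-from-defocus cue used in the last entry above can be illustrated with the standard thin-lens blur model (textbook optics, not the paper's network): with the lens focused at distance s1, a point at distance s2 produces a blur circle whose diameter grows with the focus error, so blur is an (ambiguous, two-sided) cue to depth.

```python
def blur_circle_diameter(f, n, s1, s2):
    """Diameter of the circle of confusion, in the same units as f.

    f:  focal length
    n:  f-number (aperture diameter = f / n)
    s1: focused distance
    s2: actual object distance
    """
    aperture = f / n
    # Thin-lens circle of confusion: A * |s2 - s1| / s2 * f / (s1 - f)
    return aperture * abs(s2 - s1) / s2 * f / (s1 - f)

# A 50 mm f/2 lens focused at 2 m: blur vanishes on the focal plane
# and grows on either side of it.
f, n, s1 = 0.050, 2.0, 2.0
for s2 in (1.0, 2.0, 4.0):
    print(s2, "m ->", blur_circle_diameter(f, n, s1, s2) * 1e6, "um")
```

Because the same blur diameter arises both in front of and behind the focal plane, defocus alone gives depth only up to this ambiguity, which is one reason it is combined with learned cues.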
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.