Predicting Relative Depth between Objects from Semantic Features
- URL: http://arxiv.org/abs/2101.04626v1
- Date: Tue, 12 Jan 2021 17:28:23 GMT
- Title: Predicting Relative Depth between Objects from Semantic Features
- Authors: Stefan Cassar, Adrian Muscat, Dylan Seychell
- Abstract summary: The 3D depth of objects depicted in 2D images is one such feature.
The state of the art in this area is complex neural network models trained on stereo image data to predict depth per pixel.
An overall increase of 14% in relative depth accuracy is achieved over relative depth computed from the monodepth model's results.
- Score: 2.127049691404299
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision and language tasks such as Visual Relation Detection and Visual
Question Answering benefit from semantic features that afford proper grounding
of language. The 3D depth of objects depicted in 2D images is one such feature.
However, it is very difficult to obtain accurate depth information without
learning the appropriate features, which are scene dependent. The state of the
art in this area is complex neural network models trained on stereo image data
to predict depth per pixel. Fortunately, in some tasks, it is only the relative
depth between objects that is required. In this paper the extent to which
semantic features can predict coarse relative depth is investigated. The
problem is cast as a classification one: geometrical features based on object
bounding boxes, object labels and scene attributes are computed and used as
inputs to pattern recognition models to predict relative depth, i.e. behind,
in-front and neutral. The results are compared to those obtained from averaging
the output of the monodepth neural network model, which represents the state of
the art. An overall increase of 14% in relative depth accuracy is achieved over
relative depth computed from the monodepth model's results.
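To make the setup concrete, here is a minimal sketch of both sides of the comparison: geometric features computed from a pair of object bounding boxes feeding a standard pattern recognition model, and the monodepth baseline that averages per-pixel depth inside each box. The specific feature set, the RandomForestClassifier choice and the `margin` threshold are illustrative assumptions, not the paper's exact configuration; the paper additionally uses object labels and scene attributes as inputs.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Coarse relative-depth classes for an ordered object pair (a, b).
BEHIND, IN_FRONT, NEUTRAL = 0, 1, 2

def box_features(a, b, img_w, img_h):
    """Geometric features for two boxes given as (x0, y0, x1, y1).

    Illustrative features only; the paper's full input also encodes
    object labels and scene attributes.
    """
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    # Intersection-over-union plus normalised sizes and offsets.
    ix = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    iy = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = ix * iy
    iou = inter / (area_a + area_b - inter + 1e-9)
    return np.array([
        area_a / (img_w * img_h),                    # relative size of a
        area_b / (img_w * img_h),                    # relative size of b
        (ay1 - by1) / img_h,                         # bottom-edge offset
        ((ax0 + ax1) - (bx0 + bx1)) / (2 * img_w),   # centre offset
        iou,
    ])

# A standard pattern-recognition model trained on labelled pairs.
# X: (n_pairs, n_features), y: (n_pairs,) with values in {0, 1, 2}.
clf = RandomForestClassifier(n_estimators=200)
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)

def monodepth_baseline(depth_map, a, b, margin=0.05):
    """Baseline: average a per-pixel depth map inside each box and
    compare; margin defines the 'neutral' band (hypothetical value)."""
    da = depth_map[int(a[1]):int(a[3]), int(a[0]):int(a[2])].mean()
    db = depth_map[int(b[1]):int(b[3]), int(b[0]):int(b[2])].mean()
    if da > db * (1 + margin):
        return BEHIND      # a lies farther from the camera than b
    if db > da * (1 + margin):
        return IN_FRONT
    return NEUTRAL
```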
Related papers
- Understanding Depth Map Progressively: Adaptive Distance Interval Separation for Monocular 3d Object Detection [38.96129204108353]
Several monocular 3D detection techniques rely on auxiliary depth maps from the depth estimation task.
We propose a framework named the Adaptive Distance Interval Separation Network (ADISN) that adopts a novel perspective on understanding depth maps.
arXiv Detail & Related papers (2023-06-19T13:32:53Z)
- Source-free Depth for Object Pop-out [113.24407776545652]
Modern learning-based methods offer promising depth maps by inference in the wild.
We adapt such depth inference models for object segmentation using the objects' "pop-out" prior in 3D.
Our experiments on eight datasets consistently demonstrate the benefit of our method in terms of both performance and generalizability.
arXiv Detail & Related papers (2022-12-10T21:57:11Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
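The "unknown scale and shift" in the entry above has a compact numerical reading: predicted depth relates to metric depth only affinely, d = s * d' + t. The sketch below recovers s and t by least squares against a few hypothetical metric reference depths; the paper itself learns the shift and focal length from 3D point cloud data rather than from ground-truth references, so this only illustrates the underlying alignment idea.

```python
import numpy as np

def align_scale_shift(pred, ref):
    """Recover metric depth d = s * pred + t from depth predicted only
    up to an unknown scale s and shift t, via least squares against a
    few reference metric depths (hypothetical sparse ground truth)."""
    A = np.stack([pred, np.ones_like(pred)], axis=1)   # (n, 2) design matrix
    (s, t), *_ = np.linalg.lstsq(A, ref, rcond=None)
    return s, t

# Example: predictions correlate with true depth only up to an affine map.
pred = np.array([0.2, 0.5, 0.9])
ref = np.array([2.4, 3.0, 3.8])        # metres; here ref = 2 * pred + 2
s, t = align_scale_shift(pred, ref)    # s = 2.0, t = 2.0
metric = s * pred + t                  # recovered metric depths
```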
- DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection [34.01288862240829]
Monocular 3D detection has drawn much attention from the community due to its low cost and setup simplicity.
The most challenging sub-task lies in the instance depth estimation.
We propose to reformulate the instance depth as the combination of the instance visual surface depth and the instance attribute depth.
arXiv Detail & Related papers (2022-07-18T11:49:18Z)
- Monocular Depth Estimation Using Cues Inspired by Biological Vision Systems [22.539300644593936]
Monocular depth estimation (MDE) aims to transform an RGB image of a scene into a pixelwise depth map from the same camera view.
Part of the MDE task is to learn which visual cues in the image can be used for depth estimation, and how.
We demonstrate that explicitly injecting visual cue information into the model is beneficial for depth estimation.
arXiv Detail & Related papers (2022-04-21T19:42:36Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method by 2.80% on the moderate test setting, without using extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
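A plausible instance of such a projective formula is the pinhole relation between an object's real-world height, its pixel height and its depth, widely used in monocular 3D detection. The sketch below is this textbook relation, not necessarily the paper's exact formulation.

```python
def depth_from_projection(focal_px, height_3d_m, height_2d_px):
    """Classic pinhole relation behind geometry-guided depth: an object
    of real height H metres spanning h pixels under focal length f
    (in pixels) lies at depth z = f * H / h."""
    return focal_px * height_3d_m / height_2d_px

# A 1.6 m pedestrian spanning 160 px under a 720 px focal length:
z = depth_from_projection(720.0, 1.6, 160.0)   # 7.2 m
```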
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction [87.08227378010874]
We show the importance of the high-order 3D geometric constraints for depth prediction.
By designing a loss term that enforces a simple geometric constraint, we significantly improve the accuracy and robustness of monocular depth estimation.
We show state-of-the-art results of learning metric depth on NYU Depth-V2 and KITTI.
arXiv Detail & Related papers (2021-03-07T00:08:21Z)
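As a rough illustration of such a high-order constraint, the sketch below implements a simplified virtual normal loss: both depth maps are lifted to 3D point clouds with pinhole intrinsics, and the normals of planes through randomly sampled point triplets are compared in L1. The paper additionally rejects degenerate (near-colinear) triplets, which this sketch omits.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map (H, W) to 3D points via pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def virtual_normals(points, idx):
    """Unit normals of planes through sampled point triplets idx: (n, 3)."""
    p0, p1, p2 = points[idx[:, 0]], points[idx[:, 1]], points[idx[:, 2]]
    n = np.cross(p1 - p0, p2 - p0)
    return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-9)

def virtual_normal_loss(pred_depth, gt_depth, intrinsics, n_triplets=1000, seed=0):
    """L1 distance between virtual normals of predicted and GT geometry."""
    rng = np.random.default_rng(seed)
    pts_pred = backproject(pred_depth, *intrinsics)
    pts_gt = backproject(gt_depth, *intrinsics)
    idx = rng.integers(0, len(pts_gt), size=(n_triplets, 3))
    return np.abs(virtual_normals(pts_pred, idx) - virtual_normals(pts_gt, idx)).mean()
```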
- Learning a Geometric Representation for Data-Efficient Depth Estimation via Gradient Field and Contrastive Loss [29.798579906253696]
We propose a gradient-based self-supervised learning algorithm with momentum contrastive loss to help ConvNets extract the geometric information with unlabeled images.
Our method outperforms previous state-of-the-art self-supervised learning algorithms and roughly triples the efficiency of labeled data.
arXiv Detail & Related papers (2020-11-06T06:47:19Z)
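"Momentum contrastive loss" in the entry above refers to a MoCo-style objective: a query embedding is pulled towards a positive key produced by a slowly updated momentum encoder and pushed away from a queue of negative keys. The InfoNCE sketch below is the generic form of that loss, not the paper's specific gradient-field pipeline.

```python
import numpy as np

def info_nce(query, pos_key, neg_keys, temperature=0.07):
    """Generic InfoNCE objective behind momentum contrastive learning:
    maximise similarity of the query to its positive key relative to a
    set of negative keys (e.g. drawn from a MoCo-style queue)."""
    q = query / np.linalg.norm(query)
    k = pos_key / np.linalg.norm(pos_key)
    negs = neg_keys / np.linalg.norm(neg_keys, axis=1, keepdims=True)
    logits = np.concatenate([[q @ k], negs @ q]) / temperature
    logits -= logits.max()                       # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[0])                         # positive is index 0
```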
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-focus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
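The defocus cue in the entry above reduces to thin-lens optics: the further a point lies from the focal plane, the larger its blur circle on the sensor. The sketch below computes that textbook circle-of-confusion diameter, which is the physical signal such methods learn to invert; it is not the paper's training recipe.

```python
def coc_diameter(depth_m, focus_m, focal_m, aperture_m):
    """Thin-lens circle-of-confusion diameter for an object at depth_m
    when the lens (focal length focal_m, aperture diameter aperture_m)
    is focused at focus_m. Blur grows with distance from the focal
    plane, which makes defocus a usable depth cue."""
    magnification = focal_m / (focus_m - focal_m)
    return aperture_m * magnification * abs(depth_m - focus_m) / depth_m

# 50 mm f/2 lens (25 mm aperture) focused at 2 m, object at 4 m:
c = coc_diameter(4.0, 2.0, 0.05, 0.025)   # ~0.32 mm blur on the sensor
```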
This list is automatically generated from the titles and abstracts of the papers on this site.