Absolute distance prediction based on deep learning object detection and
monocular depth estimation models
- URL: http://arxiv.org/abs/2111.01715v1
- Date: Tue, 2 Nov 2021 16:29:13 GMT
- Title: Absolute distance prediction based on deep learning object detection and
monocular depth estimation models
- Authors: Armin Masoumian, David G. F. Marei, Saddam Abdulwahab, Julian
Cristiano, Domenec Puig and Hatem A. Rashwan
- Abstract summary: This paper presents a deep learning framework that consists of two deep networks for depth estimation and object detection using a single image.
The proposed framework is promising, yielding an accuracy of 96% and an RMSE of 0.203 with respect to the correct absolute distance.
- Score: 10.563101152143817
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Determining the distance between the objects in a scene and the camera sensor
from 2D images is feasible when depth images are estimated with stereo or 3D
cameras. The outcome of depth estimation is relative distances, which must be
converted into absolute distances to be usable in practice. Distance
estimation from a single 2D monocular camera, however, is very challenging. This paper
presents a deep learning framework that consists of two deep networks for depth
estimation and object detection using a single image. Firstly, objects in the
scene are detected and localized using the You Only Look Once (YOLOv5) network.
In parallel, a depth image is estimated using a deep autoencoder network to
obtain the relative distances. The YOLO-based object detector was trained with
supervised learning, whereas the depth estimation network was trained in a
self-supervised manner. The presented distance estimation framework was
evaluated on real images of outdoor scenes. The results show that the proposed
framework is promising, yielding an accuracy of 96% and an RMSE of 0.203 with
respect to the correct absolute distance.
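As an illustration of how such a two-branch pipeline can be combined at inference time, here is a minimal Python sketch: YOLOv5 supplies bounding boxes, a placeholder depth network supplies a relative depth map, and a linear scale/shift calibration converts per-box relative depth into metres. The `depth_net`, `scale`, and `shift` names are illustrative stand-ins, not the authors' released code.

```python
import numpy as np
import torch

# Off-the-shelf YOLOv5 detector, standing in for the supervised branch.
detector = torch.hub.load("ultralytics/yolov5", "yolov5s")

def estimate_object_distances(image_rgb, depth_net, scale=10.0, shift=0.1):
    """Return (class_name, distance_in_metres) pairs for one RGB image.

    depth_net is assumed to map the image to an HxW relative depth map
    in [0, 1] at the image's resolution; scale/shift stand in for the
    per-camera calibration that turns relative depth into metres.
    """
    detections = detector(image_rgb)   # rows of (x1, y1, x2, y2, conf, cls)
    rel_depth = depth_net(image_rgb)   # relative depth map (assumption)

    results = []
    for *xyxy, conf, cls in detections.xyxy[0].tolist():
        x1, y1, x2, y2 = map(int, xyxy)
        roi = rel_depth[y1:y2, x1:x2]
        if roi.size == 0:
            continue
        # Median depth inside the box is robust to background pixels.
        distance_m = scale * float(np.median(roi)) + shift
        results.append((detections.names[int(cls)], distance_m))
    return results
```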
Related papers
- ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z)
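The scale/relative-depth factorization summarized above can be illustrated in a couple of lines; the multiplicative recombination below is an assumption for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def recompose_metric_depth(rel_depth, scene_scale):
    """Combine a normalized relative depth map with a predicted scene scale."""
    return scene_scale * rel_depth  # rel_depth in [0, 1], scene_scale in metres

rel = np.random.rand(192, 640)  # stand-in for a predicted relative depth map
metric = recompose_metric_depth(rel, scene_scale=25.0)
```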
- Blur aware metric depth estimation with multi-focus plenoptic cameras [8.508198765617196]
We present a new metric depth estimation algorithm using only raw images from a multi-focus plenoptic camera.
The proposed approach is especially suited for the multi-focus configuration where several micro-lenses with different focal lengths are used.
arXiv Detail & Related papers (2023-08-08T13:38:50Z)
- Boosting Monocular 3D Object Detection with Object-Centric Auxiliary Depth Supervision [13.593246617391266]
We propose a method to boost the RGB image-based 3D detector by jointly training the detection network with a depth prediction loss analogous to the depth estimation task.
Our novel object-centric depth prediction loss focuses on depth around foreground objects, which is important for 3D object detection.
Our depth regression model is further trained to predict the uncertainty of depth to represent the 3D confidence of objects.
arXiv Detail & Related papers (2022-10-29T11:32:28Z)
- Uncertainty Guided Depth Fusion for Spike Camera [49.41822923588663]
We propose a novel Uncertainty-Guided Depth Fusion (UGDF) framework to fuse predictions of monocular and stereo depth estimation networks for spike camera.
Our framework is motivated by the fact that stereo spike depth estimation achieves better results at close range.
In order to demonstrate the advantage of spike depth estimation over traditional camera depth estimation, we contribute a spike-depth dataset named CitySpike20K.
arXiv Detail & Related papers (2022-08-26T13:04:01Z)
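In the spirit of the fusion idea summarized above, here is a minimal per-pixel inverse-variance fusion sketch; UGDF's actual fusion rule may differ, and the variance maps are assumed to come from the networks' uncertainty outputs.

```python
import numpy as np

def fuse_depths(d_mono, var_mono, d_stereo, var_stereo, eps=1e-6):
    """Fuse two depth maps per pixel, weighting each by inverse variance.

    Lower predicted variance (higher confidence) gets more weight, so
    stereo can dominate at close range where it tends to be more certain.
    """
    w_mono = 1.0 / (var_mono + eps)
    w_stereo = 1.0 / (var_stereo + eps)
    return (w_mono * d_mono + w_stereo * d_stereo) / (w_mono + w_stereo)
```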
- Monocular 3D Object Detection with Depth from Motion [74.29588921594853]
We take advantage of camera ego-motion for accurate object depth estimation and detection.
Our framework, named Depth from Motion (DfM), then uses the established geometry to lift 2D image features to the 3D space and detects 3D objects thereon.
Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
arXiv Detail & Related papers (2022-07-26T15:48:46Z)
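The geometric lifting step mentioned above boils down to standard pinhole back-projection; a minimal sketch, with the intrinsic matrix K and a depth estimate assumed given (the K values below are illustrative, KITTI-like numbers):

```python
import numpy as np

def unproject(u, v, depth, K):
    """Back-project pixel (u, v) at metric depth into camera coordinates."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

K = np.array([[721.5, 0.0, 609.6],
              [0.0, 721.5, 172.9],
              [0.0, 0.0, 1.0]])
point_3d = unproject(640, 180, depth=12.0, K=K)  # a point ~12 m ahead
```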
- SurroundDepth: Entangling Surrounding Views for Self-Supervised Multi-Camera Depth Estimation [101.55622133406446]
We propose a SurroundDepth method to incorporate the information from multiple surrounding views to predict depth maps across cameras.
Specifically, we employ a joint network to process all the surrounding views and propose a cross-view transformer to effectively fuse the information from multiple views.
In experiments, our method achieves the state-of-the-art performance on the challenging multi-camera depth estimation datasets.
arXiv Detail & Related papers (2022-04-07T17:58:47Z)
- Anchor Distance for 3D Multi-Object Distance Estimation from 2D Single Shot [15.815583594196488]
We present a real time approach for estimating the distances to multiple objects in a scene using only a single-shot image.
The predictors capture a distance prior via anchor distances, and the network is trained on these distances.
The proposed method achieves about 30 FPS speed, and shows the lowest RMSE compared to the existing methods.
arXiv Detail & Related papers (2021-01-25T20:33:05Z)
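The anchor-distance decoding described above can be illustrated briefly; the anchor values and the residual decoding rule here are assumptions for the example, not the paper's configuration.

```python
import numpy as np

# Hypothetical distance priors; per object, the network would pick the
# best-fitting anchor and regress a residual offset in metres.
ANCHORS_M = np.array([5.0, 15.0, 40.0])

def decode_distance(anchor_idx, offset_m):
    """Recover an absolute distance from an anchor index plus a residual."""
    return float(ANCHORS_M[anchor_idx] + offset_m)

print(decode_distance(anchor_idx=1, offset_m=-2.3))  # -> 12.7
```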
- Multi-Modal Depth Estimation Using Convolutional Neural Networks [0.8701566919381223]
This paper addresses the problem of dense depth prediction from sparse distance-sensor data and a single camera image under challenging weather conditions.
It explores the significance of different sensor modalities such as camera, Radar, and Lidar for estimating depth by applying Deep Learning approaches.
arXiv Detail & Related papers (2020-12-17T15:31:49Z)
- Self-Attention Dense Depth Estimation Network for Unrectified Video Sequences [6.821598757786515]
LiDAR and radar sensors are the standard hardware solutions for real-time depth estimation.
Deep learning based self-supervised depth estimation methods have shown promising results.
We propose a self-attention based depth and ego-motion network for unrectified images.
arXiv Detail & Related papers (2020-05-28T21:53:53Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)