GCNDepth: Self-supervised Monocular Depth Estimation based on Graph
Convolutional Network
- URL: http://arxiv.org/abs/2112.06782v1
- Date: Mon, 13 Dec 2021 16:46:25 GMT
- Title: GCNDepth: Self-supervised Monocular Depth Estimation based on Graph
Convolutional Network
- Authors: Armin Masoumian, Hatem A. Rashwan, Saddam Abdulwahab, Julian Cristiano
and Domenec Puig
- Abstract summary: This work brings a new solution with a set of improvements that increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle convolution on non-Euclidean data and can be applied to irregular image regions within a topological structure.
Our method provides comparable and promising results, with a high prediction accuracy of 89% on the public KITTI and Make3D datasets.
- Score: 11.332580333969302
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Depth estimation is a challenging 3D reconstruction task that underpins
accurate environment perception. This work brings a new solution with
a set of improvements, which increase the quantitative and qualitative
understanding of depth maps compared to existing methods. Recently, a
convolutional neural network (CNN) has demonstrated its extraordinary ability
in estimating depth maps from monocular videos. However, traditional CNNs do
not support topological structure and can work only on regular image
regions with fixed size and weights. On the other hand, graph
convolutional networks (GCNs) can handle convolution on non-Euclidean data
and can be applied to irregular image regions within a topological
structure. Therefore, in this work, in order to preserve object geometric
appearances and distributions, we aim at exploiting GCN for a self-supervised
depth estimation model. Our model consists of two parallel auto-encoder
networks: the first is an auto-encoder that will depend on ResNet-50 and
extract the feature from the input image and on multi-scale GCN to estimate the
depth map. In turn, the second network will be used to estimate the ego-motion
vector (i.e., 3D pose) between two consecutive frames based on ResNet-18. Both
the estimated 3D pose and depth map will be used for constructing a target
image. A combination of loss functions related to photometric, projection, and
smoothness is used to cope with poor depth predictions and to preserve object
discontinuities. In particular, our method provides comparable and promising
results, with a high prediction accuracy of 89% on the public KITTI and
Make3D datasets, along with a 40% reduction in the number of trainable
parameters compared to state-of-the-art solutions. The source
code is publicly available at https://github.com/ArminMasoumian/GCNDepth.git
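The abstract names two of the model's ingredients: graph convolution over irregular image regions and a smoothness term in the loss. The paper's exact formulations are not given here, so the following is a rough, hypothetical numpy sketch, assuming a standard Kipf-and-Welling-style propagation rule for the GCN layer and a Monodepth-style edge-aware smoothness loss; both are common choices and not necessarily the authors' exact ones.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer (Kipf & Welling style):
    H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # adjacency with self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU

def smoothness_loss(depth, image):
    """Edge-aware smoothness (Monodepth-style): penalize depth gradients
    except where the (grayscale) image itself has strong gradients,
    i.e. at likely object boundaries."""
    dx_d, dy_d = np.abs(np.diff(depth, axis=1)), np.abs(np.diff(depth, axis=0))
    dx_i, dy_i = np.abs(np.diff(image, axis=1)), np.abs(np.diff(image, axis=0))
    return (dx_d * np.exp(-dx_i)).mean() + (dy_d * np.exp(-dy_i)).mean()

# Toy example: 4 irregular "image regions" as graph nodes with 3-dim features.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)    # hypothetical region adjacency
H = rng.standard_normal((4, 3))              # node features
W = rng.standard_normal((3, 2))              # learnable weights
out = gcn_layer(H, A, W)
print(out.shape)                             # → (4, 2)

# A perfectly flat depth map incurs zero smoothness penalty.
print(smoothness_loss(np.ones((5, 5)), rng.standard_normal((5, 5))))  # → 0.0
```

Unlike a standard convolution, the propagation above is driven entirely by the adjacency matrix, so the same layer applies to regions of any shape or count.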
Related papers
- Improving 3D Pose Estimation for Sign Language [38.20064386142944]
This work addresses 3D human pose reconstruction in single images.
We present a method that combines Forward Kinematics (FK) with neural networks to ensure a fast and valid prediction of 3D pose.
arXiv Detail & Related papers (2023-08-18T13:05:10Z)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation [53.83313235792596]
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality within 2D-3D networks-based systems.
arXiv Detail & Related papers (2023-06-28T22:36:44Z)
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolution neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
- DeepFusion: Real-Time Dense 3D Reconstruction for Monocular SLAM using Single-View Depth and Gradient Predictions [22.243043857097582]
DeepFusion is capable of producing real-time dense reconstructions on a GPU.
It fuses the output of a semi-dense multiview stereo algorithm with the depth and gradient predictions of a CNN in a probabilistic fashion.
Based on its performance on synthetic and real-world datasets, we demonstrate that DeepFusion is capable of performing at least as well as other comparable systems.
arXiv Detail & Related papers (2022-07-25T14:55:26Z)
- MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection [10.377424252002792]
Monocular 3D object detection lacks accurate depth recovery ability.
Deep neural network (DNN) enables monocular depth-sensing from high-level learned features.
We propose a joint semantic and geometric cost volume to model the depth error.
arXiv Detail & Related papers (2022-03-16T11:54:10Z)
- VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes as inputs a LiDAR point-cloud and a single RGB image during inference, and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z)
- GeoNet++: Iterative Geometric Neural Network with Edge-Aware Refinement for Joint Depth and Surface Normal Estimation [204.13451624763735]
We propose a geometric neural network with edge-aware refinement (GeoNet++) to jointly predict both depth and surface normal maps from a single image.
GeoNet++ effectively predicts depth and surface normals with strong 3D consistency and sharp boundaries.
In contrast to current metrics that focus on evaluating pixel-wise error/accuracy, 3DGM measures whether the predicted depth can reconstruct high-quality 3D surface normals.
arXiv Detail & Related papers (2020-12-13T06:48:01Z)
- HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
arXiv Detail & Related papers (2020-07-17T12:44:23Z)
- Depth Completion Using a View-constrained Deep Prior [73.21559000917554]
Recent work has shown that the structure of convolutional neural networks (CNNs) induces a strong prior that favors natural images.
This prior, known as a deep image prior (DIP), is an effective regularizer in inverse problems such as image denoising and inpainting.
We extend the concept of the DIP to depth images. Given color images and noisy and incomplete target depth maps, we reconstruct a depth map restored by virtue of using the CNN network structure as a prior.
arXiv Detail & Related papers (2020-01-21T21:56:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.