A Novel 3D-UNet Deep Learning Framework Based on High-Dimensional
Bilateral Grid for Edge Consistent Single Image Depth Estimation
- URL: http://arxiv.org/abs/2105.10129v1
- Date: Fri, 21 May 2021 04:53:14 GMT
- Title: A Novel 3D-UNet Deep Learning Framework Based on High-Dimensional
Bilateral Grid for Edge Consistent Single Image Depth Estimation
- Authors: Mansi Sharma, Abheesht Sharma, Kadvekar Rohit Tushar, Avinash Panneer
- Abstract summary: Bilateral Grid based 3D convolutional neural network, dubbed 3DBG-UNet, parameterizes a high-dimensional feature space by encoding compact 3D bilateral grids with UNets.
Another novel model, 3DBGES-UNet, integrates 3DBG-UNet for inferring an accurate depth map given a single color view.
- Score: 0.45880283710344055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The task of predicting smooth and edge-consistent depth maps is notoriously
difficult for single image depth estimation. This paper proposes a novel
Bilateral Grid based 3D convolutional neural network, dubbed 3DBG-UNet, that
parameterizes a high-dimensional feature space by encoding compact 3D bilateral
grids with UNets and infers a sharp geometric layout of the scene. Further,
another novel model, 3DBGES-UNet, integrates 3DBG-UNet for inferring an
accurate depth map given a single color view. The 3DBGES-UNet concatenates the
3DBG-UNet geometry map with an Inception-network edge-accentuation map and a
spatial object-boundary map obtained by leveraging semantic segmentation, and
trains the UNet model with a ResNet backbone. Both models are designed with
particular attention to explicitly accounting for edges and minute details.
Preserving sharp discontinuities at depth edges is critical for many
applications, such as realistic integration of virtual objects in AR video or
occlusion-aware view synthesis for 3D displays. The proposed depth prediction
network achieves state-of-the-art performance in both qualitative and
quantitative evaluations on the challenging NYUv2-Depth dataset. The code and
corresponding pre-trained weights will be made publicly available.
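To make the two ideas above concrete, here is a minimal, hypothetical PyTorch sketch (not the authors' released code): splatting image features into a 3D bilateral grid indexed by position and intensity, and a toy fusion head that concatenates geometry, edge, and boundary maps before depth regression. The helper names, the luminance guide, and all layer sizes are assumptions.

```python
# Hypothetical sketch, not the authors' released code. Part 1 splats image
# features into a 3D bilateral grid with an intensity axis; part 2 is a toy
# stand-in for the 3DBGES-UNet fusion, concatenating geometry, edge, and
# boundary maps before depth regression. Names and sizes are assumptions.
import torch
import torch.nn as nn

def splat_bilateral_grid(feat, guide, grid_depth=8):
    """feat: (B,C,H,W) features; guide: (B,1,H,W) luminance in [0,1].

    Returns a (B, C, grid_depth, H, W) grid with each pixel's feature
    placed in the intensity slice its guide value selects. (A real
    bilateral grid would also downsample spatially.)
    """
    B, C, H, W = feat.shape
    z = (guide.clamp(0, 1) * (grid_depth - 1)).round().long()   # bin index
    grid = feat.new_zeros(B, C, grid_depth, H, W)
    grid.scatter_(2, z.unsqueeze(2).expand(B, C, 1, H, W), feat.unsqueeze(2))
    return grid

class FusionDepthHead(nn.Module):
    """Concatenate geometry, edge, and boundary maps; regress depth."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(        # stand-in for UNet + ResNet backbone
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, geometry, edges, boundaries):   # each (B,1,H,W)
        return self.net(torch.cat([geometry, edges, boundaries], dim=1))
```

In the paper, the grid is processed by a 3D UNet and the fusion by a UNet with a ResNet backbone; the shallow layers above only indicate the data flow.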
Related papers
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset with the lowest required image resolution and the lightest image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
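As a rough illustration of the general idea above (turning an irregular point cloud into a regular 2D image that ordinary convolutions can process), here is a deliberately naive NumPy sketch; Flattening-Net's actual mapping is learned, and `flatten_to_grid` is a hypothetical helper.

```python
# Naive sketch of "point cloud -> regular 2D grid"; the paper's learned
# mapping is far more sophisticated. `flatten_to_grid` is hypothetical.
import numpy as np

def flatten_to_grid(points, res=32):
    """Scatter an (N, 3) point cloud into a (res, res, 3) geometry image.

    Points are binned by their (x, y) coordinates; each cell stores the
    last point that fell into it (a real method learns the arrangement).
    """
    pts = points - points.min(axis=0)          # shift into positive octant
    pts = pts / (pts.max() + 1e-8)             # normalize to [0, 1]
    ij = np.clip((pts[:, :2] * (res - 1)).astype(int), 0, res - 1)
    grid = np.zeros((res, res, 3), dtype=np.float32)
    grid[ij[:, 0], ij[:, 1]] = pts             # regular 2D image of xyz
    return grid

grid = flatten_to_grid(np.random.rand(2048, 3))
print(grid.shape)  # (32, 32, 3): ready for ordinary 2D convolutions
```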
- GraphCSPN: Geometry-Aware Depth Completion via Dynamic GCNs [49.55919802779889]
We propose a Graph Convolution based Spatial Propagation Network (GraphCSPN) as a general approach for depth completion.
In this work, we leverage convolutional neural networks as well as graph neural networks in a complementary way for geometric representation learning.
Our method achieves the state-of-the-art performance, especially when compared in the case of using only a few propagation steps.
arXiv Detail & Related papers (2022-10-19T17:56:03Z)
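The propagation scheme the summary refers to can be pictured with a minimal CSPN-style step: each depth value is repeatedly replaced by an affinity-weighted mean of its neighborhood. The fixed 3x3 neighborhood below is a simplification; GraphCSPN itself propagates over a dynamic, learned graph.

```python
# Minimal CSPN-style spatial propagation step; GraphCSPN replaces the fixed
# 3x3 neighborhood with a learned dynamic graph. Purely illustrative.
import torch
import torch.nn.functional as F

def propagate_depth(depth, affinity, steps=3):
    """depth: (B,1,H,W); affinity: (B,9,H,W) unnormalized neighbor weights."""
    w = torch.softmax(affinity, dim=1)            # normalize over 9 neighbors
    for _ in range(steps):
        # unfold gathers each pixel's 3x3 neighborhood of depth values
        nb = F.unfold(depth, kernel_size=3, padding=1)        # (B,9,H*W)
        nb = nb.view(depth.shape[0], 9, *depth.shape[2:])     # (B,9,H,W)
        depth = (w * nb).sum(dim=1, keepdim=True)             # weighted mean
    return depth
```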
- Semantic Dense Reconstruction with Consistent Scene Segments [33.0310121044956]
A method for dense semantic 3D scene reconstruction from an RGB-D sequence is proposed to solve high-level scene understanding tasks.
First, each RGB-D pair is consistently segmented into 2D semantic maps based on a camera tracking backbone.
A dense 3D mesh model of an unknown environment is incrementally generated from the input RGB-D sequence.
arXiv Detail & Related papers (2021-09-30T03:01:17Z)
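A sketch of the one step such RGB-D pipelines share: back-projecting a depth map and its 2D semantic labels into a labeled 3D point cloud with pinhole intrinsics. The paper's camera tracking, segment-consistency, and incremental meshing stages are not shown, and `backproject` is a hypothetical helper.

```python
# Back-project an RGB-D frame into a semantically labeled point cloud using
# the standard pinhole camera model. Later pipeline stages are omitted.
import numpy as np

def backproject(depth, labels, fx, fy, cx, cy):
    """depth: (H,W) metres; labels: (H,W) ints -> (H*W, 4) [x, y, z, label]."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth
    x = (u - cx) * z / fx                     # pinhole camera model
    y = (v - cy) * z / fy
    return np.stack([x, y, z, labels], axis=-1).reshape(-1, 4)
```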
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception [122.53774221136193]
State-of-the-art methods for driving-scene LiDAR-based perception often project the point clouds to 2D space and then process them via 2D convolution.
A natural remedy is to utilize the 3D voxelization and 3D convolution network.
We propose a new framework for outdoor LiDAR segmentation, in which cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern.
arXiv Detail & Related papers (2021-09-12T06:25:11Z)
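The cylindrical partition mentioned above replaces the usual (x, y, z) voxel grid with (radius, azimuth, height) bins, so that cells far from the sensor cover more area and better match LiDAR's uneven point density. A minimal NumPy sketch, with arbitrary assumed grid sizes:

```python
# Map LiDAR points to cylindrical voxel indices; all bin counts and range
# limits below are arbitrary assumptions for illustration.
import numpy as np

def cylindrical_voxel_ids(points, r_bins=64, a_bins=360, z_bins=32,
                          r_max=50.0, z_min=-3.0, z_max=3.0):
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2)                        # radial distance
    a = np.arctan2(y, x) + np.pi                    # azimuth in [0, 2*pi)
    ri = np.clip((r / r_max * r_bins).astype(int), 0, r_bins - 1)
    ai = np.clip((a / (2 * np.pi) * a_bins).astype(int), 0, a_bins - 1)
    zi = np.clip(((z - z_min) / (z_max - z_min) * z_bins).astype(int),
                 0, z_bins - 1)
    return np.stack([ri, ai, zi], axis=1)           # (N, 3) voxel indices
```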
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting, without using extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
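The "projective modeling" in this summary builds on the pinhole relation between an object's physical height H, its image height h, the focal length f, and its depth d, namely d = f * H / h. A toy numeric check (all values made up):

```python
# Pinhole depth-from-height relation used by projective formulations of
# monocular 3D detection. The numbers are illustrative only.
f_pixels = 720.0          # focal length in pixels
H_metres = 1.5            # assumed 3D object height
h_pixels = 90.0           # observed 2D box height
d = f_pixels * H_metres / h_pixels
print(d)                  # 12.0 metres: farther objects look shorter
```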
- Learning Joint 2D-3D Representations for Depth Completion [90.62843376586216]
We design a simple yet effective neural network block that learns to extract joint 2D and 3D features.
Specifically, the block consists of two domain-specific sub-networks that apply 2D convolution on image pixels and continuous convolution on 3D points.
arXiv Detail & Related papers (2020-12-22T22:58:29Z)
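A rough PyTorch sketch of such a two-branch block, assuming known pixel correspondences for each point: a 2D convolution branch over image features plus a point branch (a per-point MLP on 3D coordinates standing in here for the paper's continuous convolution), fused by concatenation. All layer sizes and the fusion rule are assumptions.

```python
# Two-branch 2D-3D block sketch: 2D conv on pixels, per-point MLP on 3D
# coordinates (a stand-in for continuous convolution), concatenation fusion.
import torch
import torch.nn as nn

class Joint2D3DBlock(nn.Module):
    def __init__(self, c2d=32, c3d=32):
        super().__init__()
        self.img_branch = nn.Conv2d(c2d, c2d, 3, padding=1)
        self.pt_branch = nn.Sequential(nn.Linear(3, c3d), nn.ReLU(),
                                       nn.Linear(c3d, c3d))
        self.fuse = nn.Linear(c2d + c3d, c2d)

    def forward(self, img_feat, points, pix_idx):
        """img_feat: (B,C,H,W); points: (B,N,3); pix_idx: (B,N) flat pixel ids."""
        f2d = self.img_branch(img_feat)                        # 2D conv branch
        B, C, H, W = f2d.shape
        flat = f2d.flatten(2).transpose(1, 2)                  # (B, H*W, C)
        f2d_pts = torch.gather(flat, 1,
                               pix_idx.unsqueeze(-1).expand(-1, -1, C))
        f3d = self.pt_branch(points)                           # point branch
        return self.fuse(torch.cat([f2d_pts, f3d], dim=-1))    # (B, N, c2d)
```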
- Exploring Deep 3D Spatial Encodings for Large-Scale 3D Scene Understanding [19.134536179555102]
We propose an alternative approach that overcomes the limitations of CNN-based approaches by encoding the spatial features of raw 3D point clouds into undirected graph models.
The proposed method achieves accuracy on par with the state of the art, with improved training time and model stability, indicating strong potential for further research.
arXiv Detail & Related papers (2020-11-29T12:56:19Z)
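Constructing an undirected graph model from a raw point cloud typically starts from k-nearest-neighbor connectivity. A minimal NumPy sketch (the graph network that consumes these edges is omitted, and `knn_edges` is a hypothetical helper):

```python
# Build an undirected k-NN edge list from raw 3D points. The O(N^2) distance
# matrix is fine for small clouds; real systems use spatial indexing.
import numpy as np

def knn_edges(points, k=8):
    """points: (N, 3) -> (E, 2) undirected edge list over k nearest neighbors."""
    d = np.linalg.norm(points[:, None] - points[None, :], axis=-1)  # (N, N)
    np.fill_diagonal(d, np.inf)                  # no self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]          # k nearest per point
    src = np.repeat(np.arange(len(points)), k)
    edges = np.stack([src, nbrs.ravel()], axis=1)
    edges.sort(axis=1)                           # make (i, j) with i < j
    return np.unique(edges, axis=0)              # undirected: dedupe pairs
```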
- Pointwise Attention-Based Atrous Convolutional Neural Networks [15.499267533387039]
A pointwise attention-based atrous convolutional neural network architecture is proposed to efficiently deal with a large number of points.
The proposed model has been evaluated on the two most important 3D point cloud datasets for the 3D semantic segmentation task.
It achieves a reasonable performance compared to state-of-the-art models in terms of accuracy, with a much smaller number of parameters.
arXiv Detail & Related papers (2019-12-27T13:12:58Z)
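A hedged sketch of how the two named ingredients can be combined on point clouds: an atrous (dilated) neighborhood that keeps every d-th of the k*d nearest neighbors, and pointwise attention weights over those neighbors. This is purely illustrative, not the paper's architecture.

```python
# Dilated neighborhood + pointwise attention on points; illustrative only.
import torch
import torch.nn as nn

class DilatedPointAttention(nn.Module):
    def __init__(self, c, k=8, dilation=2):
        super().__init__()
        self.k, self.d = k, dilation
        self.score = nn.Linear(c, 1)             # pointwise attention score

    def forward(self, xyz, feat):
        """xyz: (N,3); feat: (N,C) -> (N,C) aggregated features."""
        dist = torch.cdist(xyz, xyz)                             # (N, N)
        idx = dist.topk(self.k * self.d + 1, largest=False).indices
        idx = idx[:, 1::self.d][:, :self.k]      # drop self, dilate
        nb = feat[idx]                           # (N, k, C) neighbor features
        w = torch.softmax(self.score(nb), dim=1) # (N, k, 1) attention
        return (w * nb).sum(dim=1)               # attention-weighted pooling
```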