Monocular Depth Estimation Using Multi Scale Neural Network And Feature
Fusion
- URL: http://arxiv.org/abs/2009.09934v1
- Date: Fri, 11 Sep 2020 18:08:52 GMT
- Title: Monocular Depth Estimation Using Multi Scale Neural Network And Feature
Fusion
- Authors: Abhinav Sagar
- Abstract summary: Our network uses two different blocks: the first uses different filter sizes for convolution and merges all the individual feature maps.
The second block uses dilated convolutions in place of fully connected layers, thus reducing computation and increasing the receptive field.
We train and test our network on the Make3D, NYU Depth V2, and KITTI datasets using standard evaluation metrics for depth estimation, comprising RMSE loss and SILog loss.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation from monocular images is a challenging problem in computer
vision. In this paper, we tackle this problem with a novel network architecture
based on multi-scale feature fusion. Our network uses two different blocks: the
first uses different filter sizes for convolution and merges all the individual
feature maps. The second block uses dilated convolutions in place of fully
connected layers, thus reducing computation and increasing the receptive field.
We present a new loss function for training the network that combines a depth
regression term, an SSIM loss term, and a multinomial logistic loss term. We
train and test our network on the Make3D, NYU Depth V2, and KITTI datasets using
standard evaluation metrics for depth estimation, comprising RMSE loss and SILog
loss. Our network outperforms previous state-of-the-art methods with fewer
parameters.
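The combined loss described in the abstract can be sketched roughly as follows. This is an illustrative numpy sketch, not the authors' implementation: the SSIM here is the simplified global (unwindowed) form, L1 stands in for the unspecified depth regression term, the multinomial logistic term is realized as cross-entropy over uniformly discretized depth bins, and the weights `w_reg`, `w_ssim`, `w_cls` and bin settings are assumptions.

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    """Simplified (global, unwindowed) SSIM between two depth maps."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def multinomial_logistic(logits, gt_depth, n_bins, d_max):
    """Cross-entropy over discretized depth bins (illustrative formulation).
    logits has shape (H, W, n_bins); gt_depth has shape (H, W)."""
    bins = np.clip((gt_depth / d_max * n_bins).astype(int), 0, n_bins - 1)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))  # stable softmax
    p = e / e.sum(axis=-1, keepdims=True)
    h, w = gt_depth.shape
    return -np.log(p[np.arange(h)[:, None], np.arange(w), bins] + 1e-12).mean()

def combined_loss(pred, gt, logits, n_bins=80, d_max=10.0,
                  w_reg=1.0, w_ssim=1.0, w_cls=1.0):
    """Depth regression + SSIM + multinomial logistic terms, combined.
    The term weights are illustrative; the abstract does not give them."""
    reg = np.abs(pred - gt).mean()             # depth regression (L1 here)
    ssim_term = 1.0 - ssim_global(pred, gt)    # SSIM turned into a loss
    cls = multinomial_logistic(logits, gt, n_bins, d_max)
    return w_reg * reg + w_ssim * ssim_term + w_cls * cls
```

With a perfect prediction the regression and SSIM terms vanish and only the classification term remains, which makes the relative weighting of the three terms easy to probe.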
Related papers
- Active search and coverage using point-cloud reinforcement learning [50.741409008225766]
This paper presents an end-to-end deep reinforcement learning solution for target search and coverage.
We show that deep hierarchical feature learning works for RL and that by using farthest point sampling (FPS) we can reduce the number of points.
We also show that multi-head attention for point-clouds helps the agent learn faster but converges to the same outcome.
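Farthest point sampling, as used above to reduce the number of points, can be sketched as a greedy loop; this standalone numpy version is generic, not the paper's code:

```python
import numpy as np

def farthest_point_sampling(points, k, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the set already
    selected, yielding k well-spread points from an (N, 3) cloud."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    selected = [int(rng.integers(n))]          # arbitrary starting point
    # distance from every point to its nearest selected point so far
    dist = np.linalg.norm(points - points[selected[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())               # farthest from current set
        selected.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    return points[selected]
```

Because each step maximizes the minimum distance to the chosen set, FPS preserves the spatial extent of the cloud far better than uniform random subsampling at the same budget.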
arXiv Detail & Related papers (2023-12-18T18:16:30Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
- GCNDepth: Self-supervised Monocular Depth Estimation based on Graph Convolutional Network [11.332580333969302]
This work brings a new solution with a set of improvements, which increase the quantitative and qualitative understanding of depth maps.
A graph convolutional network (GCN) can handle the convolution on non-Euclidean data and it can be applied to irregular image regions within a topological structure.
Our method provided comparable and promising results, with a high prediction accuracy of 89% on the public KITTI and Make3D datasets.
arXiv Detail & Related papers (2021-12-13T16:46:25Z)
- Dilated Fully Convolutional Neural Network for Depth Estimation from a Single Image [1.0131895986034314]
We present an advanced Dilated Fully Convolutional Neural Network to address the deficiencies of traditional CNNs.
Taking advantage of the exponential expansion of the receptive field in dilated convolutions, our model can minimize the loss of resolution.
We show experimentally on the NYU Depth V2 dataset that the depth predictions obtained from our model are considerably closer to ground truth than those from traditional CNN techniques.
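The exponential receptive-field claim can be checked with a little arithmetic: for stride-1 stacked convolutions, the receptive field is 1 plus the sum of (kernel - 1) * dilation over the layers, so doubling the dilation each layer grows it exponentially in depth while plain 3x3 stacks grow only linearly. A small sketch of the generic formula (not this paper's architecture):

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked stride-1 convolutions:
    r = 1 + sum((k - 1) * d) over the layers."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))

layers = 5
plain = receptive_field([3] * layers, [1] * layers)                   # dilation 1 throughout
dilated = receptive_field([3] * layers, [2**i for i in range(layers)])  # dilations 1,2,4,8,16
print(plain, dilated)  # prints 11 63 -- linear vs exponential growth
```

Five plain 3x3 layers see an 11x11 window; the same five layers with doubling dilations see 63x63 at the same parameter count, which is the trade-off the summary above refers to.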
arXiv Detail & Related papers (2021-03-12T23:19:32Z)
- PLADE-Net: Towards Pixel-Level Accuracy for Self-Supervised Single-View Depth Estimation with Neural Positional Encoding and Distilled Matting Loss [49.66736599668501]
We propose a self-supervised single-view pixel-level accurate depth estimation network, called PLADE-Net.
Our method shows unprecedented accuracy levels, exceeding 95% in terms of the $\delta_1$ metric on the KITTI dataset.
arXiv Detail & Related papers (2021-03-12T15:54:46Z)
- Topological obstructions in neural networks learning [67.8848058842671]
We study global properties of the loss gradient function flow.
We use topological data analysis of the loss function and its Morse complex to relate local behavior along gradient trajectories with global properties of the loss surface.
arXiv Detail & Related papers (2020-12-31T18:53:25Z)
- Dense U-net for super-resolution with shuffle pooling layer [4.397981844057195]
Recent research has achieved great progress on single-image super-resolution (SISR).
In these methods, the high-resolution input image is down-scaled to a low-resolution space using a single filter, commonly max-pooling, before feature extraction.
We demonstrate that this is sub-optimal and causes information loss.
In this work, we propose a state-of-the-art convolutional neural network method called Dense U-net with shuffle pooling.
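The information-loss argument can be illustrated by contrasting max-pooling with a space-to-depth rearrangement (the kind of lossless downscaling a shuffle-pooling layer is built on). This is a generic numpy sketch, not the paper's layer:

```python
import numpy as np

def space_to_depth(x, r=2):
    """Rearrange an (H, W) map into (H/r, W/r, r*r): every value is kept,
    so the downscaling is exactly invertible."""
    h, w = x.shape
    return (x.reshape(h // r, r, w // r, r)
             .transpose(0, 2, 1, 3)
             .reshape(h // r, w // r, r * r))

def depth_to_space(y, r=2):
    """Inverse rearrangement: recover the original (H, W) map exactly."""
    hh, ww, _ = y.shape
    return (y.reshape(hh, ww, r, r)
             .transpose(0, 2, 1, 3)
             .reshape(hh * r, ww * r))

def max_pool(x, r=2):
    """2x2 max-pooling keeps one value in four; the other three are lost."""
    h, w = x.shape
    return x.reshape(h // r, r, w // r, r).max(axis=(1, 3))
```

Round-tripping through `space_to_depth` and `depth_to_space` reproduces the input bit-for-bit, whereas `max_pool` discards three quarters of the values, which is the sub-optimality the summary above points at.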
arXiv Detail & Related papers (2020-11-11T00:59:43Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt the graph propagation to capture the observed spatial contexts.
We then apply the attention mechanism on the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- Bayesian Multi Scale Neural Network for Crowd Counting [0.0]
We propose a new network which uses a ResNet-based feature extractor, a downsampling block using dilated convolutions, and an upsampling block using transposed convolutions.
We present a novel aggregation module which makes our network robust to the perspective view problem.
arXiv Detail & Related papers (2020-07-11T21:43:20Z)
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Towards Better Generalization: Joint Depth-Pose Learning without PoseNet [36.414471128890284]
We tackle the essential problem of scale inconsistency for self-supervised joint depth-pose learning.
Most existing methods assume that a consistent scale of depth and pose can be learned across all input samples.
We propose a novel system that explicitly disentangles scale from the network estimation.
arXiv Detail & Related papers (2020-04-03T00:28:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.