SfM-TTR: Using Structure from Motion for Test-Time Refinement of
Single-View Depth Networks
- URL: http://arxiv.org/abs/2211.13551v2
- Date: Fri, 31 Mar 2023 11:37:12 GMT
- Title: SfM-TTR: Using Structure from Motion for Test-Time Refinement of
Single-View Depth Networks
- Authors: Sergio Izquierdo, Javier Civera
- Abstract summary: We propose a novel test-time refinement (TTR) method, denoted as SfM-TTR, to boost the performance of single-view depth networks at test time.
Specifically, and differently from the state of the art, we use sparse SfM point clouds as a test-time self-supervisory signal.
Our results show how adding SfM-TTR to several state-of-the-art self-supervised and supervised networks significantly improves their performance.
- Score: 13.249453757295086
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating a dense depth map from a single view is geometrically ill-posed,
and state-of-the-art methods rely on learning depth's relation with visual
appearance using deep neural networks. On the other hand, Structure from Motion
(SfM) leverages multi-view constraints to produce very accurate but sparse
maps, as matching across images is typically limited by locally discriminative
texture. In this work, we combine the strengths of both approaches by proposing
a novel test-time refinement (TTR) method, denoted as SfM-TTR, that boosts the
performance of single-view depth networks at test time using SfM multi-view
cues. Specifically, and differently from the state of the art, we use sparse
SfM point clouds as a test-time self-supervisory signal, fine-tuning the network
encoder to learn a better representation of the test scene. Our results show
how the addition of SfM-TTR to several state-of-the-art self-supervised and
supervised networks significantly improves their performance, outperforming
previous TTR baselines mainly based on photometric multi-view consistency. The
code is available at https://github.com/serizba/SfM-TTR.
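The abstract only outlines the refinement procedure, so the following is a minimal sketch of what an SfM-based test-time refinement loop could look like. It assumes the sparse SfM point cloud has already been rendered into per-image sparse depth maps (zeros where no point projects), that the network exposes an `encoder` attribute, and that a median-based scale alignment with a masked L1 loss is used; these names and choices are illustrative assumptions, not the authors' exact implementation.
```python
import torch
import torch.nn.functional as F


def sfm_ttr_refine(model, images, sparse_depths, steps=50, lr=1e-5):
    """Fine-tune only the encoder against sparse SfM depths of the test scene."""
    for p in model.parameters():
        p.requires_grad_(False)
    for p in model.encoder.parameters():          # assumed attribute of the network
        p.requires_grad_(True)
    opt = torch.optim.Adam(model.encoder.parameters(), lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for img, sparse in zip(images, sparse_depths):
            pred = model(img.unsqueeze(0)).squeeze(0)   # dense single-view prediction
            mask = sparse > 0                           # pixels hit by SfM points
            # Align the (up-to-scale) prediction to the SfM reconstruction's scale.
            scale = torch.median(sparse[mask]) / torch.median(pred[mask])
            loss = loss + F.l1_loss(scale * pred[mask], sparse[mask])
        (loss / len(images)).backward()
        opt.step()
    return model
```
Freezing the decoder and updating only the encoder follows the abstract's description of fine-tuning the encoder to learn a better representation of the test scene.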
Related papers
- MEDeA: Multi-view Efficient Depth Adjustment [45.90423821963144]
MEDeA is an efficient multi-view test-time depth adjustment method that is an order of magnitude faster than existing test-time approaches.
Our method sets a new state-of-the-art on TUM RGB-D, 7Scenes, and ScanNet benchmarks and successfully handles smartphone-captured data from ARKitScenes dataset.
arXiv Detail & Related papers (2024-06-17T19:39:13Z)
- Pushing the Efficiency Limit Using Structured Sparse Convolutions [82.31130122200578]
We propose Structured Sparse Convolution (SSC), which leverages the inherent structure in images to reduce the parameters in the convolutional filter.
We show that SSC is a generalization of commonly used layers (depthwise, groupwise and pointwise convolution) in efficient architectures.
Architectures based on SSC achieve state-of-the-art performance compared to baselines on CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet classification benchmarks.
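As a rough illustration of the claim that SSC generalizes depthwise, groupwise and pointwise convolutions, the sketch below recovers those layers as structured sparsity masks applied to a full convolution weight; the masking scheme is an assumption for illustration and is not SSC's actual parameterization.
```python
import torch
import torch.nn.functional as F


def structured_mask(out_ch, in_ch, k, kind="depthwise", groups=4):
    """Binary mask over a full conv weight of shape (out_ch, in_ch, k, k)."""
    mask = torch.zeros(out_ch, in_ch, k, k)
    if kind == "pointwise":
        mask[:, :, k // 2, k // 2] = 1            # only the centre tap: a 1x1 conv
    elif kind == "depthwise":
        for c in range(min(out_ch, in_ch)):
            mask[c, c] = 1                        # each filter sees a single channel
    elif kind == "groupwise":
        per_out, per_in = out_ch // groups, in_ch // groups
        for g in range(groups):
            mask[g * per_out:(g + 1) * per_out,
                 g * per_in:(g + 1) * per_in] = 1  # block-diagonal channel pattern
    return mask


# Masking a dense weight reproduces the corresponding efficient layer.
weight = torch.randn(8, 8, 3, 3) * structured_mask(8, 8, 3, kind="depthwise")
x = torch.randn(1, 8, 32, 32)
y = F.conv2d(x, weight, padding=1)                # behaves like a depthwise conv
```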
arXiv Detail & Related papers (2022-10-23T18:37:22Z)
- Self-distilled Feature Aggregation for Self-supervised Monocular Depth Estimation [11.929584800629673]
We propose the Self-Distilled Feature Aggregation (SDFA) module for simultaneously aggregating a pair of low-scale and high-scale features.
We propose an SDFA-based network for self-supervised monocular depth estimation, and design a self-distilled training strategy to train the proposed network.
Experimental results on the KITTI dataset demonstrate that the proposed method outperforms the comparative state-of-the-art methods in most cases.
arXiv Detail & Related papers (2022-09-15T07:00:52Z)
- TANDEM: Tracking and Dense Mapping in Real-time using Deep Multi-view Stereo [55.30992853477754]
We present TANDEM, a real-time monocular tracking and dense mapping framework.
For pose estimation, TANDEM performs photometric bundle adjustment based on a sliding window of alignments.
TANDEM shows state-of-the-art real-time 3D reconstruction performance.
arXiv Detail & Related papers (2021-11-14T19:01:02Z)
- VolumeFusion: Deep Depth Fusion for 3D Scene Reconstruction [71.83308989022635]
In this paper, we advocate that replicating the traditional two-stage framework with deep neural networks improves both the interpretability and the accuracy of the results.
Our network operates in two steps: 1) the local computation of depth maps with a deep MVS technique, and 2) the fusion of the depth maps and image features to build a single TSDF volume.
In order to improve the matching performance between images acquired from very different viewpoints, we introduce a rotation-invariant 3D convolution kernel called PosedConv.
arXiv Detail & Related papers (2021-08-19T11:33:58Z)
- Monocular Depth Parameterizing Networks [15.791732557395552]
We propose a network structure that provides a parameterization of a set of depth maps with feasible shapes.
This allows us to search the shapes for a photo consistent solution with respect to other images.
Our experimental evaluation shows that our method generates more accurate depth maps and generalizes better than competing state-of-the-art approaches.
arXiv Detail & Related papers (2020-12-21T13:02:41Z)
- Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z)
- Adaptive Context-Aware Multi-Modal Network for Depth Completion [107.15344488719322]
We propose to adopt graph propagation to capture the observed spatial contexts.
We then apply an attention mechanism to the propagation, which encourages the network to model the contextual information adaptively.
Finally, we introduce the symmetric gated fusion strategy to exploit the extracted multi-modal features effectively.
Our model, named Adaptive Context-Aware Multi-Modal Network (ACMNet), achieves the state-of-the-art performance on two benchmarks.
arXiv Detail & Related papers (2020-08-25T06:00:06Z)
- MSDPN: Monocular Depth Prediction with Partial Laser Observation using Multi-stage Neural Networks [1.1602089225841632]
We propose a deep-learning-based multi-stage network architecture called the Multi-Stage Depth Prediction Network (MSDPN).
MSDPN predicts a dense depth map from a 2D LiDAR and a monocular camera.
As verified experimentally, our network yields promising performance against state-of-the-art methods.
arXiv Detail & Related papers (2020-08-04T08:27:40Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely on depth-from-defocus cues instead of different views.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)