Digging into Uncertainty in Self-supervised Multi-view Stereo
- URL: http://arxiv.org/abs/2108.12966v1
- Date: Mon, 30 Aug 2021 02:53:08 GMT
- Title: Digging into Uncertainty in Self-supervised Multi-view Stereo
- Authors: Hongbin Xu, Zhipeng Zhou, Yali Wang, Wenxiong Kang, Baigui Sun, Hao
Li, Yu Qiao
- Abstract summary: We propose a novel Uncertainty reduction Multi-view Stereo (U-MVS) framework for self-supervised learning.
Our framework achieves the best performance among unsupervised MVS methods and is competitive with its supervised counterparts.
- Score: 57.04768354383339
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised Multi-view stereo (MVS) with a pretext task of image
reconstruction has achieved significant progress recently. However, previous
methods are built upon intuitions, lacking comprehensive explanations about the
effectiveness of the pretext task in self-supervised MVS. To this end, we
propose to estimate epistemic uncertainty in self-supervised MVS, accounting
for what the model ignores. Specifically, the limitations can be categorized into
two types: ambiguous supervision in the foreground and invalid supervision in the
background. To address these issues, we propose a novel Uncertainty reduction
Multi-view Stereo (U-MVS) framework for self-supervised learning. To alleviate
ambiguous supervision in the foreground, we introduce an extra correspondence prior
via a flow-depth consistency loss: the dense 2D correspondences of optical flow are
used to regularize the 3D stereo correspondences in MVS. To handle the invalid
supervision in the background, we use Monte-Carlo Dropout to acquire an uncertainty
map and filter out unreliable supervision signals in invalid regions. Extensive
experiments on the DTU and Tanks&Temples benchmarks show that our U-MVS framework
achieves the best performance among unsupervised MVS methods and is competitive
with its supervised counterparts.
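The Monte-Carlo Dropout filtering described above can be summarized in a short sketch. The snippet below is a minimal, hypothetical PyTorch illustration rather than the authors' implementation: the network name `mvs_depth_net`, the sample count `num_mc_samples`, and the fixed threshold are assumptions. It keeps dropout active at inference, treats the variance over several stochastic depth predictions as an epistemic-uncertainty map, and masks the image-reconstruction loss wherever that uncertainty is high (e.g., in invalid background regions).
```python
import torch


def mc_dropout_uncertainty(mvs_depth_net, images, proj_matrices, num_mc_samples=8):
    """Estimate a per-pixel epistemic uncertainty map with Monte-Carlo Dropout.

    `mvs_depth_net` is a hypothetical MVS depth network; keeping it in train()
    mode leaves its dropout layers active during the stochastic forward passes.
    """
    mvs_depth_net.train()  # keep dropout enabled at inference time
    with torch.no_grad():
        samples = torch.stack(
            [mvs_depth_net(images, proj_matrices) for _ in range(num_mc_samples)],
            dim=0,
        )  # (S, B, 1, H, W): S stochastic depth predictions
    mean_depth = samples.mean(dim=0)
    uncertainty = samples.var(dim=0)  # high variance = high epistemic uncertainty
    return mean_depth, uncertainty


def masked_reconstruction_loss(warped_img, ref_img, uncertainty, threshold=0.05):
    """Drop unreliable (high-uncertainty) pixels from the photometric supervision."""
    valid = (uncertainty < threshold).float()                  # trusted-pixel mask
    per_pixel = (warped_img - ref_img).abs().mean(dim=1, keepdim=True)
    return (per_pixel * valid).sum() / valid.sum().clamp(min=1.0)
```
The hard threshold used here is purely illustrative of the filtering idea; how the uncertainty map is turned into a mask or weighting is a design choice of the paper, not shown in this sketch.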
Related papers
- MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo [60.75684891484619]
We introduce MVSFormer++, a method that maximizes the inherent characteristics of attention to enhance various components of the MVS pipeline.
We employ different attention mechanisms for the feature encoder and cost volume regularization, focusing on feature and spatial aggregations respectively.
Comprehensive experiments on DTU, Tanks-and-Temples, BlendedMVS, and ETH3D validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-01-22T03:22:49Z)
- Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals [38.20643428486824]
Learning the dense bird's eye view (BEV) motion flow in a self-supervised manner is an emerging research topic in robotics and autonomous driving.
Current self-supervised methods mainly rely on point correspondences between point clouds.
We introduce a novel cross-modality self-supervised training framework that effectively addresses these issues by leveraging multi-modality data.
arXiv Detail & Related papers (2024-01-21T14:09:49Z)
- Leveraging Foundation models for Unsupervised Audio-Visual Segmentation [49.94366155560371]
Audio-Visual Segmentation (AVS) aims to precisely outline audible objects in a visual scene at the pixel level.
Existing AVS methods require fine-grained annotations of audio-mask pairs in a supervised learning fashion.
We introduce unsupervised audio-visual segmentation that requires no task-specific data annotations or model training.
arXiv Detail & Related papers (2023-09-13T05:05:47Z)
- ES-MVSNet: Efficient Framework for End-to-end Self-supervised Multi-View Stereo [11.41432976633312]
In this work, we propose an efficient framework for end-to-end self-supervised MVS, dubbed ES-MVSNet.
To alleviate the high memory consumption of current E2E self-supervised MVS frameworks, we present a memory-efficient architecture that reduces memory usage by 43% without compromising model performance.
With the novel design of an asymmetric view selection policy and region-aware depth consistency, we achieve state-of-the-art performance among E2E self-supervised MVS methods, without relying on third-party models for additional consistency signals.
arXiv Detail & Related papers (2023-08-04T08:16:47Z)
- Robustness of Unsupervised Representation Learning without Labels [92.90480374344777]
We propose a family of unsupervised robustness measures, which are model- and task-agnostic and label-free.
We validate our results against a linear probe and show that, for MOCOv2, adversarial training results in 3 times higher certified accuracy.
arXiv Detail & Related papers (2022-10-08T18:03:28Z)
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)
- Unsupervised Visual Attention and Invariance for Reinforcement Learning [25.673868326662024]
We develop an independent module to disperse interference factors irrelevant to the task, thereby providing "clean" observations for the vision-based reinforcement learning policy.
All components are optimized in an unsupervised way, without manual annotation or access to environment internals.
VAI empirically shows powerful generalization capabilities and significantly outperforms the current state-of-the-art (SOTA) method by 15% to 49% on the DeepMind Control suite benchmark.
arXiv Detail & Related papers (2021-04-07T05:28:01Z)
- What Matters in Unsupervised Optical Flow [51.45112526506455]
We compare and analyze a set of key components in unsupervised optical flow.
We construct a number of novel improvements to unsupervised flow models.
We present a new unsupervised flow technique that significantly outperforms the previous state-of-the-art.
arXiv Detail & Related papers (2020-06-08T19:36:26Z)
- M^3VSNet: Unsupervised Multi-metric Multi-view Stereo Network [13.447649324253572]
We propose a novel unsupervised multi-metric MVS network, named M3VSNet, for dense point cloud reconstruction without supervision.
To improve the robustness and completeness of point cloud reconstruction, we propose a novel multi-metric loss function that combines pixel-wise and feature-wise loss terms (a generic sketch of such a combination follows this entry).
Experimental results show that M3VSNet sets a new state of the art among unsupervised methods and achieves performance comparable to the previous supervised MVSNet.
arXiv Detail & Related papers (2020-04-30T09:26:51Z)
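As an illustration of what a combined pixel-wise and feature-wise objective can look like, the sketch below is a generic PyTorch example under the assumption of an L1 photometric term plus a cosine feature-consistency term; it is not M^3VSNet's actual formulation, and the weighting `alpha` and the `feature_extractor` are hypothetical placeholders.
```python
import torch
import torch.nn.functional as F


def multi_metric_loss(warped_img, ref_img, feature_extractor, alpha=0.2):
    """Combine a pixel-wise photometric term with a feature-wise consistency term.

    `feature_extractor` is any frozen CNN (e.g., early VGG layers) used as a
    placeholder here; it is not the network used in the paper.
    """
    # Pixel-wise term: mean absolute photometric error between the warped
    # source image and the reference image.
    pixel_loss = (warped_img - ref_img).abs().mean()

    # Feature-wise term: 1 - cosine similarity between deep features, which is
    # typically more robust to lighting changes than raw intensities.
    with torch.no_grad():
        ref_feat = feature_extractor(ref_img)
    warped_feat = feature_extractor(warped_img)
    feat_loss = (1.0 - F.cosine_similarity(warped_feat, ref_feat, dim=1)).mean()

    return pixel_loss + alpha * feat_loss
```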