On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation
- URL: http://arxiv.org/abs/2109.06163v1
- Date: Mon, 13 Sep 2021 17:57:24 GMT
- Title: On the Sins of Image Synthesis Loss for Self-supervised Depth Estimation
- Authors: Zhaoshuo Li, Nathan Drenkow, Hao Ding, Andy S. Ding, Alexander Lu,
Francis X. Creighton, Russell H. Taylor, Mathias Unberath
- Abstract summary: We show that improvements in image synthesis do not necessitate improvement in depth estimation.
We attribute this diverging phenomenon to aleatoric uncertainties, which originate from data.
This observed divergence has not been previously reported or studied in depth.
- Score: 60.780823530087446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene depth estimation from stereo and monocular imagery is critical for
extracting 3D information for downstream tasks such as scene understanding.
Recently, learning-based methods for depth estimation have received much
attention due to their high performance and flexibility in hardware choice.
However, collecting ground truth data for supervised training of these
algorithms is costly or outright impossible. This circumstance suggests a need
for alternative learning approaches that do not require corresponding depth
measurements. Indeed, self-supervised learning of depth estimation provides an
increasingly popular alternative. It is based on the idea that observed frames
can be synthesized from neighboring frames if accurate depth of the scene is
known - or in this case, estimated. We show empirically that - contrary to
common belief - improvements in image synthesis do not necessitate improvement
in depth estimation. Rather, optimizing for image synthesis can result in
diverging performance with respect to the main prediction objective - depth. We
attribute this diverging phenomenon to aleatoric uncertainties, which originate
from data. Based on our experiments on four datasets (spanning street, indoor,
and medical) and five architectures (monocular and stereo), we conclude that
this diverging phenomenon is independent of the dataset domain and not
mitigated by commonly used regularization techniques. To underscore the
importance of this finding, we include a survey of methods which use image
synthesis, totaling 127 papers over the last six years. This observed
divergence has not been previously reported or studied in depth, suggesting
room for future improvement of self-supervised approaches that might be
impacted by this finding.
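To make the training signal concrete, here is a minimal NumPy sketch of the image synthesis (photometric) loss for the rectified-stereo case: the observed left frame is compared against a left view synthesized by sampling the right image along horizontal epipolar lines at the estimated disparity. The function names and the plain L1 penalty are illustrative assumptions; the methods surveyed in the paper typically combine L1 with SSIM and smoothness terms.

    import numpy as np

    def warp_right_to_left(right, disparity):
        # Synthesize the left view: for each left pixel (y, x), sample the
        # right image at (y, x - disparity) with linear interpolation.
        # Out-of-range source coordinates are clamped to the image border.
        h, w, _ = right.shape
        xs = np.arange(w)[None, :] - disparity        # source x-coords, shape (h, w)
        x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
        frac = np.clip(xs - x0, 0.0, 1.0)[..., None]  # interpolation weights
        rows = np.arange(h)[:, None]
        return (1.0 - frac) * right[rows, x0] + frac * right[rows, x0 + 1]

    def photometric_loss(left, right, disparity):
        # Mean L1 error between the observed left frame and the frame
        # synthesized from the right view and the estimated disparity.
        return np.abs(left - warp_right_to_left(right, disparity)).mean()

    # Usage: identical frames with zero disparity reconstruct perfectly.
    frame = np.random.rand(4, 8, 3)
    print(photometric_loss(frame, frame, np.zeros((4, 8))))  # -> 0.0

Note that this loss rewards accurate image synthesis, which, per the abstract's main finding, does not guarantee accurate depth.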
Related papers
- Robust Geometry-Preserving Depth Estimation Using Differentiable
Rendering [93.94371335579321]
We propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations.
Comprehensive experiments underscore our framework's superior generalization capabilities.
Our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients.
arXiv Detail & Related papers (2023-09-18T12:36:39Z)
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks; however, this assumption is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- Depth Refinement for Improved Stereo Reconstruction [13.941756438712382]
Current techniques for depth estimation from stereoscopic images still suffer from a built-in drawback.
A simple analysis reveals that the depth error grows quadratically with the object's distance.
We propose a simple but effective method that uses a refinement network for depth estimation.
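To see the quadratic dependence noted above: under the standard rectified-stereo model with focal length f, baseline B, and disparity d, depth is z = f*B/d. A fixed disparity error delta_d therefore maps to a depth error of roughly |dz/dd| * delta_d = (f*B/d^2) * delta_d = (z^2/(f*B)) * delta_d, which grows quadratically with the distance z.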
arXiv Detail & Related papers (2021-12-15T12:21:08Z)
- Unsupervised Monocular Depth Perception: Focusing on Moving Objects [5.489557739480878]
In this paper, we show that deliberately manipulating photometric errors can deal with the difficulties caused by moving objects more effectively.
We first propose an outlier masking technique that considers the occluded or dynamic pixels as statistical outliers in the photometric error map.
With the outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately.
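One plausible form of this outlier masking, sketched in NumPy under the assumption that the highest-error pixels are flagged by a percentile threshold (the threshold choice is illustrative, not the paper's exact statistic):

    import numpy as np

    def inlier_mask(photometric_error, q=95):
        # Flag the highest-error pixels (typically occluded or dynamic
        # regions) as statistical outliers; keep the rest as inliers.
        return photometric_error <= np.percentile(photometric_error, q)

    def masked_photometric_loss(photometric_error, q=95):
        # Average the photometric error over inlier pixels only.
        mask = inlier_mask(photometric_error, q)
        return photometric_error[mask].mean()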
arXiv Detail & Related papers (2021-08-30T08:45:02Z)
- Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z)
- Occlusion-aware Unsupervised Learning of Depth from 4-D Light Fields [50.435129905215284]
We present an unsupervised learning-based depth estimation method for 4-D light field processing and analysis.
Building on the unique geometric structure of light field data, we explore the angular coherence among subsets of the light field views to estimate depth maps.
Our method significantly narrows the performance gap between previous unsupervised methods and supervised ones, and produces depth maps with accuracy comparable to traditional methods at markedly reduced computational cost.
arXiv Detail & Related papers (2021-06-06T06:19:50Z)
- Self-Guided Instance-Aware Network for Depth Completion and Enhancement [6.319531161477912]
Existing methods directly interpolate the missing depth measurements based on pixel-wise image content and the corresponding neighboring depth values.
We propose a novel self-guided instance-aware network (SG-IANet) that utilizes a self-guided mechanism to extract the instance-level features needed for depth restoration.
arXiv Detail & Related papers (2021-05-25T19:41:38Z)
- Fast Depth Estimation for View Synthesis [9.243157709083672]
Disparity/depth estimation from sequences of stereo images is an important element in 3D vision.
We propose a novel learning-based framework making use of dilated convolutions, densely connected convolutional modules, a compact decoder, and skip connections.
We show that our network outperforms state-of-the-art methods, improving depth estimation and view synthesis by approximately 45% and 34%, respectively.
arXiv Detail & Related papers (2020-03-14T14:10:42Z)
- Don't Forget The Past: Recurrent Depth Estimation from Monocular Video [92.84498980104424]
We put three different types of depth estimation into a common framework.
Our method produces a time series of depth maps.
It can be applied to monocular videos only or be combined with different types of sparse depth patterns.
arXiv Detail & Related papers (2020-01-08T16:50:51Z)