Single Image Depth Estimation using Wavelet Decomposition
- URL: http://arxiv.org/abs/2106.02022v1
- Date: Thu, 3 Jun 2021 17:42:25 GMT
- Title: Single Image Depth Estimation using Wavelet Decomposition
- Authors: Michaël Ramamonjisoa, Michael Firman, Jamie Watson, Vincent Lepetit, and Daniyar Turmukhambetov
- Abstract summary: We present a novel method for predicting accurate depths from monocular images with high efficiency.
This optimal efficiency is achieved by exploiting wavelet decomposition.
We demonstrate that we can reconstruct high-fidelity depth maps by predicting sparse wavelet coefficients.
- Score: 37.486778463181
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel method for predicting accurate depths from monocular
images with high efficiency. This optimal efficiency is achieved by exploiting
wavelet decomposition, which is integrated in a fully differentiable
encoder-decoder architecture. We demonstrate that we can reconstruct
high-fidelity depth maps by predicting sparse wavelet coefficients. In contrast
with previous works, we show that wavelet coefficients can be learned without
direct supervision on coefficients. Instead we supervise only the final depth
image that is reconstructed through the inverse wavelet transform. We
additionally show that wavelet coefficients can be learned in fully
self-supervised scenarios, without access to ground-truth depth. Finally, we
apply our method to different state-of-the-art monocular depth estimation
models, in each case giving similar or better results compared to the original
model, while requiring less than half the multiply-adds in the decoder network.
Code at https://github.com/nianticlabs/wavelet-monodepth
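To make the idea of reconstructing depth through an inverse wavelet transform concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (that is in the repository linked above). It assumes a single-level orthonormal Haar wavelet, a hand-made 4x4 depth map, and hypothetical helper names `haar_dwt2` and `haar_idwt2`. It only illustrates why a piecewise-flat depth map has few non-zero detail coefficients, and that a coarse map plus those coefficients recovers the full-resolution map.

```python
def haar_dwt2(img):
    """One level of the orthonormal 2D Haar transform (even height and width)."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    ll = [[0.0] * cols for _ in range(rows)]      # coarse approximation
    d_col = [[0.0] * cols for _ in range(rows)]   # left/right differences
    d_row = [[0.0] * cols for _ in range(rows)]   # top/bottom differences
    d_diag = [[0.0] * cols for _ in range(rows)]  # diagonal differences
    for i in range(rows):
        for j in range(cols):
            a, b = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            c, d = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2.0
            d_col[i][j] = (a - b + c - d) / 2.0
            d_row[i][j] = (a + b - c - d) / 2.0
            d_diag[i][j] = (a - b - c + d) / 2.0
    return ll, (d_col, d_row, d_diag)


def haar_idwt2(ll, details):
    """Inverse of haar_dwt2: doubles the resolution of the coarse map."""
    d_col, d_row, d_diag = details
    rows, cols = len(ll), len(ll[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, x, y, z = ll[i][j], d_col[i][j], d_row[i][j], d_diag[i][j]
            out[2 * i][2 * j] = (s + x + y + z) / 2.0
            out[2 * i][2 * j + 1] = (s - x + y - z) / 2.0
            out[2 * i + 1][2 * j] = (s + x - y - z) / 2.0
            out[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2.0
    return out


# Toy depth map (assumed values): a near surface at 2.0 in the first column,
# a far surface at 8.0 elsewhere, so there is a single vertical depth edge.
depth = [[2.0, 8.0, 8.0, 8.0] for _ in range(4)]

ll, details = haar_dwt2(depth)
recon = haar_idwt2(ll, details)

# Count how many detail coefficients are non-zero: only those at the edge.
nonzero = sum(1 for band in details for row in band for v in row if v != 0.0)
total = sum(len(row) for band in details for row in band)
print(f"{nonzero} of {total} detail coefficients are non-zero")
```

In this toy case the detail coefficients are computed from a known depth map. In the setting the abstract describes, a network would predict them instead, and the loss would be applied to the output of the inverse transform rather than to the coefficients themselves.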
Related papers
- DepthSplat: Connecting Gaussian Splatting and Depth [90.06180236292866]
We present DepthSplat to connect Gaussian splatting and depth estimation.
We first contribute a robust multi-view depth model by leveraging pre-trained monocular depth features.
We also show that Gaussian splatting can serve as an unsupervised pre-training objective.
arXiv Detail & Related papers (2024-10-17T17:59:58Z)
- Harnessing Wavelet Transformations for Generalizable Deepfake Forgery Detection [0.0]
Wavelet-CLIP is a deepfake detection framework that integrates wavelet transforms with features derived from the ViT-L/14 architecture, pre-trained in the CLIP fashion.
Our method showcases outstanding performance, achieving an average AUC of 0.749 for cross-data generalization and 0.893 for robustness against unseen deepfakes.
arXiv Detail & Related papers (2024-09-26T21:16:51Z)
- PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage [19.02295657801464]
This work addresses the task of zero-shot monocular depth estimation.
A recent advance in this field has been the idea of utilising Text-to-Image foundation models, such as Stable Diffusion.
We present PrimeDepth, a method that is highly efficient at test time while keeping, or even enhancing, the positive aspects of diffusion-based approaches.
arXiv Detail & Related papers (2024-09-13T19:03:48Z)
- WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z)
- Metrically Scaled Monocular Depth Estimation through Sparse Priors for Underwater Robots [0.0]
We formulate a deep learning model that fuses sparse depth measurements from triangulated features to improve the depth predictions.
The network is trained in a supervised fashion on the forward-looking underwater dataset, FLSea.
The method achieves real-time performance, running at 160 FPS on a laptop GPU and 7 FPS on a single CPU core.
arXiv Detail & Related papers (2023-10-25T16:32:31Z)
- AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation [51.143540967290114]
We propose a method that unlocks a wide range of previously infeasible geometric augmentations for unsupervised depth completion and estimation.
This is achieved by reversing, or "undo"-ing, geometric transformations to the coordinates of the output depth, warping the depth map back to the original reference frame.
arXiv Detail & Related papers (2023-10-15T05:15:45Z)
- Dynamic Frame Interpolation in Wavelet Domain [57.25341639095404]
Video frame interpolation is an important low-level computer vision task, which can increase frame rate for a more fluent visual experience.
Existing methods have achieved great success by employing advanced motion models and synthesis networks.
WaveletVFI can reduce computation by up to 40% while maintaining similar accuracy, making it more efficient than other state-of-the-art methods.
arXiv Detail & Related papers (2023-09-07T06:41:15Z)
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
The accuracy of our method (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- FaDIV-Syn: Fast Depth-Independent View Synthesis [27.468361999226886]
We introduce FaDIV-Syn, a fast depth-independent view synthesis method.
Our multi-view approach addresses the problem that view synthesis methods are often limited by their depth estimation stage.
arXiv Detail & Related papers (2021-06-24T16:14:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.