PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
- URL: http://arxiv.org/abs/2501.01121v1
- Date: Thu, 02 Jan 2025 07:41:27 GMT
- Title: PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
- Authors: Zhenyu Li, Wenqing Cui, Shariq Farooq Bhat, Peter Wonka
- Abstract summary: PRV2 outperforms state-of-the-art depth estimation methods on UnrealStereo4K in both accuracy and speed.
It also shows improved depth boundary delineation on real-world datasets like Cityscapes, ScanNet++, and KITTI.
- Score: 38.71875790942604
- Abstract: While current high-resolution depth estimation methods achieve strong results, they often suffer from computational inefficiencies due to reliance on heavyweight models and multiple inference steps, increasing inference time. To address this, we introduce PatchRefiner V2 (PRV2), which replaces heavy refiner models with lightweight encoders. This reduces model size and inference time but introduces noisy features. To overcome this, we propose a Coarse-to-Fine (C2F) module with a Guided Denoising Unit for refining and denoising the refiner features, and a Noisy Pretraining strategy that pretrains the refiner branch to fully exploit its potential. Additionally, we introduce a Scale-and-Shift Invariant Gradient Matching (SSIGM) loss to enhance synthetic-to-real domain transfer. PRV2 outperforms state-of-the-art depth estimation methods on UnrealStereo4K in both accuracy and speed, using fewer parameters and faster inference. It also shows improved depth boundary delineation on real-world datasets like Cityscapes, ScanNet++, and KITTI, demonstrating its versatility across domains.
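The SSIGM loss is only named in the abstract; as a minimal sketch of the idea, one can align the prediction to the target with a least-squares scale and shift (so the loss ignores global depth scale and offset) and then match image-space depth gradients, which emphasizes boundaries. The exact formulation in PRV2 may differ:

```python
import numpy as np

def ssigm_loss(pred, target):
    """Sketch of a scale-and-shift invariant gradient matching loss.

    1) Align the prediction to the target with a closed-form
       least-squares scale and shift (scale/shift invariance).
    2) Take an L1 penalty on the difference of horizontal and
       vertical depth gradients (boundary emphasis).
    """
    p = pred.ravel()
    t = target.ravel()
    # Closed-form least-squares fit of t ~ s * p + b.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, t, rcond=None)
    aligned = s * pred + b
    # L1 difference of horizontal and vertical finite-difference gradients.
    gx = np.abs(np.diff(aligned, axis=1) - np.diff(target, axis=1))
    gy = np.abs(np.diff(aligned, axis=0) - np.diff(target, axis=0))
    return gx.mean() + gy.mean()
```

By construction the loss is zero for any prediction that is an affine rescaling of the target, e.g. `ssigm_loss(2 * d + 3, d)` vanishes up to floating-point error.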
Related papers
- Learning Inverse Laplacian Pyramid for Progressive Depth Completion [18.977393635158048]
LP-Net is an innovative framework that implements a multi-scale, progressive prediction paradigm based on Laplacian Pyramid decomposition.
At the time of submission, LP-Net ranks 1st among all peer-reviewed methods on the official KITTI leaderboard.
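LP-Net's filter choices are not given here, but the core Laplacian-pyramid idea (represent a signal as a coarse base plus progressively finer band-pass residuals, then rebuild coarse-to-fine) can be sketched; the 2x box down/upsampling below is an assumption, not the paper's design:

```python
import numpy as np

def laplacian_pyramid(img, levels=3):
    """Minimal Laplacian pyramid with 2x2 box downsampling and
    nearest-neighbor upsampling (illustrative filters only)."""
    pyramid = []
    cur = img
    for _ in range(levels):
        h, w = cur.shape
        # Downsample by averaging 2x2 blocks.
        low = cur.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        # Upsample by nearest-neighbor repetition.
        up = low.repeat(2, axis=0).repeat(2, axis=1)
        pyramid.append(cur - up)   # band-pass residual at this level
        cur = low
    pyramid.append(cur)            # low-frequency base
    return pyramid

def reconstruct(pyramid):
    """Coarse-to-fine reconstruction: upsample and add each residual."""
    cur = pyramid[-1]
    for residual in reversed(pyramid[:-1]):
        cur = cur.repeat(2, axis=0).repeat(2, axis=1) + residual
    return cur
```

The decomposition is exactly invertible, which is what lets a progressive predictor refine one pyramid level at a time without losing information.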
arXiv Detail & Related papers (2025-02-11T06:21:42Z)
- DepthMaster: Taming Diffusion Models for Monocular Depth Estimation [41.81343543266191]
We propose a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task.
We adopt a two-stage training strategy to fully leverage the potential of the two modules.
Our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets.
arXiv Detail & Related papers (2025-01-05T15:18:32Z)
- DiffFNO: Diffusion Fourier Neural Operator [8.895165270489167]
We introduce DiffFNO, a novel diffusion framework for arbitrary-scale super-resolution strengthened by a Weighted Fourier Neural Operator (WFNO).
We show that DiffFNO achieves state-of-the-art (SOTA) results, outperforming existing methods across various scaling factors by a margin of 2 to 4 dB in PSNR.
Our approach sets a new standard in super-resolution, delivering both superior accuracy and computational efficiency.
arXiv Detail & Related papers (2024-11-15T03:14:11Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
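The "TC" stream's trick is to turn a 1-D signal into a 2-D (scale x time) tensor via the CWT, so it can be processed like an image. A sketch with a real Morlet-like wavelet; the wavelet family and scales used by TCCT-Net are not specified here, so these are assumptions:

```python
import numpy as np

def cwt_tensor(signal, scales):
    """Continuous wavelet transform of a 1-D signal into a 2-D
    (len(scales), len(signal)) tensor, via convolution with scaled
    copies of a real Morlet-like wavelet (illustrative choice)."""
    n = len(signal)
    out = np.empty((len(scales), n))
    t = np.arange(-(n // 2), n - n // 2)
    for i, s in enumerate(scales):
        # Real Morlet: cosine carrier under a Gaussian envelope,
        # normalized by sqrt(scale).
        wavelet = np.cos(5.0 * t / s) * np.exp(-((t / s) ** 2) / 2.0)
        wavelet /= np.sqrt(s)
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out
```

Each row of the output captures activity at one scale (roughly one frequency band) over time, which is the 2-D structure the convolutional stream then consumes.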
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR [67.63332492134332]
We design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs.
Our proposed encoder can double as a strong standalone encoder in on device, and as the first part of a high-performance ASR pipeline.
arXiv Detail & Related papers (2023-03-31T23:30:48Z)
- Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
- Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs [7.337247167823921]
We propose a training framework for replica-exchange Langevin diffusion that exploits the neural network architecture of DeepONets.
We show that the proposed framework's exploration and exploitation capabilities enable improved training convergence for DeepONets in noisy scenarios.
We also show that replica-exchange Langevin diffusion improves the DeepONet's mean prediction accuracy in noisy scenarios.
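The exploration/exploitation mechanism can be illustrated on a toy one-dimensional loss: a cold (low-temperature) chain exploits, a hot chain explores, and a Metropolis-style swap occasionally exchanges their states. The step size, temperatures, and swap interval below are illustrative only; the paper applies the scheme to DeepONet training:

```python
import numpy as np

def replica_exchange_langevin(loss, grad, x0, steps=300, lr=1e-2,
                              temps=(1e-4, 1e-1), seed=0):
    """Toy replica-exchange Langevin diffusion on a scalar parameter.

    Each chain follows a Langevin update x <- x - lr*grad(x) + noise,
    with noise scaled by its temperature; every 10 steps a swap is
    proposed with acceptance min(1, exp((b_cold - b_hot)*(E_cold - E_hot))).
    """
    rng = np.random.default_rng(seed)
    chains = [float(x0) for _ in temps]
    for step in range(steps):
        for i, tau in enumerate(temps):
            chains[i] += (-lr * grad(chains[i])
                          + np.sqrt(2.0 * lr * tau) * rng.normal())
        if step % 10 == 0:
            expo = ((1.0 / temps[0] - 1.0 / temps[1])
                    * (loss(chains[0]) - loss(chains[1])))
            # Always swap when the hot chain found a lower-loss state.
            if expo >= 0.0 or rng.random() < np.exp(expo):
                chains[0], chains[1] = chains[1], chains[0]
    return chains[0]  # final state of the cold chain
```

On a quadratic loss such as `(x - 3)**2`, the cold chain settles near the minimizer while the hot chain keeps sampling a broader neighborhood.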
arXiv Detail & Related papers (2021-11-03T19:23:59Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on connectionist temporal classification (CTC).
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [58.626803922196146]
We argue that DSR models trained on synthetic datasets are restrictive and not effective in dealing with real-world DSR tasks.
We make two contributions in tackling real-world degradation of different depth sensors.
We propose a new framework for real-world DSR, which consists of four modules.
arXiv Detail & Related papers (2020-06-02T09:12:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.