PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
- URL: http://arxiv.org/abs/2501.01121v1
- Date: Thu, 02 Jan 2025 07:41:27 GMT
- Title: PatchRefiner V2: Fast and Lightweight Real-Domain High-Resolution Metric Depth Estimation
- Authors: Zhenyu Li, Wenqing Cui, Shariq Farooq Bhat, Peter Wonka
- Abstract summary: PRV2 outperforms state-of-the-art depth estimation methods on UnrealStereo4K in both accuracy and speed. It also shows improved depth boundary delineation on real-world datasets like Cityscapes, ScanNet++, and KITTI.
- Score: 38.71875790942604
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While current high-resolution depth estimation methods achieve strong results, they often suffer from computational inefficiencies due to reliance on heavyweight models and multiple inference steps, increasing inference time. To address this, we introduce PatchRefiner V2 (PRV2), which replaces heavy refiner models with lightweight encoders. This reduces model size and inference time but introduces noisy features. To overcome this, we propose a Coarse-to-Fine (C2F) module with a Guided Denoising Unit for refining and denoising the refiner features, and a Noisy Pretraining strategy that pretrains the refiner branch to fully exploit its potential. Additionally, we introduce a Scale-and-Shift Invariant Gradient Matching (SSIGM) loss to enhance synthetic-to-real domain transfer. PRV2 outperforms state-of-the-art depth estimation methods on UnrealStereo4K in both accuracy and speed, using fewer parameters and faster inference. It also shows improved depth boundary delineation on real-world datasets like Cityscapes, ScanNet++, and KITTI, demonstrating its versatility across domains.
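The abstract names the SSIGM loss but does not spell it out. A minimal sketch of the underlying idea, assuming a least-squares scale-and-shift alignment followed by multi-scale gradient matching; the function names and the three-scale schedule are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def align_scale_shift(pred, gt):
    """Least-squares scale s and shift t so that s*pred + t best matches gt."""
    A = np.stack([pred.ravel(), np.ones(pred.size)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, gt.ravel(), rcond=None)
    return s * pred + t

def gradient_matching_loss(pred, gt, scales=3):
    """Mean absolute difference of spatial gradients over several downsampled scales."""
    loss, p, g = 0.0, pred, gt
    for _ in range(scales):
        diff = p - g
        loss += np.abs(np.diff(diff, axis=1)).mean() + np.abs(np.diff(diff, axis=0)).mean()
        p, g = p[::2, ::2], g[::2, ::2]  # halve resolution for the next scale
    return loss / scales

def ssigm_loss(pred, gt):
    """Align prediction in scale and shift, then penalize gradient mismatches."""
    return gradient_matching_loss(align_scale_shift(pred, gt), gt)
```

Because the alignment removes any global scale and shift, a prediction that is an affine transform of the ground truth incurs (near-)zero loss, which is what makes such losses useful for synthetic-to-real transfer where absolute metric scale differs.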
Related papers
- One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step.
To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration.
Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
- DepthMaster: Taming Diffusion Models for Monocular Depth Estimation [41.81343543266191]
We propose a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task.
We adopt a two-stage training strategy to fully leverage the potential of the two modules.
Our model achieves state-of-the-art performance in terms of generalization and detail preservation, outperforming other diffusion-based methods across various datasets.
arXiv Detail & Related papers (2025-01-05T15:18:32Z)
- DiffFNO: Diffusion Fourier Neural Operator [8.895165270489167]
We introduce DiffFNO, a novel diffusion framework for arbitrary-scale super-resolution strengthened by a Weighted Fourier Neural Operator (WFNO).
We show that DiffFNO achieves state-of-the-art (SOTA) results, outperforming existing methods across various scaling factors by a margin of 2 to 4 dB in PSNR.
Our approach sets a new standard in super-resolution, delivering both superior accuracy and computational efficiency.
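For context on the "2 to 4 dB in PSNR" margin quoted above, the standard PSNR computation is shown below; every +3 dB corresponds to roughly halving the mean squared error:

```python
import numpy as np

def psnr(x, y, peak=1.0):
    """Peak signal-to-noise ratio in dB for images whose values span [0, peak]."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

So a 2-4 dB gap means the competing methods have roughly 1.6x to 2.5x the reconstruction MSE of DiffFNO.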
arXiv Detail & Related papers (2024-11-15T03:14:11Z)
- FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth Estimation [11.039105169475484]
Most existing methods often rely on complex models to predict scene depth with high accuracy, resulting in slow inference that is not conducive to deployment.
We first design SmallDepth based on sparsity.
Second, to enhance the feature representation ability of SmallDepth during training, under the condition of equal complexity during inference, we propose an equivalent transformation module (ETM).
Third, to improve the ability of each layer in a fixed SmallDepth to perceive different context information, we propose pyramid loss.
Fourth, to further improve the accuracy of SmallDepth, we utilize the proposed function approximation loss (APX) to …
arXiv Detail & Related papers (2024-05-17T16:22:52Z)
- Infrared Small Target Detection Using Double-Weighted Multi-Granularity Patch Tensor Model With Tensor-Train Decomposition [6.517559383143804]
This paper proposes a novel double-weighted multi-granularity infrared patch tensor (DWMGIPT) model.
The proposed algorithm is robust to noise and different scenes.
arXiv Detail & Related papers (2023-10-09T02:17:31Z)
- Practical Conformer: Optimizing size, speed and flops of Conformer for on-Device and cloud ASR [67.63332492134332]
We design an optimized conformer that is small enough to meet on-device restrictions and has fast inference on TPUs.
Our proposed encoder can double as a strong standalone on-device encoder and as the first part of a high-performance ASR pipeline.
arXiv Detail & Related papers (2023-03-31T23:30:48Z)
- Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs from CNN to transformer architectures.
This work investigates the potential of network pruning for super-resolution, taking advantage of off-the-shelf network designs to reduce the underlying computational overhead.
We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and tweaks unimportant weights by a small amount proportional to their magnitude on the fly.
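A minimal sketch of the soft-shrinkage idea described above, assuming "tweaking" means multiplying the smallest-magnitude weights by a small factor rather than hard-zeroing them; the sparsity and shrink values here are illustrative, not ISS-P's actual schedule:

```python
import numpy as np

def soft_shrink_step(w, sparsity=0.5, shrink=0.9):
    """One pruning step: shrink the smallest-magnitude fraction of weights.

    Unlike hard pruning, the shrunken weights keep their sign and a small
    magnitude, so they can regrow in later iterations if they become important.
    """
    w = w.copy()
    k = int(sparsity * w.size)
    if k == 0:
        return w
    flat = w.ravel()
    idx = np.argsort(np.abs(flat))[:k]  # indices of the k least-important weights
    flat[idx] *= (1.0 - shrink)         # shrink proportional to current magnitude
    return w
```

The design point is reversibility: because no weight is ever set exactly to zero during training, the sparse structure can keep adapting, which is what distinguishes soft shrinkage from one-shot magnitude pruning.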
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
- Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
- Accelerated replica exchange stochastic gradient Langevin diffusion enhanced Bayesian DeepONet for solving noisy parametric PDEs [7.337247167823921]
We propose a training framework for replica-exchange Langevin diffusion that exploits the neural network architecture of DeepONets.
We show that the proposed framework's exploration and exploitation capabilities enable improved training convergence for DeepONets in noisy scenarios.
We also show that replica-exchange Langevin diffusion improves the DeepONet's mean prediction accuracy in noisy scenarios.
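A toy sketch of replica-exchange stochastic gradient Langevin dynamics on a 1-D objective, assuming two replicas (cold and hot) and a Metropolis-style swap; the temperatures, step size, and acceptance rule below are textbook choices, not the paper's DeepONet-specific scheme:

```python
import numpy as np

def replica_exchange_sgld(loss, grad, x0, temps=(0.05, 1.0), steps=2000, lr=1e-2, seed=0):
    """Minimal replica-exchange SGLD on a 1-D objective.

    A cold chain exploits, a hot chain explores, and a Metropolis-style
    swap lets the hot chain hand promising states to the cold one.
    """
    rng = np.random.default_rng(seed)
    x = [float(x0), float(x0)]          # [cold replica, hot replica]
    T_cold, T_hot = temps
    for _ in range(steps):
        for i, T in enumerate(temps):   # Langevin update for each replica
            x[i] += -lr * grad(x[i]) + rng.normal() * np.sqrt(2.0 * lr * T)
        # swap acceptance: favors moving lower-loss states into the cold chain
        log_a = (1.0 / T_cold - 1.0 / T_hot) * (loss(x[0]) - loss(x[1]))
        if np.log(rng.random()) < log_a:
            x[0], x[1] = x[1], x[0]
    return x[0]                         # final state of the cold chain
```

The exchange step is what "exploration and exploitation" refers to in the summary above: the hot chain's larger noise lets it escape poor basins, and accepted swaps transfer those discoveries to the low-noise chain.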
arXiv Detail & Related papers (2021-11-03T19:23:59Z)
- Layer Pruning on Demand with Intermediate CTC [50.509073206630994]
We present a training and pruning method for ASR based on the connectionist temporal classification (CTC) loss.
We show that a Transformer-CTC model can be pruned to various depths on demand, improving the real-time factor from 0.005 to 0.002 on GPU.
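For reference, the real-time factor (RTF) quoted above is simply processing time divided by audio duration; a one-line sketch:

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time to process / duration of audio; lower is faster.

    RTF < 1 means faster than real time; e.g. 0.02 s to decode
    10 s of audio gives RTF 0.002.
    """
    return processing_seconds / audio_seconds
```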
arXiv Detail & Related papers (2021-06-17T02:40:18Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Channel Attention based Iterative Residual Learning for Depth Map Super-Resolution [58.626803922196146]
We argue that DSR models trained on synthetic dataset are restrictive and not effective in dealing with real-world DSR tasks.
We make two contributions in tackling real-world degradation of different depth sensors.
We propose a new framework for real-world DSR, which consists of four modules.
arXiv Detail & Related papers (2020-06-02T09:12:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.