Single UHD Image Dehazing via Interpretable Pyramid Network
- URL: http://arxiv.org/abs/2202.08589v1
- Date: Thu, 17 Feb 2022 11:14:12 GMT
- Title: Single UHD Image Dehazing via Interpretable Pyramid Network
- Authors: Boxue Xiao, Zhuoran Zheng, Xiang Chen, Chen Lv, Yunliang Zhuang, Tao Wang
- Abstract summary: Currently, most single image dehazing models cannot process an ultra-high-resolution (UHD) image on a single GPU in real time.
We introduce the principle of infinite approximation in Taylor's theorem together with the Laplacian pyramid pattern to build a model capable of handling 4K images in real time.
- Score: 10.00144096602321
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, most single image dehazing models cannot process an
ultra-high-resolution (UHD) image on a single GPU in real time. To address this
problem, we introduce the principle of infinite approximation in Taylor's
theorem together with the Laplacian pyramid pattern to build a model capable of
handling 4K hazy images in real time. The N branch networks of the pyramid
network correspond to the N constraint terms of Taylor's theorem. Low-order
polynomials reconstruct the low-frequency information of the image (e.g. color,
illumination), while high-order polynomials regress the high-frequency
information (e.g. texture). In addition, we propose a Tucker
reconstruction-based regularization term that acts on each branch network of
the pyramid model, further constraining the generation of anomalous signals in
the feature space. Extensive experimental results demonstrate that our approach
not only processes hazy 4K images in real time on a single GPU (80 FPS) but
also has unparalleled interpretability.
The developed method achieves state-of-the-art (SOTA) performance on two
benchmarks (O/I-HAZE) and our updated 4KID dataset, while providing reliable
groundwork for subsequent optimization schemes.
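The Laplacian pyramid decomposition the model is built on has a simple closed form: an image is split into a small low-frequency residual (color, illumination) plus a stack of band-pass layers (texture), and the inverse reconstruction is exact. Below is a minimal NumPy sketch of that decomposition, not the paper's implementation; the 5-tap kernel and level count are illustrative assumptions.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # 5-tap binomial (Gaussian-like)

def _blur(img):
    """Separable 5-tap blur with edge padding, applied along both axes."""
    for axis in (0, 1):
        img = np.apply_along_axis(
            lambda r: np.convolve(np.pad(r, 2, mode="edge"), KERNEL, mode="valid"),
            axis, img)
    return img

def _down(img):
    """Blur, then drop every other row/column (factor-2 downsampling)."""
    return _blur(img)[::2, ::2]

def _up(img, shape):
    """Zero-insert to the target shape, blur, and rescale to compensate."""
    out = np.zeros(shape)
    out[::2, ::2] = img
    return _blur(out) * 4.0

def laplacian_pyramid(img, levels=3):
    """Decompose into (levels-1) high-frequency bands + 1 low-frequency residual."""
    pyr, cur = [], img
    for _ in range(levels - 1):
        small = _down(cur)
        pyr.append(cur - _up(small, cur.shape))  # band-pass detail layer
        cur = small
    pyr.append(cur)  # low-frequency residual
    return pyr

def reconstruct(pyr):
    """Closed-form inverse: upsample the residual and add back each band."""
    cur = pyr[-1]
    for band in reversed(pyr[:-1]):
        cur = _up(cur, band.shape) + band
    return cur
```

Because each band stores exactly the difference removed during downsampling, reconstruction is lossless regardless of the blur kernel, which is what makes the pyramid a cheap, invertible way to separate low- and high-frequency branches.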
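The Tucker reconstruction-based regularizer can be read as penalizing the distance between a branch's feature tensor and its low-rank Tucker approximation, so low-energy "anomalous" components are suppressed. A hypothetical NumPy sketch via truncated HOSVD follows; the ranks and the L2 loss form are assumptions, not the paper's exact formulation.

```python
import numpy as np

def unfold(T, mode):
    """Matricize tensor T along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold: restore matrix M to a tensor of the given shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def mode_dot(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return fold(M @ unfold(T, mode), mode,
                T.shape[:mode] + (M.shape[0],) + T.shape[mode + 1:])

def tucker_reconstruct(T, ranks):
    """Truncated HOSVD: project onto the leading singular vectors of each
    mode unfolding, then expand back, discarding low-energy components."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):   # project to the core tensor
        core = mode_dot(core, U.T, mode)
    rec = core
    for mode, U in enumerate(factors):   # expand back to the original shape
        rec = mode_dot(rec, U, mode)
    return rec

def tucker_penalty(feat, ranks):
    """L2 distance between features and their low-rank Tucker reconstruction."""
    return np.linalg.norm(feat - tucker_reconstruct(feat, ranks))
```

With full ranks the reconstruction is exact and the penalty vanishes; shrinking the ranks tightens the constraint on what the branch features may express.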
Related papers
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z) - Splatter Image: Ultra-Fast Single-View 3D Reconstruction [67.96212093828179]
Splatter Image is based on Gaussian Splatting, which allows fast and high-quality reconstruction of 3D scenes from multiple images.
We learn a neural network that, at test time, performs reconstruction in a feed-forward manner, at 38 FPS.
On several synthetic, real, multi-category and large-scale benchmark datasets, we achieve better results in terms of PSNR, LPIPS, and other metrics while training and evaluating much faster than prior works.
arXiv Detail & Related papers (2023-12-20T16:14:58Z) - 4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters [9.410502389242815]
In this paper, we propose extremely lightweight Multi-Scale Linear Transformation (MSLT) networks with only about 8K parameters.
MSLT networks can process 4K-resolution sRGB images at 125 frames per second (FPS) on a Titan GTX GPU.
Experiments on two benchmark datasets demonstrate the efficiency of our MSLT networks against state-of-the-art methods on photo exposure correction.
arXiv Detail & Related papers (2023-11-15T08:01:12Z) - 4K4D: Real-Time 4D View Synthesis at 4K Resolution [86.6582179227016]
This paper targets high-fidelity, real-time view synthesis of dynamic 3D scenes at 4K resolution.
We propose a 4D point cloud representation that supports hardware rasterization and enables unprecedented rendering speed.
Our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and at 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU.
arXiv Detail & Related papers (2023-10-17T17:57:38Z) - HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z) - Perceptually Optimized Deep High-Dynamic-Range Image Tone Mapping [44.00069411131762]
We first decompose an HDR image into a normalized Laplacian pyramid, and use two deep neural networks (DNNs) to estimate the Laplacian pyramid of the desired tone-mapped image from the normalized representation.
We then end-to-end optimize the entire method over a database of HDR images by minimizing the normalized Laplacian pyramid distance.
arXiv Detail & Related papers (2021-09-01T04:17:31Z) - Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis [7.726465518306907]
A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high-frequency details, achieving a reduction of the Fréchet Inception Distance (FID) by up to 45.3% compared to the baseline.
arXiv Detail & Related papers (2021-06-16T17:58:13Z) - High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network [23.981019687483506]
We focus on speeding-up the high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction.
We propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks.
Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details.
arXiv Detail & Related papers (2021-05-19T15:05:22Z) - Adversarial Generation of Continuous Images [31.92891885615843]
In this paper, we propose two novel architectural techniques for building INR-based image decoders.
We use them to build a state-of-the-art continuous image GAN.
Our proposed INR-GAN architecture improves the performance of continuous image generators severalfold.
arXiv Detail & Related papers (2020-11-24T11:06:40Z) - Learning Deformable Tetrahedral Meshes for 3D Reconstruction [78.0514377738632]
3D shape representations that accommodate learning-based 3D reconstruction are an open problem in machine learning and computer graphics.
Previous work on neural 3D reconstruction demonstrated benefits, but also limitations, of point cloud, voxel, surface mesh, and implicit function representations.
We introduce Deformable Tetrahedral Meshes (DefTet) as a particular parameterization that utilizes volumetric tetrahedral meshes for the reconstruction problem.
arXiv Detail & Related papers (2020-11-03T02:57:01Z) - Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.