Image-Conditional Diffusion Transformer for Underwater Image Enhancement
- URL: http://arxiv.org/abs/2407.05389v1
- Date: Sun, 7 Jul 2024 14:34:31 GMT
- Title: Image-Conditional Diffusion Transformer for Underwater Image Enhancement
- Authors: Xingyang Nie, Su Pan, Xiaoyu Zhai, Shifei Tao, Fengzhong Qu, Biao Wang, Huilin Ge, Guojie Xiao
- Abstract summary: We propose a novel UIE method based on an image-conditional diffusion transformer (ICDT).
Our method takes the degraded underwater image as the conditional input and converts it into a latent space where ICDT is applied.
Our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) image enhancement quality.
- Score: 4.555168682310286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by recent advances in generative models, we propose a novel UIE method based on an image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into a latent space where ICDT is applied. ICDT replaces the conventional U-Net backbone in a denoising diffusion probabilistic model (DDPM) with a transformer, and thus inherits favorable properties such as scalability from transformers. Furthermore, we train ICDT with a hybrid loss function involving variances to achieve better log-likelihoods, which also significantly accelerates the sampling process. We experimentally assess the scalability of ICDTs and compare them with prior work on UIE using the Underwater ImageNet dataset. In addition to good scaling properties, our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement.
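The "hybrid loss function involving variances" referenced in the abstract follows the general pattern of combining the standard noise-prediction MSE with a weighted variational term that trains the reverse-process variances. A minimal sketch is below; the function and argument names (`hybrid_loss`, `vlb_term`, `lam`) are illustrative and not taken from the paper, and the variational term is passed in precomputed rather than derived here:

```python
import numpy as np

def hybrid_loss(eps_true, eps_pred, vlb_term, lam=0.001):
    """Hybrid DDPM-style objective: the usual noise-prediction MSE
    (L_simple) plus a small weighted variational term (L_vlb) that
    lets the model learn the reverse-process variances, which improves
    log-likelihoods and can speed up sampling.

    eps_true / eps_pred: true and predicted noise arrays.
    vlb_term: precomputed variational lower-bound term (scalar).
    lam: small weight so L_simple dominates (value is an assumption).
    """
    l_simple = np.mean((eps_true - eps_pred) ** 2)
    return l_simple + lam * vlb_term
```

In practice the variational term is usually computed with a stop-gradient on the predicted mean, so that it only trains the variance head; that detail is omitted from this sketch.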
Related papers
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
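The dynamic pseudo-label update described above can be sketched as a blend of the current pseudo-label with the network's latest prediction. The exponential-moving-average form below is an assumption for illustration; the abstract does not specify DGNet's actual update rule, and the name `update_pseudo_label` and the `momentum` parameter are hypothetical:

```python
import numpy as np

def update_pseudo_label(pseudo, prediction, momentum=0.9):
    """Blend the current pseudo-label with the latest predicted image,
    so the supervision target (and hence the gradient) evolves as the
    network improves. The EMA form here is one plausible realization,
    not necessarily the one used in DGNet."""
    return momentum * pseudo + (1.0 - momentum) * prediction
```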
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks.
We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT)
DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z) - An Efficient Detection and Control System for Underwater Docking using Machine Learning and Realistic Simulation: A Comprehensive Approach [5.039813366558306]
This work compares different deep-learning architectures to perform underwater docking detection and classification.
A Generative Adversarial Network (GAN) is used to do image-to-image translation, converting the Gazebo simulation image into an underwater-looking image.
Results show a 20% improvement in high-turbidity scenarios regardless of underwater currents.
arXiv Detail & Related papers (2023-11-02T18:10:20Z) - DWA: Differential Wavelet Amplifier for Image Super-Resolution [4.255342416942236]
Differential Wavelet Amplifier (DWA) is a drop-in module for wavelet-based image Super-Resolution (SR).
Our proposed DWA model improves wavelet-based SR models by leveraging the difference between two convolutional filters.
We show its effectiveness by integrating it into existing SR models, e.g., DWSR and MWCNN, and demonstrate a clear improvement in classical SR tasks.
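The core idea of DWA, as stated above, is to amplify the difference between two parallel convolution branches. A naive single-channel sketch follows; the helper names (`conv2d`, `dwa_block`) and the plain "valid" cross-correlation are illustrative choices, not the paper's implementation:

```python
import numpy as np

def conv2d(x, k):
    """Naive 'valid' 2D cross-correlation of image x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def dwa_block(x, k1, k2):
    """Differential amplification: the output emphasizes the difference
    between two learned convolutional filters applied to the same input,
    which accentuates local contrast in the wavelet sub-bands."""
    return conv2d(x, k1) - conv2d(x, k2)
```

In a real SR model the two kernels would be learned and the difference fed into subsequent wavelet-domain layers; this sketch only shows the differential structure.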
arXiv Detail & Related papers (2023-07-10T14:35:12Z) - PUGAN: Physical Model-Guided Underwater Image Enhancement Using GAN with Dual-Discriminators [120.06891448820447]
Obtaining clear and visually pleasing images is a widespread concern, and the task of underwater image enhancement (UIE) has emerged to meet this need.
In this paper, we propose a physical model-guided GAN model for UIE, referred to as PUGAN.
Our PUGAN outperforms state-of-the-art methods in both qualitative and quantitative metrics.
arXiv Detail & Related papers (2023-06-15T07:41:12Z) - Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet they suffer from slow inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z) - DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers [91.6129538027725]
We propose an early knowledge distillation framework, termed DearKD, to improve the data efficiency of transformers.
Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.
arXiv Detail & Related papers (2022-04-27T15:11:04Z) - U-shape Transformer for Underwater Image Enhancement [0.0]
In this work, we constructed a large-scale underwater image dataset including 5004 image pairs.
We report a U-shape Transformer network in which the transformer model is introduced to the UIE task for the first time.
In order to further improve the contrast and saturation, a novel loss function combining RGB, LAB and LCH color spaces is designed.
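A loss combining several color spaces, as described above, can be sketched as a weighted sum of per-space distances. The paper's exact RGB/LAB/LCH conversions and weights are not given in this summary, so the sketch takes the conversion functions and weights as caller-supplied arguments; `multi_space_loss` is a hypothetical name:

```python
import numpy as np

def multi_space_loss(pred, target, converters, weights):
    """Weighted L1 loss accumulated over several color spaces.

    converters: dict mapping a space name (e.g. 'rgb', 'lab', 'lch')
                to an RGB->space conversion function.
    weights:    dict mapping the same names to loss weights.
    The actual conversions and weights used in the paper are not
    specified here and must be supplied by the caller.
    """
    total = 0.0
    for name, w in weights.items():
        convert = converters[name]
        total += w * np.mean(np.abs(convert(pred) - convert(target)))
    return total
```

Penalizing errors in LAB and LCH as well as RGB is what lets such a loss target perceptual contrast and saturation directly, since those spaces separate lightness from chroma.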
arXiv Detail & Related papers (2021-11-23T13:15:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.