Related papers: Image-Conditional Diffusion Transformer for Underwater Image Enhancement

Image-Conditional Diffusion Transformer for Underwater Image Enhancement

URL: http://arxiv.org/abs/2407.05389v1
Date: Sun, 7 Jul 2024 14:34:31 GMT
Title: Image-Conditional Diffusion Transformer for Underwater Image Enhancement
Authors: Xingyang Nie, Su Pan, Xiaoyu Zhai, Shifei Tao, Fengzhong Qu, Biao Wang, Huilin Ge, Guojie Xiao,
Abstract summary: We propose a novel UIE method based on image-conditional diffusion transformer (ICDT) Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is applied. Our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement.
Score: 4.555168682310286
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Underwater image enhancement (UIE) has attracted much attention owing to its importance for underwater operation and marine engineering. Motivated by the recent advance in generative models, we propose a novel UIE method based on image-conditional diffusion transformer (ICDT). Our method takes the degraded underwater image as the conditional input and converts it into latent space where ICDT is applied. ICDT replaces the conventional U-Net backbone in a denoising diffusion probabilistic model (DDPM) with a transformer, and thus inherits favorable properties such as scalability from transformers. Furthermore, we train ICDT with a hybrid loss function involving variances to achieve better log-likelihoods, which meanwhile significantly accelerates the sampling process. We experimentally assess the scalability of ICDTs and compare with prior works in UIE on the Underwater ImageNet dataset. Besides good scaling properties, our largest model, ICDT-XL/2, outperforms all comparison methods, achieving state-of-the-art (SOTA) quality of image enhancement.

Related papers

TDiR: Transformer based Diffusion for Image Restoration Tasks [19.992144590243836]
Images captured in challenging environments often experience various forms of degradation, including noise, color cast, blur, and light scattering.<n>These effects significantly reduce image quality, hindering their applicability in downstream tasks such as object detection, mapping, and classification.<n>Our transformer-based diffusion model was developed to address image restoration tasks, aiming to improve the quality of degraded images.
arXiv Detail & Related papers (2025-06-25T10:28:13Z)
NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers [11.087309945227826]
Flow-based transformer models for image generation have achieved state-of-the-art performance with larger model parameters, but their inference deployment cost remains high. To enhance inference performance while maintaining generation quality, we propose progressive rectified flow transformers. Experiments demonstrate that our approach achieves fast convergence and reduces inference time while ensuring generation quality.
arXiv Detail & Related papers (2025-03-12T10:38:58Z)
Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR) In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks. We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
UIE-UnFold: Deep Unfolding Network with Color Priors and Vision Transformer for Underwater Image Enhancement [27.535028176427623]
Underwater image enhancement (UIE) plays a crucial role in various marine applications. Current learning-based approaches frequently lack explicit prior knowledge about the physical processes involved in underwater image formation. This paper proposes a novel deep unfolding network (DUN) for UIE that integrates color priors and inter-stage feature incorporation.
arXiv Detail & Related papers (2024-08-20T08:48:33Z)
Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model. For efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in diffusion model's latent space. We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features. Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z)
DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks. We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT) DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z)
An Efficient Detection and Control System for Underwater Docking using Machine Learning and Realistic Simulation: A Comprehensive Approach [5.039813366558306]
This work compares different deep-learning architectures to perform underwater docking detection and classification. A Generative Adversarial Network (GAN) is used to do image-to-image translation, converting the Gazebo simulation image into an underwater-looking image. Results show an improvement of 20% in the high turbidity scenarios regardless of the underwater currents.
arXiv Detail & Related papers (2023-11-02T18:10:20Z)
Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming, excessive computational resource consumption, and unstable restoration. We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers [91.6129538027725]
We propose an early knowledge distillation framework, which is termed as DearKD, to improve the data efficiency required by transformers. Our DearKD is a two-stage framework that first distills the inductive biases from the early intermediate layers of a CNN and then gives the transformer full play by training without distillation.
arXiv Detail & Related papers (2022-04-27T15:11:04Z)
U-shape Transformer for Underwater Image Enhancement [0.0]
In this work, we constructed a large-scale underwater image dataset including 5004 image pairs. We reported an U-shape Transformer network where the transformer model is for the first time introduced to the UIE task. In order to further improve the contrast and saturation, a novel loss function combining RGB, LAB and LCH color spaces is designed.
arXiv Detail & Related papers (2021-11-23T13:15:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.