Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion
- URL: http://arxiv.org/abs/2401.03788v2
- Date: Wed, 17 Apr 2024 07:41:48 GMT
- Title: Low-light Image Enhancement via CLIP-Fourier Guided Wavelet Diffusion
- Authors: Minglong Xue, Jinhong He, Wenhai Wang, Mingliang Zhou
- Abstract summary: We propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD.
CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process.
Our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression.
- Score: 28.049668999586583
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Low-light image enhancement techniques have significantly progressed, but unstable image quality recovery and unsatisfactory visual perception are still significant challenges. To solve these problems, we propose a novel and robust low-light image enhancement method via CLIP-Fourier Guided Wavelet Diffusion, abbreviated as CFWD. Specifically, CFWD leverages multimodal visual-language information in the frequency domain space created by multiple wavelet transforms to guide the enhancement process. Multi-scale supervision across different modalities facilitates the alignment of image features with semantic features during the wavelet diffusion process, effectively bridging the gap between degraded and normal domains. Moreover, to further promote the effective recovery of the image details, we combine the Fourier transform based on the wavelet transform and construct a Hybrid High Frequency Perception Module (HFPM) with a significant perception of the detailed features. This module avoids the diversity confusion of the wavelet diffusion process by guiding the fine-grained structure recovery of the enhancement results to achieve favourable metric and perceptually oriented enhancement. Extensive quantitative and qualitative experiments on publicly available real-world benchmarks show that our approach outperforms existing state-of-the-art methods, achieving significant progress in image quality and noise suppression. The project code is available at https://github.com/hejh8/CFWD.
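The abstract combines wavelet and Fourier transforms to emphasize high-frequency detail. As a hedged illustration of that general idea (not the paper's actual HFPM, whose definition lives in the linked repository), a minimal NumPy sketch that pulls high-frequency content from both domains might look like:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail sub-bands (LH, HL, HH).
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a - b + c - d) / 4.0
    hl = (a + b - c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, (lh, hl, hh)

def fourier_highpass(img, radius=2):
    """Zero out frequencies within `radius` of the DC term."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 > radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def hybrid_high_freq(img):
    """Combine wavelet detail sub-bands with a Fourier high-pass of LL."""
    ll, (lh, hl, hh) = haar_dwt2(img)
    return np.abs(lh) + np.abs(hl) + np.abs(hh) + np.abs(fourier_highpass(ll))
```

For a constant (detail-free) image this map is all zeros; edges and textures light up in it, which is the kind of signal a high-frequency perception module would supervise.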
Related papers
- Zero-Shot Low-Light Image Enhancement via Joint Frequency Domain Priors Guided Diffusion [2.3874115898130865]
We propose a new zero-shot low-light enhancement method to compensate for the lack of light and structural information in the diffusion sampling process.
The inspiration comes from the similarity between the wavelet and Fourier frequency domains.
Sufficient experiments show that the framework is robust and effective in various scenarios.
arXiv Detail & Related papers (2024-11-21T09:16:51Z)
- Multi-scale Frequency Enhancement Network for Blind Image Deblurring [7.198959621445282]
We propose a multi-scale frequency enhancement network (MFENet) for blind image deblurring.
To capture the multi-scale spatial and channel information of blurred images, we introduce a multi-scale feature extraction module (MS-FE) based on depthwise separable convolutions.
We demonstrate that the proposed method achieves superior deblurring performance in both visual quality and objective evaluation metrics.
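The MS-FE module above is described as built on depthwise separable convolutions. As an illustration of that standard building block (not MFENet's actual implementation), a minimal NumPy sketch:

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """Depthwise separable convolution (valid padding, stride 1).

    x          : (H, W, C_in) input feature map
    dw_kernels : (k, k, C_in) one spatial kernel per input channel
    pw_weights : (C_in, C_out) 1x1 pointwise channel-mixing matrix
    """
    h, w, c_in = x.shape
    k = dw_kernels.shape[0]
    oh, ow = h - k + 1, w - k + 1
    # Depthwise stage: each channel is convolved with its own kernel.
    dw = np.zeros((oh, ow, c_in))
    for ch in range(c_in):
        for i in range(oh):
            for j in range(ow):
                dw[i, j, ch] = np.sum(x[i:i+k, j:j+k, ch] * dw_kernels[:, :, ch])
    # Pointwise stage: a 1x1 convolution mixes information across channels.
    return dw @ pw_weights
```

Splitting spatial filtering from channel mixing is what makes this operation much cheaper than a full convolution, which is why it is popular in multi-scale feature extractors.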
arXiv Detail & Related papers (2024-11-11T11:49:18Z)
- Effective Diffusion Transformer Architecture for Image Super-Resolution [63.254644431016345]
We design an effective diffusion transformer for image super-resolution (DiT-SR).
In practice, DiT-SR leverages an overall U-shaped architecture, and adopts a uniform isotropic design for all the transformer blocks.
We analyze the limitation of the widely used AdaLN, and present a frequency-adaptive time-step conditioning module.
arXiv Detail & Related papers (2024-09-29T07:14:16Z)
- Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement [61.22119364400268]
We propose a novel low-frequency consistency method, facilitating improved frequency disentanglement optimization.
Noteworthy improvements are showcased across five popular benchmarks, with up to 7.68dB gains on PSNR achieved for six state-of-the-art models.
Our approach maintains efficiency with only 88K extra parameters, setting a new standard in the challenging realm of low-light image enhancement.
arXiv Detail & Related papers (2024-09-03T06:19:03Z)
- MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration [7.087475633143941]
MM-Diff is a tuning-free image personalization framework capable of generating high-fidelity images of both single and multiple subjects in seconds.
MM-Diff employs a vision encoder to transform the input image into CLS and patch embeddings.
CLS embeddings are used on the one hand to augment the text embeddings, and on the other hand together with patch embeddings to derive a small number of detail-rich subject embeddings.
arXiv Detail & Related papers (2024-03-22T09:32:31Z)
- Misalignment-Robust Frequency Distribution Loss for Image Transformation [51.0462138717502]
This paper aims to address a common challenge in deep learning-based image transformation methods, such as image enhancement and super-resolution.
We introduce a novel and simple Frequency Distribution Loss (FDL) for computing distribution distance within the frequency domain.
Our method is empirically proven effective as a training constraint due to the thoughtful utilization of global information in the frequency domain.
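The FDL entry above describes a distribution distance computed in the frequency domain so that spatial misalignment is tolerated. One plausible toy reading, treating the sorted FFT magnitudes as the "distribution" (a hypothetical simplification for illustration, not the paper's exact loss):

```python
import numpy as np

def frequency_distribution_loss(pred, target):
    """Toy distribution distance in the frequency domain.

    Compares the *sorted* FFT magnitudes of the two images, a 1-D
    Wasserstein-style distance between magnitude distributions that
    ignores phase, and hence ignores spatial (circular) shifts.
    """
    mp = np.sort(np.abs(np.fft.fft2(pred)).ravel())
    mt = np.sort(np.abs(np.fft.fft2(target)).ravel())
    return np.mean(np.abs(mp - mt))
```

Because a circular shift changes only the phase of the spectrum, this loss is near zero for a shifted copy of an image even when the pixel-wise error is large, which is the misalignment-robustness the paper targets.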
arXiv Detail & Related papers (2024-02-28T09:27:41Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement [75.25451566988565]
We propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images.
Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.
arXiv Detail & Related papers (2023-04-13T17:54:00Z)
- Towards Robust Image-in-Audio Deep Steganography [14.1081872409308]
This paper extends and enhances an existing image-in-audio deep steganography method by focusing on improving its robustness.
The proposed enhancements include modifications to the loss function, utilization of the Short-Time Fourier Transform (STFT), introduction of redundancy in the encoding process for error correction, and buffering of additional information in the pixel subconvolution operation.
arXiv Detail & Related papers (2023-03-09T03:16:04Z)
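The steganography entry above relies on the Short-Time Fourier Transform. A minimal NumPy STFT sketch (the generic textbook formulation, not that paper's implementation):

```python
import numpy as np

def stft(signal, frame_len=64, hop=32):
    """Short-Time Fourier Transform with a Hann window.

    Slices the signal into overlapping windowed frames and takes the
    real FFT of each, giving a (n_frames, frame_len // 2 + 1) spectrogram.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)
```

A pure sine whose frequency falls on an FFT bin shows up as a peak at that bin in every frame, which is what makes the time-frequency plane a convenient place to hide (and later recover) embedded data.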
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.