Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image
Enhancement
- URL: http://arxiv.org/abs/2312.10109v2
- Date: Fri, 2 Feb 2024 03:08:07 GMT
- Title: Enlighten-Your-Voice: When Multimodal Meets Zero-shot Low-light Image
Enhancement
- Authors: Xiaofeng Zhang, Zishan Xu, Hao Tang, Chaochen Gu, Wei Chen, Shanying
Zhu, Xinping Guan
- Abstract summary: "Enlighten-Your-Voice" is a multimodal enhancement framework that innovatively enriches user interaction through voice and textual commands.
Our model is equipped with a Dual Collaborative Attention Module (DCAM) that meticulously caters to distinct content and color discrepancies.
"Enlighten-Your-Voice" showcases remarkable generalization in unsupervised zero-shot scenarios.
- Score: 25.073590934451055
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Low-light image enhancement is a crucial visual task, and many unsupervised
methods tend to overlook the degradation of visible information in low-light
scenes, which adversely affects the fusion of complementary information and
hinders the generation of satisfactory results. To address this, our study
introduces "Enlighten-Your-Voice", a multimodal enhancement framework that
innovatively enriches user interaction through voice and textual commands. This
approach does not merely signify a technical leap but also represents a
paradigm shift in user engagement. Our model is equipped with a Dual
Collaborative Attention Module (DCAM) that meticulously caters to distinct
content and color discrepancies, thereby facilitating nuanced enhancements.
Complementarily, we introduce a Semantic Feature Fusion (SFM) plug-and-play
module that synergizes semantic context with low-light enhancement operations,
sharpening the algorithm's efficacy. Crucially, "Enlighten-Your-Voice"
showcases remarkable generalization in unsupervised zero-shot scenarios. The
source code can be accessed from
https://github.com/zhangbaijin/Enlighten-Your-Voice
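To make the described components more concrete, below is a minimal PyTorch-style sketch of the kind of mechanism the abstract outlines: a dual content/color attention block whose two branches are gated by an embedding of a voice or text command. All class names, parameters, and dimensions here are illustrative assumptions; this is not the authors' DCAM or SFM implementation (the repository linked above contains the actual code).
```python
# Illustrative sketch only: a toy dual-branch attention block conditioned on a
# text/voice prompt embedding. Names and shapes are hypothetical, not the paper's code.
import torch
import torch.nn as nn


class DualCollaborativeAttentionSketch(nn.Module):
    """Toy stand-in for a content/color dual-attention block.

    One branch attends over spatial content, the other over per-channel color
    statistics; a prompt embedding (e.g. from a speech-to-text + text encoder)
    gates how strongly each branch modulates the features.
    """

    def __init__(self, channels: int, prompt_dim: int = 512):
        super().__init__()
        # Content branch: spatial attention from pooled channel statistics.
        self.spatial_attn = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3), nn.Sigmoid()
        )
        # Color branch: channel attention (squeeze-and-excitation style).
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1), nn.Sigmoid(),
        )
        # Prompt embedding -> two scalar gates (content weight, color weight).
        self.prompt_gate = nn.Sequential(nn.Linear(prompt_dim, 2), nn.Sigmoid())

    def forward(self, feat: torch.Tensor, prompt_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features; prompt_emb: (B, prompt_dim)
        avg_map = feat.mean(dim=1, keepdim=True)
        max_map, _ = feat.max(dim=1, keepdim=True)
        content = feat * self.spatial_attn(torch.cat([avg_map, max_map], dim=1))
        color = feat * self.channel_attn(feat)
        gates = self.prompt_gate(prompt_emb).view(-1, 2, 1, 1, 1)  # (B, 2, 1, 1, 1)
        return gates[:, 0] * content + gates[:, 1] * color + feat


# Example usage with random tensors:
if __name__ == "__main__":
    block = DualCollaborativeAttentionSketch(channels=64)
    out = block(torch.randn(2, 64, 32, 32), torch.randn(2, 512))
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```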
Related papers
- Semi-LLIE: Semi-supervised Contrastive Learning with Mamba-based Low-light Image Enhancement [59.17372460692809]
This work proposes a mean-teacher-based semi-supervised low-light enhancement (Semi-LLIE) framework that integrates unpaired data into model training.
We introduce a semantic-aware contrastive loss to faithfully transfer the illumination distribution, which helps produce enhanced images with natural colors.
We also propose a novel perceptive loss based on the large-scale vision-language Recognize Anything Model (RAM) to help generate enhanced images with richer texture details.
arXiv Detail & Related papers (2024-09-25T04:05:32Z)
- Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement [25.97198463881292]
We propose to improve the zero-reference low-light enhancement method by leveraging the rich visual-linguistic CLIP prior (a minimal sketch of this style of semantic guidance appears after this list).
We show that the proposed method yields consistent improvements in task-based performance across various datasets.
arXiv Detail & Related papers (2024-05-19T08:06:14Z)
- CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement [97.95330185793358]
Low-light image enhancement (LLIE) aims to improve low-illumination images.
Existing methods face two challenges: uncertainty in restoration from diverse brightness degradations and loss of texture and color information.
We propose a novel enhancement approach, CodeEnhance, by leveraging quantized priors and image refinement.
arXiv Detail & Related papers (2024-04-08T07:34:39Z)
- Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models [81.71651422951074]
The Chain-of-Spot (CoS) method is a novel approach that enhances feature extraction by focusing on key regions of interest.
This technique allows LVLMs to access more detailed visual information without altering the original image resolution.
Our empirical findings demonstrate a significant improvement in LVLMs' ability to understand and reason about visual content.
arXiv Detail & Related papers (2024-03-19T17:59:52Z)
- A Non-Uniform Low-Light Image Enhancement Method with Multi-Scale Attention Transformer and Luminance Consistency Loss [11.585269110131659]
Low-light image enhancement aims to improve the perception of images collected in dim environments.
Existing methods cannot adaptively extract differentiated luminance information, which easily causes over-exposure and under-exposure.
We propose a multi-scale attention Transformer named MSATr, which sufficiently extracts local and global features for light balance to improve visual quality.
arXiv Detail & Related papers (2023-12-27T10:07:11Z)
- Enhancement by Your Aesthetic: An Intelligible Unsupervised Personalized Enhancer for Low-Light Images [67.14410374622699]
We propose an intelligible unsupervised personalized enhancer (iUPEnhancer) for low-light images.
The proposed iUPEnhancer is trained with the guidance of these correlations and the corresponding unsupervised loss functions.
Experiments demonstrate that the proposed algorithm produces competitive qualitative and quantitative results.
arXiv Detail & Related papers (2022-07-15T07:16:10Z)
- Distilling Audio-Visual Knowledge by Compositional Contrastive Learning [51.20935362463473]
We learn a compositional embedding that closes the cross-modal semantic gap.
We establish a new, comprehensive multi-modal distillation benchmark on three video datasets.
arXiv Detail & Related papers (2021-04-22T09:31:20Z)
- Unsupervised Low-light Image Enhancement with Decoupled Networks [103.74355338972123]
We learn a two-stage GAN-based framework to enhance real-world low-light images in a fully unsupervised fashion.
Our proposed method outperforms the state-of-the-art unsupervised image enhancement methods in terms of both illumination enhancement and noise reduction.
arXiv Detail & Related papers (2020-05-06T13:37:08Z)
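For the prompt-learning / CLIP-guidance entry above, here is a minimal sketch of how that style of zero-reference semantic guidance is commonly realized: the enhanced image's embedding is pulled toward a "well-lit" text prompt and pushed away from an "underexposed" one. The function name, prompt wording, and temperature are assumptions for illustration, not the cited paper's loss; any CLIP-like image/text encoder can supply the embeddings.
```python
# Illustrative sketch only (not the cited paper's code): a zero-reference,
# CLIP-style semantic-guidance loss over precomputed embeddings.
import torch
import torch.nn.functional as F


def semantic_guidance_loss(image_emb: torch.Tensor,
                           pos_text_emb: torch.Tensor,
                           neg_text_emb: torch.Tensor,
                           temperature: float = 0.07) -> torch.Tensor:
    """Cross-entropy over cosine similarities, pushing the enhanced image
    toward the well-lit prompt and away from the underexposed prompt.

    image_emb:            (B, D) embedding of the enhanced image
    pos/neg_text_emb:     (D,) prompt embeddings from the same text encoder
    """
    image_emb = F.normalize(image_emb, dim=-1)
    prompts = F.normalize(torch.stack([pos_text_emb, neg_text_emb]), dim=-1)  # (2, D)
    logits = image_emb @ prompts.t() / temperature                            # (B, 2)
    # Class 0 is the "well-lit" prompt, so the loss is minimized when the
    # enhanced image is most similar to it.
    target = torch.zeros(image_emb.size(0), dtype=torch.long, device=image_emb.device)
    return F.cross_entropy(logits, target)


# Toy usage with random embeddings standing in for CLIP features:
if __name__ == "__main__":
    loss = semantic_guidance_loss(torch.randn(4, 512),
                                  torch.randn(512), torch.randn(512))
    print(loss.item())
```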