Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
- URL: http://arxiv.org/abs/2303.17569v2
- Date: Fri, 29 Sep 2023 13:40:57 GMT
- Title: Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
- Authors: Zhexin Liang, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Chen Change Loy
- Abstract summary: We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT.
We show that the open-world CLIP prior aids in distinguishing between backlit and well-lit images.
Our method alternates between updating the prompt learning framework and enhancement network until visually pleasing results are achieved.
- Score: 86.90993077000789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel unsupervised backlit image enhancement method, abbreviated
as CLIP-LIT, by exploring the potential of Contrastive Language-Image
Pre-Training (CLIP) for pixel-level image enhancement. We show that the
open-world CLIP prior not only aids in distinguishing between backlit and
well-lit images, but also in perceiving heterogeneous regions with different
luminance, facilitating the optimization of the enhancement network. Unlike
high-level and image manipulation tasks, directly applying CLIP to enhancement
tasks is non-trivial, owing to the difficulty in finding accurate prompts. To
solve this issue, we devise a prompt learning framework that first learns an
initial prompt pair by constraining the text-image similarity between the
prompt (negative/positive sample) and the corresponding image (backlit
image/well-lit image) in the CLIP latent space. Then, we train the enhancement
network based on the text-image similarity between the enhanced result and the
initial prompt pair. To further improve the accuracy of the initial prompt
pair, we iteratively fine-tune the prompt learning framework to reduce the
distribution gaps between the backlit images, enhanced results, and well-lit
images via rank learning, boosting the enhancement performance. Our method
alternates between updating the prompt learning framework and enhancement
network until visually pleasing results are achieved. Extensive experiments
demonstrate that our method outperforms state-of-the-art methods in terms of
visual quality and generalization ability, without requiring any paired data.
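The abstract describes two alternating stages: learning a negative/positive prompt pair in CLIP's latent space, then training the enhancement network against that pair. The snippet below is a minimal PyTorch sketch of the prompt-pair idea, not the authors' released code: it assumes the open-source OpenAI `clip` package, a CoOp-style set of `N_CTX` learnable context tokens, and image tensors that are already CLIP-preprocessed; names such as `PromptPair` and `prompt_init_loss` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git)

device = "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()           # keep fp32 for this CPU sketch
for p in clip_model.parameters():         # CLIP itself stays frozen
    p.requires_grad_(False)

N_CTX = 16  # number of learnable context tokens per prompt (an assumption, not the paper's value)

class PromptPair(nn.Module):
    """A learnable negative/positive prompt pair living in CLIP's token-embedding space."""
    def __init__(self, model, n_ctx=N_CTX):
        super().__init__()
        dim = model.token_embedding.embedding_dim
        self.ctx = nn.Parameter(0.02 * torch.randn(2, n_ctx, dim))  # row 0: backlit, row 1: well-lit
        # Tokenise placeholder prompts to recover SOT/EOT/padding embeddings and EOT positions.
        dummy = clip.tokenize(["X " * n_ctx] * 2).to(device)
        with torch.no_grad():
            embed = model.token_embedding(dummy)
        self.register_buffer("prefix", embed[:, :1, :])            # start-of-text token
        self.register_buffer("suffix", embed[:, 1 + n_ctx:, :])    # end-of-text token + padding
        self.register_buffer("eot_idx", dummy.argmax(dim=-1))      # EOT has the largest token id
        self.model = model                                         # frozen; only self.ctx is trained

    def forward(self):
        x = torch.cat([self.prefix, self.ctx, self.suffix], dim=1) + self.model.positional_embedding
        x = self.model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = self.model.ln_final(x)
        feats = x[torch.arange(2), self.eot_idx] @ self.model.text_projection
        return F.normalize(feats, dim=-1)                          # [2, embed_dim]

def clip_logits(images, text_feats):
    """Scaled cosine similarities between image embeddings and the two prompt embeddings."""
    img = F.normalize(clip_model.encode_image(images), dim=-1)
    return clip_model.logit_scale.exp() * img @ text_feats.t()

def prompt_init_loss(prompts, backlit, well_lit):
    """Stage 1: the prompt pair must separate backlit (label 0) from well-lit (label 1) images."""
    logits = clip_logits(torch.cat([backlit, well_lit]), prompts())
    labels = torch.cat([torch.zeros(len(backlit)), torch.ones(len(well_lit))]).long()
    return F.cross_entropy(logits, labels)

def enhancement_loss(prompts, enhanced):
    """Stage 2: push the enhancement network's outputs toward the positive (well-lit) prompt."""
    logits = clip_logits(enhanced, prompts().detach())
    return F.cross_entropy(logits, torch.ones(len(enhanced), dtype=torch.long))

prompts = PromptPair(clip_model)
prompt_opt = torch.optim.Adam([prompts.ctx], lr=1e-4)  # only the context vectors are updated
# The paper's third ingredient, iterative prompt refinement, would add a margin-ranking term
# ordering similarity(backlit) < similarity(enhanced) < similarity(well-lit) against the
# positive prompt; that refinement and the enhancement network itself are omitted here.
```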
Related papers
- Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP [22.33658954569737]
We build a mutual guidance mechanism that introduces an Image-Guided-Text (IGT) component and a Text-Guided-Image (TGI) component.
Extensive experiments show that TIMO significantly outperforms the state-of-the-art (SOTA) training-free method.
We propose an enhanced variant, TIMO-S, which even surpasses the best training-required methods by 0.33% at roughly 100 times lower time cost.
arXiv Detail & Related papers (2024-12-16T02:03:45Z)
- Leveraging Content and Context Cues for Low-Light Image Enhancement [25.97198463881292]
Low-light conditions have an adverse impact on machine cognition, limiting the performance of computer vision systems in real life.
We propose to improve existing zero-reference low-light enhancement by leveraging the CLIP model to capture an image prior and provide semantic guidance.
We experimentally show that the proposed prior and semantic guidance help improve overall image contrast and hue, as well as background-foreground discrimination.
arXiv Detail & Related papers (2024-12-10T17:32:09Z)
- Ranking-aware adapter for text-driven image ordering with CLIP [76.80965830448781]
We propose an effective yet efficient approach that reframes the CLIP model into a learning-to-rank task.
Our approach incorporates learnable prompts to adapt to new instructions for ranking purposes.
Our ranking-aware adapter consistently outperforms fine-tuned CLIPs on various tasks.
arXiv Detail & Related papers (2024-12-09T18:51:05Z)
- RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement [0.24578723416255752]
We propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement.
Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space.
We show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality (see the sketch after this list).
arXiv Detail & Related papers (2024-04-02T12:28:40Z)
- CLIP Guided Image-perceptive Prompt Learning for Image Enhancement [15.40368082025006]
Contrastive Language-Image Pre-Training (CLIP) Guided Prompt Learning is proposed.
We learn image-perceptive prompts to distinguish between original and target images using the CLIP model.
We introduce a very simple network that incorporates a simple baseline to predict the weights of three different LUTs as the enhancement network.
arXiv Detail & Related papers (2023-11-07T12:36:20Z)
- Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- Texts as Images in Prompt Tuning for Multi-Label Image Recognition [70.9310322461598]
We advocate that image-text contrastive learning makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting.
Particularly, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning.
Our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-11-23T07:00:11Z)
- Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
In this paper, we present a novel prompt-based scheme to train the UIC model, making the best use of the powerful generalization ability of VL-PTMs.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
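The RAVE entry above contrasts tuning prompt token embeddings with tuning a vector directly in CLIP's joint latent space. The sketch below illustrates that latent-space alternative in isolation; it is one plausible reading of the summary rather than the paper's released method, and the anchor phrase, the residual parameterisation, and names such as `guidance_score` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP, as in the sketch above

device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()
for p in model.parameters():
    p.requires_grad_(False)

# Anchor the guidance vector at a fixed text embedding (the choice of phrase is an assumption).
with torch.no_grad():
    anchor = F.normalize(model.encode_text(clip.tokenize(["a well-lit photo"]).to(device)), dim=-1)

# The only trainable quantity: a residual tuned directly in CLIP's joint embedding space,
# so no gradients ever flow through the text encoder.
residual = nn.Parameter(torch.zeros_like(anchor))
opt = torch.optim.Adam([residual], lr=1e-3)

def guidance_score(images):
    """Cosine similarity between (CLIP-preprocessed) images and the tuned latent-space target."""
    target = F.normalize(anchor + residual, dim=-1)
    img = F.normalize(model.encode_image(images), dim=-1)
    return (img * target).sum(dim=-1)   # higher = closer to the tuned "well-lit" direction
```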
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.