Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
- URL: http://arxiv.org/abs/2303.17569v2
- Date: Fri, 29 Sep 2023 13:40:57 GMT
- Title: Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
- Authors: Zhexin Liang, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Chen Change Loy
- Abstract summary: We propose a novel unsupervised backlit image enhancement method, abbreviated as CLIP-LIT.
We show that the open-world CLIP prior aids in distinguishing between backlit and well-lit images.
Our method alternates between updating the prompt learning framework and enhancement network until visually pleasing results are achieved.
- Score: 86.90993077000789
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a novel unsupervised backlit image enhancement method, abbreviated
as CLIP-LIT, by exploring the potential of Contrastive Language-Image
Pre-Training (CLIP) for pixel-level image enhancement. We show that the
open-world CLIP prior not only aids in distinguishing between backlit and
well-lit images, but also in perceiving heterogeneous regions with different
luminance, facilitating the optimization of the enhancement network. Unlike
high-level and image manipulation tasks, directly applying CLIP to enhancement
tasks is non-trivial, owing to the difficulty in finding accurate prompts. To
solve this issue, we devise a prompt learning framework that first learns an
initial prompt pair by constraining the text-image similarity between the
prompt (negative/positive sample) and the corresponding image (backlit
image/well-lit image) in the CLIP latent space. Then, we train the enhancement
network based on the text-image similarity between the enhanced result and the
initial prompt pair. To further improve the accuracy of the initial prompt
pair, we iteratively fine-tune the prompt learning framework to reduce the
distribution gaps between the backlit images, enhanced results, and well-lit
images via rank learning, boosting the enhancement performance. Our method
alternates between updating the prompt learning framework and enhancement
network until visually pleasing results are achieved. Extensive experiments
demonstrate that our method outperforms state-of-the-art methods in terms of
visual quality and generalization ability, without requiring any paired data.
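The abstract describes two alternating stages: learning a negative/positive prompt pair in CLIP's latent space, then training the enhancement network against that pair. The snippet below is a minimal PyTorch sketch of the prompt-pair idea, not the authors' released code: it assumes the open-source OpenAI `clip` package, a CoOp-style set of `N_CTX` learnable context tokens, and image tensors that are already CLIP-preprocessed; names such as `PromptPair` and `prompt_init_loss` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP (pip install git+https://github.com/openai/CLIP.git)

device = "cpu"
clip_model, preprocess = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()           # keep fp32 for this CPU sketch
for p in clip_model.parameters():         # CLIP itself stays frozen
    p.requires_grad_(False)

N_CTX = 16  # number of learnable context tokens per prompt (an assumption, not the paper's value)

class PromptPair(nn.Module):
    """A learnable negative/positive prompt pair living in CLIP's token-embedding space."""
    def __init__(self, model, n_ctx=N_CTX):
        super().__init__()
        dim = model.token_embedding.embedding_dim
        self.ctx = nn.Parameter(0.02 * torch.randn(2, n_ctx, dim))  # row 0: backlit, row 1: well-lit
        # Tokenise placeholder prompts to recover SOT/EOT/padding embeddings and EOT positions.
        dummy = clip.tokenize(["X " * n_ctx] * 2).to(device)
        with torch.no_grad():
            embed = model.token_embedding(dummy)
        self.register_buffer("prefix", embed[:, :1, :])            # start-of-text token
        self.register_buffer("suffix", embed[:, 1 + n_ctx:, :])    # end-of-text token + padding
        self.register_buffer("eot_idx", dummy.argmax(dim=-1))      # EOT has the largest token id
        self.model = model                                         # frozen; only self.ctx is trained

    def forward(self):
        x = torch.cat([self.prefix, self.ctx, self.suffix], dim=1) + self.model.positional_embedding
        x = self.model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)
        x = self.model.ln_final(x)
        feats = x[torch.arange(2), self.eot_idx] @ self.model.text_projection
        return F.normalize(feats, dim=-1)                          # [2, embed_dim]

def clip_logits(images, text_feats):
    """Scaled cosine similarities between image embeddings and the two prompt embeddings."""
    img = F.normalize(clip_model.encode_image(images), dim=-1)
    return clip_model.logit_scale.exp() * img @ text_feats.t()

def prompt_init_loss(prompts, backlit, well_lit):
    """Stage 1: the prompt pair must separate backlit (label 0) from well-lit (label 1) images."""
    logits = clip_logits(torch.cat([backlit, well_lit]), prompts())
    labels = torch.cat([torch.zeros(len(backlit)), torch.ones(len(well_lit))]).long()
    return F.cross_entropy(logits, labels)

def enhancement_loss(prompts, enhanced):
    """Stage 2: push the enhancement network's outputs toward the positive (well-lit) prompt."""
    logits = clip_logits(enhanced, prompts().detach())
    return F.cross_entropy(logits, torch.ones(len(enhanced), dtype=torch.long))

prompts = PromptPair(clip_model)
prompt_opt = torch.optim.Adam([prompts.ctx], lr=1e-4)  # only the context vectors are updated
# The paper's third ingredient, iterative prompt refinement, would add a margin-ranking term
# ordering similarity(backlit) < similarity(enhanced) < similarity(well-lit) against the
# positive prompt; that refinement and the enhancement network itself are omitted here.
```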
Related papers
- Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP [22.33658954569737]
We build a mutual guidance mechanism that introduces an Image-Guided-Text (IGT) component and a Text-Guided-Image (TGI) component.
Extensive experiments show that TIMO significantly outperforms the state-of-the-art (SOTA) training-free method.
We propose an enhanced variant, TIMO-S, which even surpasses the best training-required methods by 0.33% at roughly 100 times lower time cost.
arXiv Detail & Related papers (2024-12-16T02:03:45Z)
- Leveraging Content and Context Cues for Low-Light Image Enhancement [25.97198463881292]
Low-light conditions have an adverse impact on machine cognition, limiting the performance of computer vision systems in real life.
We propose to improve existing zero-reference low-light enhancement by leveraging the CLIP model to capture an image prior and provide semantic guidance.
We experimentally show that the proposed prior and semantic guidance help improve overall image contrast and hue, as well as background-foreground discrimination.
arXiv Detail & Related papers (2024-12-10T17:32:09Z)
- Ranking-aware adapter for text-driven image ordering with CLIP [76.80965830448781]
We propose an effective yet efficient approach that reframes the CLIP model into a learning-to-rank task.
Our approach incorporates learnable prompts to adapt to new instructions for ranking purposes.
Our ranking-aware adapter consistently outperforms fine-tuned CLIPs on various tasks.
arXiv Detail & Related papers (2024-12-09T18:51:05Z)
- RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement [0.24578723416255752]
We propose a novel modification of Contrastive Language-Image Pre-Training (CLIP) guidance for the task of unsupervised backlit image enhancement.
Our work builds on the state-of-the-art CLIP-LIT approach, which learns a prompt pair by constraining the text-image similarity between a prompt (negative/positive sample) and a corresponding image (backlit image/well-lit image) in the CLIP embedding space.
We show that instead of tuning prompts in the space of text embeddings, it is possible to directly tune their embeddings in the latent space without any loss in quality (see the sketch after this list).
arXiv Detail & Related papers (2024-04-02T12:28:40Z)
- CLIP Guided Image-perceptive Prompt Learning for Image Enhancement [15.40368082025006]
Contrastive Language-Image Pre-Training (CLIP) Guided Prompt Learning is proposed.
We learn image-perceptive prompts to distinguish between original and target images using the CLIP model.
We introduce a very simple network that incorporates a simple baseline to predict the weights of three different LUTs as the enhancement network.
arXiv Detail & Related papers (2023-11-07T12:36:20Z)
- Task-Oriented Multi-Modal Mutual Learning for Vision-Language Models [52.3032592038514]
We propose a class-aware text prompt to enrich generated prompts with label-related image information.
We achieve an average improvement of 4.03% on new classes and 3.19% on harmonic-mean over eleven classification benchmarks.
arXiv Detail & Related papers (2023-03-30T06:02:40Z)
- Texts as Images in Prompt Tuning for Multi-Label Image Recognition [70.9310322461598]
We advocate that image-text contrastive learning makes it feasible to treat texts as images for prompt tuning and introduce TaI prompting.
Particularly, we apply TaI prompting to multi-label image recognition, where sentences in the wild serve as alternatives to images for prompt tuning.
Our proposed TaI-DPT outperforms zero-shot CLIP by a large margin on multiple benchmarks.
arXiv Detail & Related papers (2022-11-23T07:00:11Z)
- Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
- Prompt-based Learning for Unpaired Image Captioning [86.44188293709307]
Unpaired Image Captioning (UIC) has been developed to learn image descriptions from unaligned vision-language sample pairs.
Recent successes of Vision-Language Pre-Trained Models (VL-PTMs) have triggered the development of prompt-based learning.
In this paper, we present a novel prompt-based scheme to train the UIC model, making the best use of the powerful generalization ability of VL-PTMs.
arXiv Detail & Related papers (2022-05-26T03:13:43Z)
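The RAVE entry above contrasts tuning prompt token embeddings with tuning a vector directly in CLIP's joint latent space. The sketch below illustrates that latent-space alternative in isolation; it is one plausible reading of the summary rather than the paper's released method, and the anchor phrase, the residual parameterisation, and names such as `guidance_score` are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import clip  # OpenAI CLIP, as in the sketch above

device = "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()
for p in model.parameters():
    p.requires_grad_(False)

# Anchor the guidance vector at a fixed text embedding (the choice of phrase is an assumption).
with torch.no_grad():
    anchor = F.normalize(model.encode_text(clip.tokenize(["a well-lit photo"]).to(device)), dim=-1)

# The only trainable quantity: a residual tuned directly in CLIP's joint embedding space,
# so no gradients ever flow through the text encoder.
residual = nn.Parameter(torch.zeros_like(anchor))
opt = torch.optim.Adam([residual], lr=1e-3)

def guidance_score(images):
    """Cosine similarity between (CLIP-preprocessed) images and the tuned latent-space target."""
    target = F.normalize(anchor + residual, dim=-1)
    img = F.normalize(model.encode_image(images), dim=-1)
    return (img * target).sum(dim=-1)   # higher = closer to the tuned "well-lit" direction
```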
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.