Textual Prompt Guided Image Restoration
- URL: http://arxiv.org/abs/2312.06162v1
- Date: Mon, 11 Dec 2023 06:56:41 GMT
- Title: Textual Prompt Guided Image Restoration
- Authors: Qiuhai Yan, Aiwen Jiang, Kang Chen, Long Peng, Qiaosi Yi and Chunjie Zhang
- Abstract summary: "All-in-one" models capable of blind image restoration have attracted growing attention in recent years.
Recent works focus on learning visual prompts from the data distribution to identify the degradation type.
In this paper, an effective textual prompt guided image restoration model is proposed.
- Score: 18.78902053873706
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image restoration has always been a cutting-edge topic in the academic and
industrial fields of computer vision. Since degradation signals are often
random and diverse, "all-in-one" models capable of blind image restoration
have attracted growing attention in recent years. Early works require training
specialized heads and tails to handle each degradation of concern, which is
manually cumbersome. Recent works focus on learning visual prompts from the
data distribution to identify the degradation type. However, the prompts
employed in most models are non-textual, placing insufficient emphasis on the
importance of human-in-the-loop interaction. In this paper, an effective
textual prompt guided image restoration model is proposed. In this model, a
task-specific BERT is fine-tuned to accurately understand users' instructions
and generate textual prompt guidance. Depth-wise multi-head transposed
attention and gated convolution modules are designed to bridge the gap between
textual prompts and visual features. The proposed model innovatively
introduces semantic prompts into the low-level visual domain. It highlights
the potential to provide a natural, precise, and controllable way to perform
image restoration tasks. Extensive experiments have been conducted on public
denoising, dehazing and deraining datasets. The results demonstrate that,
compared with popular state-of-the-art methods, the proposed model achieves
superior performance, accurately recognizing and removing degradations without
increasing model complexity. Related source code and data will be made
publicly available at
https://github.com/MoTong-AI-studio/TextPromptIR.
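The bridging idea in the abstract can be illustrated with a minimal NumPy sketch of text-conditioned channel-wise ("transposed") attention. This is not the paper's implementation: the projection matrices, toy dimensions, and the choice to let the prompt embedding bias the attention keys are illustrative assumptions standing in for the learned depth-wise multi-head transposed attention module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transposed_attention(feat, text_emb, rng):
    """Channel-wise ("transposed") attention: the attention map is C x C
    rather than (H*W) x (H*W), so its cost scales with channel count, not
    spatial resolution. The text prompt embedding biases the keys so the
    instruction can steer how degradation-related channels are re-mixed.
    Random projections stand in for learned weights (hypothetical)."""
    C, HW = feat.shape
    d = text_emb.shape[0]
    Wq = rng.standard_normal((C, C)) / np.sqrt(C)
    Wk = rng.standard_normal((C, C)) / np.sqrt(C)
    Wt = rng.standard_normal((d, HW)) / np.sqrt(d)   # text -> spatial bias

    q = Wq @ feat                          # queries, shape (C, HW)
    k = Wk @ feat + text_emb @ Wt          # text-conditioned keys, (C, HW)
    attn = softmax(q @ k.T / np.sqrt(HW))  # channel-to-channel map, (C, C)
    return attn @ feat                     # re-mixed visual features, (C, HW)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16))  # 8 channels over a flattened 4x4 map
text = rng.standard_normal(4)        # toy stand-in for a BERT prompt vector
out = transposed_attention(feat, text, rng)
print(out.shape)
```

The output keeps the visual feature shape, so such a block can be dropped between encoder stages; in the actual model the projections are learned and multi-head, and a gated convolution module further fuses the prompt with the features.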
Related papers
- Visual Style Prompt Learning Using Diffusion Models for Blind Face Restoration [16.67947885664477]
Blind face restoration aims to recover high-quality facial images from various unidentified sources of degradation.
Prior knowledge-based methods, leveraging geometric priors and facial features, have led to advancements in face restoration but often fall short of capturing fine details.
We introduce a visual style prompt learning framework that utilizes diffusion probabilistic models to explicitly generate visual prompts.
arXiv Detail & Related papers (2024-12-30T16:05:40Z)
- Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation [70.95783968368124]
We introduce a novel multi-modal autoregressive model, dubbed InstaManip.
We propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages.
Our method surpasses previous few-shot image manipulation models by a notable margin.
arXiv Detail & Related papers (2024-12-02T01:19:21Z)
- DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution [19.33582308829547]
This paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration.
The proposed method achieves a new state-of-the-art perceptual quality level.
arXiv Detail & Related papers (2024-06-24T09:30:36Z)
- InstructIR: High-Quality Image Restoration Following Human Instructions [61.1546287323136]
We present the first approach that uses human-written instructions to guide the image restoration model.
Our method, InstructIR, achieves state-of-the-art results on several restoration tasks.
arXiv Detail & Related papers (2024-01-29T18:53:33Z)
- Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z)
- PromptIR: Prompting for All-in-One Blind Image Restoration [64.02374293256001]
We present a prompt-based learning approach, PromptIR, for All-In-One image restoration.
Our method uses prompts to encode degradation-specific information, which is then used to dynamically guide the restoration network.
PromptIR offers a generic and efficient plugin module with few lightweight prompts.
arXiv Detail & Related papers (2023-06-22T17:59:52Z)
- Unleashing Text-to-Image Diffusion Models for Visual Perception [84.41514649568094]
VPD (Visual Perception with a pre-trained diffusion model) is a new framework that exploits the semantic information of a pre-trained text-to-image diffusion model in visual perception tasks.
We show that VPD can be quickly adapted to downstream visual perception tasks.
arXiv Detail & Related papers (2023-03-03T18:59:47Z)
- Learning to Prompt for Vision-Language Models [82.25005817904027]
Vision-language pre-training has emerged as a promising alternative for representation learning.
It shifts from the tradition of using images and discrete labels to learn a fixed set of weights, seen as visual concepts, to aligning images and raw text through two separate encoders.
Such a paradigm benefits from a broader source of supervision and allows zero-shot transfer to downstream tasks.
arXiv Detail & Related papers (2021-09-02T17:57:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.