SPIRE: Semantic Prompt-Driven Image Restoration
- URL: http://arxiv.org/abs/2312.11595v2
- Date: Tue, 16 Jul 2024 08:10:03 GMT
- Title: SPIRE: Semantic Prompt-Driven Image Restoration
- Authors: Chenyang Qi, Zhengzhong Tu, Keren Ye, Mauricio Delbracio, Peyman Milanfar, Qifeng Chen, Hossein Talebi,
- Abstract summary: We develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework.
Our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength.
Our experiments demonstrate the superior restoration performance of SPIRE compared to the state of the arts.
- Score: 66.26165625929747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-driven diffusion models have become increasingly popular for various image editing tasks, including inpainting, stylization, and object replacement. However, it still remains an open research problem to adopt this language-vision paradigm for more fine-level image processing tasks, such as denoising, super-resolution, deblurring, and compression artifact removal. In this paper, we develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework that leverages natural language as a user-friendly interface to control the image restoration process. We consider the capacity of prompt information in two dimensions. First, we use content-related prompts to enhance the semantic alignment, effectively alleviating identity ambiguity in the restoration outcomes. Second, our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength, without the need for explicit task-specific design. In addition, we introduce a novel fusion mechanism that augments the existing ControlNet architecture by learning to rescale the generative prior, thereby achieving better restoration fidelity. Our extensive experiments demonstrate the superior restoration performance of SPIRE compared to the state of the arts, alongside offering the flexibility of text-based control over the restoration effects.
Related papers
- MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a novel task that pushes the boundaries of text-to-image (T2I) models.
To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z) - Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model.
We utilize the pre-trained visual language model to extract visual prompts from degraded images.
We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z) - Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild [57.06779516541574]
SUPIR (Scaling-UP Image Restoration) is a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up.
We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations.
arXiv Detail & Related papers (2024-01-24T17:58:07Z) - Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration [50.81374327480445]
We introduce a novel concept positing that intricate image degradation can be represented in terms of elementary degradation.
We propose the Unified-Width Adaptive Dynamic Network (U-WADN), consisting of two pivotal components: a Width Adaptive Backbone (WAB) and a Width Selector (WS)
The proposed U-WADN achieves better performance while simultaneously reducing up to 32.3% of FLOPs and providing approximately 15.7% real-time acceleration.
arXiv Detail & Related papers (2024-01-24T04:25:12Z) - Improving Image Restoration through Removing Degradations in Textual
Representations [60.79045963573341]
We introduce a new perspective for improving image restoration by removing degradation in the textual representations of a degraded image.
To address the cross-modal assistance, we propose to map the degraded images into textual representations for removing the degradations.
In particular, We ingeniously embed an image-to-text mapper and text restoration module into CLIP-equipped text-to-image models to generate the guidance.
arXiv Detail & Related papers (2023-12-28T19:18:17Z) - Prompt-In-Prompt Learning for Universal Image Restoration [38.81186629753392]
We propose novel Prompt-In-Prompt learning for universal image restoration, named PIP.
We present two novel prompts, a degradation-aware prompt to encode high-level degradation knowledge and a basic restoration prompt to provide essential low-level information.
By doing so, the resultant PIP works as a plug-and-play module to enhance existing restoration models for universal image restoration.
arXiv Detail & Related papers (2023-12-08T13:36:01Z) - Multi-task Image Restoration Guided By Robust DINO Features [88.74005987908443]
We propose mboxtextbfDINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2.
We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOV2's shallow features.
By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
arXiv Detail & Related papers (2023-12-04T06:59:55Z) - All-in-one Multi-degradation Image Restoration Network via Hierarchical
Degradation Representation [47.00239809958627]
We propose a novel All-in-one Multi-degradation Image Restoration Network (AMIRNet)
AMIRNet learns a degradation representation for unknown degraded images by progressively constructing a tree structure through clustering.
This tree-structured representation explicitly reflects the consistency and discrepancy of various distortions, providing a specific clue for image restoration.
arXiv Detail & Related papers (2023-08-06T04:51:41Z) - Semantic-Preserving Augmentation for Robust Image-Text Retrieval [27.2916415148638]
RVSE consists of novel image-based and text-based augmentation techniques called semantic preserving augmentation for image (SPAugI) and text (SPAugT)
Since SPAugI and SPAugT change the original data in a way that its semantic information is preserved, we enforce the feature extractors to generate semantic aware embedding vectors.
From extensive experiments using benchmark datasets, we show that RVSE outperforms conventional retrieval schemes in terms of image-text retrieval performance.
arXiv Detail & Related papers (2023-03-10T03:50:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.