Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
- URL: http://arxiv.org/abs/2412.00878v2
- Date: Fri, 06 Dec 2024 17:14:05 GMT
- Title: Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
- Authors: Haoze Sun, Wenbo Li, Jiayue Liu, Kaiwen Zhou, Yongqiang Chen, Yong Guo, Yanwei Li, Renjing Pei, Long Peng, Yujiu Yang
- Abstract summary: We propose using text as an auxiliary invariant representation to reactivate the generative capabilities of diffusion-based restoration models.
We introduce Res-Captioner, a module that generates enhanced textual descriptions tailored to image content and degradation levels.
We present RealIR, a new benchmark designed to capture diverse real-world scenarios.
- Score: 47.942948541067544
- License:
- Abstract: Generalization has long been a central challenge in real-world image restoration. While recent diffusion-based restoration methods, which leverage generative priors from text-to-image models, have made progress in recovering more realistic details, they still encounter "generative capability deactivation" when applied to out-of-distribution real-world data. To address this, we propose using text as an auxiliary invariant representation to reactivate the generative capabilities of these models. We begin by identifying two key properties of text input: richness and relevance, and examine their respective influence on model performance. Building on these insights, we introduce Res-Captioner, a module that generates enhanced textual descriptions tailored to image content and degradation levels, effectively mitigating response failures. Additionally, we present RealIR, a new benchmark designed to capture diverse real-world scenarios. Extensive experiments demonstrate that Res-Captioner significantly enhances the generalization abilities of diffusion-based restoration models, while remaining fully plug-and-play.
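The plug-and-play claim above boils down to a simple pattern: caption the degraded input, then hand that caption to a text-conditioned diffusion restorer as its prompt. The sketch below illustrates only that generic pattern, not the authors' Res-Captioner; BLIP and the Stable Diffusion x4 upscaler are placeholder stand-ins for the captioner and restoration backbone, and the file paths are hypothetical.

```python
# Minimal sketch (not the paper's implementation): caption a degraded image,
# then use the caption as the text prompt of a diffusion-based restorer.
# BLIP and the SD x4 upscaler are generic stand-ins; "degraded.png" is a
# hypothetical input path.
import torch
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration
from diffusers import StableDiffusionUpscalePipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

# 1) Content-aware caption from the low-quality input (stand-in for Res-Captioner).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base"
).to(device)

lq = Image.open("degraded.png").convert("RGB")
inputs = processor(lq, return_tensors="pt").to(device)
caption = processor.decode(
    captioner.generate(**inputs, max_new_tokens=40)[0], skip_special_tokens=True
)

# 2) Feed the caption as the text condition of a text-guided diffusion restorer.
restorer = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler"
).to(device)
restored = restorer(prompt=caption, image=lq).images[0]
restored.save("restored.png")
```

In this framing, the "richness" and "relevance" properties from the abstract correspond to how detailed and how faithful `caption` is to the degraded content; swapping in a stronger, degradation-aware captioner is the slot Res-Captioner is designed to fill.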
Related papers
- RAP-SR: RestorAtion Prior Enhancement in Diffusion Models for Realistic Image Super-Resolution [36.137383171027615]
We introduce RAP-SR, a restoration prior enhancement approach in pretrained diffusion models for Real-SR.
First, we develop the High-Fidelity Aesthetic Image dataset (HFAID), curated through a Quality-Driven Aesthetic Image Selection Pipeline (QDAISP).
Second, we propose the Restoration Priors Enhancement Framework, which includes Restoration Priors Refinement (RPR) and Restoration-Oriented Prompt Optimization (ROPO) modules.
arXiv Detail & Related papers (2024-12-10T03:17:38Z)
- FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration [66.61201445650323]
Existing methods suffer from a generalization bottleneck in real-world scenarios.
We contribute a million-scale dataset with two notable advantages over existing training data.
We propose a robust model, FoundIR, to better address a broader range of restoration tasks in real-world scenarios.
arXiv Detail & Related papers (2024-12-02T12:08:40Z)
- DIR: Retrieval-Augmented Image Captioning with Comprehensive Understanding [10.347788969721844]
Dive Into Retrieval (DIR) is designed to enhance both the image-to-text retrieval process and the utilization of retrieved text.
DIR not only maintains competitive in-domain performance but also significantly improves out-of-domain generalization, all without increasing inference costs.
arXiv Detail & Related papers (2024-12-02T04:39:17Z)
- Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model.
We utilize the pre-trained visual language model to extract visual prompts from degraded images.
We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z)
- DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution [19.33582308829547]
This paper proposes to leverage degradation-aligned language prompt for accurate, fine-grained, and high-fidelity image restoration.
The proposed method achieves a new state-of-the-art perceptual quality level.
arXiv Detail & Related papers (2024-06-24T09:30:36Z)
- Towards Realistic Data Generation for Real-World Super-Resolution [58.99206459754721]
RealDGen is an unsupervised learning data generation framework designed for real-world super-resolution.
We develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model.
Experiments demonstrate that RealDGen excels in generating large-scale, high-quality paired data that mirrors real-world degradations.
arXiv Detail & Related papers (2024-06-11T13:34:57Z)
- Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild [57.06779516541574]
SUPIR (Scaling-UP Image Restoration) is a groundbreaking image restoration method that harnesses generative priors and the power of model scaling.
We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations.
arXiv Detail & Related papers (2024-01-24T17:58:07Z)
- SPIRE: Semantic Prompt-Driven Image Restoration [66.26165625929747]
We develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework.
Our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength.
Our experiments demonstrate the superior restoration performance of SPIRE compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-12-18T17:02:30Z)
- RestoreFormer++: Towards Real-World Blind Face Restoration from Undegraded Key-Value Pairs [63.991802204929485]
Blind face restoration aims at recovering high-quality face images from those with unknown degradations.
Current algorithms mainly introduce priors to complement high-quality details and achieve impressive progress.
We propose RestoreFormer++, which introduces fully-spatial attention mechanisms to model the contextual information and the interplay with the priors.
We show that RestoreFormer++ outperforms state-of-the-art algorithms on both synthetic and real-world datasets.
arXiv Detail & Related papers (2023-08-14T16:04:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.