Text-Aware Image Restoration with Diffusion Models
- URL: http://arxiv.org/abs/2506.09993v2
- Date: Thu, 03 Jul 2025 06:39:43 GMT
- Title: Text-Aware Image Restoration with Diffusion Models
- Authors: Jaewon Min, Jin Hyeon Kim, Paul Hyunbin Cho, Jaeeun Lee, Jihye Park, Minkyu Park, Sangpil Kim, Hyunhee Park, Seungryong Kim
- Abstract summary: Text-Aware Image Restoration (TAIR) is a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image restoration aims to recover degraded images. However, existing diffusion-based restoration methods, despite great success in natural image restoration, often struggle to faithfully reconstruct textual regions in degraded images. Those methods frequently generate plausible but incorrect text-like patterns, a phenomenon we refer to as text-image hallucination. In this paper, we introduce Text-Aware Image Restoration (TAIR), a novel restoration task that requires the simultaneous recovery of visual contents and textual fidelity. To tackle this task, we present SA-Text, a large-scale benchmark of 100K high-quality scene images densely annotated with diverse and complex text instances. Furthermore, we propose a multi-task diffusion framework, called TeReDiff, that integrates internal features from diffusion models into a text-spotting module, enabling both components to benefit from joint training. This allows for the extraction of rich text representations, which are utilized as prompts in subsequent denoising steps. Extensive experiments demonstrate that our approach consistently outperforms state-of-the-art restoration methods, achieving significant gains in text recognition accuracy. See our project page: https://cvlab-kaist.github.io/TAIR/
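The abstract describes a feedback loop: at each denoising step, internal diffusion features feed a text-spotting module, and the recognized text is used as the prompt for the subsequent step. The toy sketch below illustrates only the shape of that loop; all names (`denoise_step`, `spot_text`, `restore`) are illustrative placeholders, not the authors' actual TeReDiff API, and the numeric dynamics are stand-ins.

```python
# Illustrative sketch of a TeReDiff-style loop: denoise, spot text from
# internal features, and feed the result back as the next-step prompt.
# All names and dynamics here are hypothetical placeholders.
import numpy as np

def denoise_step(latent, prompt_embedding):
    """Stand-in for one diffusion denoising step conditioned on a prompt."""
    # Placeholder update: pull the latent toward the prompt embedding.
    return 0.9 * latent + 0.1 * prompt_embedding

def spot_text(features):
    """Stand-in for the text-spotting module: maps internal diffusion
    features to a text-prompt embedding for the next step."""
    return np.tanh(features)

def restore(latent, steps=10):
    prompt = np.zeros_like(latent)  # start with an empty prompt
    for _ in range(steps):
        latent = denoise_step(latent, prompt)
        # In the real framework the features come from the denoiser's
        # internals; here the latent itself plays that role.
        prompt = spot_text(latent)
    return latent

restored = restore(np.random.default_rng(0).standard_normal(4))
print(restored.shape)
```

The key design point suggested by the abstract is the joint training of both components, so the spotting head sees progressively cleaner features as denoising proceeds.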
Related papers
- Conditional Text-to-Image Generation with Reference Guidance [81.99538302576302]
This paper explores conditioning diffusion models on an additional reference image that provides visual guidance for the particular subjects to be generated.
We develop several small-scale expert plugins that efficiently endow a Stable Diffusion model with the capability to take different references.
Our expert plugins demonstrate superior results to existing methods on all tasks, with each plugin containing only 28.55M trainable parameters.
arXiv Detail & Related papers (2024-11-22T21:38:51Z)
- Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning.
Our method can effectively learn the prompts to improve the matches between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z)
- Improving Image Restoration through Removing Degradations in Textual Representations [60.79045963573341]
We introduce a new perspective for improving image restoration by removing degradation in the textual representations of a degraded image.
To enable cross-modal assistance, we propose mapping degraded images into textual representations and removing the degradations there.
In particular, we embed an image-to-text mapper and a text restoration module into CLIP-equipped text-to-image models to generate the guidance.
arXiv Detail & Related papers (2023-12-28T19:18:17Z)
- SPIRE: Semantic Prompt-Driven Image Restoration [66.26165625929747]
We develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework.
Our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength.
Our experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art.
arXiv Detail & Related papers (2023-12-18T17:02:30Z)
- Textual Prompt Guided Image Restoration [18.78902053873706]
"All-in-one" models that can do blind image restoration have been concerned in recent years.
Recent works focus on learning visual prompts from data distribution to identify degradation type.
In this paper, we propose an effective textual prompt guided image restoration model.
arXiv Detail & Related papers (2023-12-11T06:56:41Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- Image Super-Resolution with Text Prompt Diffusion [118.023531454099]
We introduce text prompts to image SR to provide degradation priors. PromptSR leverages the latest multi-modal large language model (MLLM) to generate prompts from low-resolution images. Experiments indicate that introducing text prompts into SR yields impressive results on both synthetic and real-world images.
arXiv Detail & Related papers (2023-11-24T05:11:35Z)
- Text-guided Image Restoration and Semantic Enhancement for Text-to-Image Person Retrieval [12.057465578064345]
The goal of Text-to-Image Person Retrieval (TIPR) is to retrieve specific person images according to the given textual descriptions. We propose a novel TIPR framework to build fine-grained interactions and alignment between person images and the corresponding texts.
arXiv Detail & Related papers (2023-07-18T08:23:46Z)
- Extremely Low-light Image Enhancement with Scene Text Restoration [29.08094129045479]
A novel image enhancement framework is proposed to precisely restore scene text.
We employ a self-regularised attention map, an edge map, and a novel text detection loss.
The proposed model outperforms state-of-the-art methods in image restoration, text detection, and text spotting.
arXiv Detail & Related papers (2022-04-01T16:10:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.