Diffusion-based Blind Text Image Super-Resolution
- URL: http://arxiv.org/abs/2312.08886v2
- Date: Sun, 3 Mar 2024 08:08:03 GMT
- Title: Diffusion-based Blind Text Image Super-Resolution
- Authors: Yuzhe Zhang, Jiawei Zhang, Hao Li, Zhouxia Wang, Luwei Hou, Dongqing
Zou, Liheng Bian
- Abstract summary: We propose an Image Diffusion Model (IDM) to restore text images with realistic styles.
For diffusion models, they are not only suitable for modeling realistic image distribution but also appropriate for learning text distribution.
We also propose a Text Diffusion Model (TDM) for text recognition which can guide IDM to generate text images with correct structures.
- Score: 20.91578221617732
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recovering degraded low-resolution text images is challenging, especially for
Chinese text images with complex strokes and severe degradation in real-world
scenarios. Ensuring both text fidelity and style realness is crucial for
high-quality text image super-resolution. Recently, diffusion models have
achieved great success in natural image synthesis and restoration due to their
powerful data distribution modeling abilities and data generation capabilities.
In this work, we propose an Image Diffusion Model (IDM) to restore text images
with realistic styles. For diffusion models, they are not only suitable for
modeling realistic image distribution but also appropriate for learning text
distribution. Since text prior is important to guarantee the correctness of the
restored text structure according to existing arts, we also propose a Text
Diffusion Model (TDM) for text recognition which can guide IDM to generate text
images with correct structures. We further propose a Mixture of Multi-modality
module (MoM) to make these two diffusion models cooperate with each other in
all the diffusion steps. Extensive experiments on synthetic and real-world
datasets demonstrate that our Diffusion-based Blind Text Image Super-Resolution
(DiffTSR) can restore text images with more accurate text structures as well as
more realistic appearances simultaneously.
Related papers
- ARTIST: Improving the Generation of Text-rich Images with Disentangled Diffusion Models [52.23899502520261]
We introduce a new framework named ARTIST to focus on the learning of text structures.
We finetune a visual diffusion model, enabling it to assimilate textual structure information from the pretrained textual model.
Empirical results on the MARIO-Eval benchmark underscore the effectiveness of the proposed method, showing an improvement of up to 15% in various metrics.
arXiv Detail & Related papers (2024-06-17T19:31:24Z) - TextCraftor: Your Text Encoder Can be Image Quality Controller [65.27457900325462]
Diffusion-based text-to-image generative models, e.g., Stable Diffusion, have revolutionized the field of content generation.
We propose a proposed fine-tuning approach, TextCraftor, to enhance the performance of text-to-image diffusion models.
arXiv Detail & Related papers (2024-03-27T19:52:55Z) - Text Image Inpainting via Global Structure-Guided Diffusion Models [22.859984320894135]
Real-world text can be damaged by corrosion issues caused by environmental or human factors.
Current inpainting techniques often fail to adequately address this problem.
We develop a novel neural framework, Global Structure-guided Diffusion Model (GSDM), as a potential solution.
arXiv Detail & Related papers (2024-01-26T13:01:28Z) - Seek for Incantations: Towards Accurate Text-to-Image Diffusion
Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning.
Our method can effectively learn the prompts to improve the matches between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z) - UDiffText: A Unified Framework for High-quality Text Synthesis in
Arbitrary Images via Character-aware Diffusion Models [25.219960711604728]
This paper proposes a novel approach for text image generation, utilizing a pre-trained diffusion model.
Our approach involves the design and training of a light-weight character-level text encoder, which replaces the original CLIP encoder.
By employing an inference stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images.
arXiv Detail & Related papers (2023-12-08T07:47:46Z) - Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using
Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z) - De-Diffusion Makes Text a Strong Cross-Modal Interface [33.90004746543745]
We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding.
Experiments validate the precision and comprehensiveness of De-Diffusion text representing images.
A single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools.
arXiv Detail & Related papers (2023-11-01T16:12:40Z) - TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image
Super-Resolution [18.73348268987249]
TextDiff is a diffusion-based framework tailored for scene text image super-resolution.
It achieves state-of-the-art (SOTA) performance on public benchmark datasets.
Our proposed MRD module is plug-and-play that effectively sharpens the text edges produced by SOTA methods.
arXiv Detail & Related papers (2023-08-13T11:02:16Z) - SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
Large Language Models [56.88192537044364]
We propose a simple-yet-effective parameter-efficient fine-tuning approach called the Semantic Understanding and Reasoning adapter (SUR-adapter) for pre-trained diffusion models.
Our approach can make text-to-image diffusion models easier to use with better user experience.
arXiv Detail & Related papers (2023-05-09T05:48:38Z) - Photorealistic Text-to-Image Diffusion Models with Deep Language
Understanding [53.170767750244366]
Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.
To assess text-to-image models in greater depth, we introduce DrawBench, a comprehensive and challenging benchmark for text-to-image models.
arXiv Detail & Related papers (2022-05-23T17:42:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.