DiffusionSTR: Diffusion Model for Scene Text Recognition
- URL: http://arxiv.org/abs/2306.16707v1
- Date: Thu, 29 Jun 2023 06:09:32 GMT
- Title: DiffusionSTR: Diffusion Model for Scene Text Recognition
- Authors: Masato Fujitake
- Abstract summary: Diffusion Model for Scene Text Recognition (DiffusionSTR) is an end-to-end text recognition framework.
We show for the first time that a diffusion model can be applied to text recognition.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents Diffusion Model for Scene Text Recognition
(DiffusionSTR), an end-to-end text recognition framework using diffusion models
for recognizing text in the wild. While existing studies have viewed scene
text recognition as an image-to-text transformation, we rethink it as a
text-to-text transformation conditioned on images within a diffusion model. We
show for the first time that a diffusion model can be applied to text
recognition. Furthermore,
experimental results on publicly available datasets show that the proposed
method achieves competitive accuracy compared to state-of-the-art methods.
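The core idea, recognition as iterative denoising of a text sequence conditioned on an image, can be illustrated with a toy sketch. This is a hypothetical illustration, not the authors' implementation: the `toy_denoiser` below is a stand-in oracle, whereas the real model is a learned network predicting token distributions from image features.

```python
# Toy sketch of text recognition as reverse diffusion over discrete tokens:
# start from a fully corrupted ([MASK]) sequence and progressively replace
# masked positions with character predictions over T steps, conditioned on
# image features. All names here are illustrative assumptions.
MASK = "[MASK]"

def toy_denoiser(tokens, image_features, step, total_steps):
    """Stand-in for a learned denoiser: reveals more characters as the
    reverse process approaches step 0. A real model would predict a
    distribution over the vocabulary at every position, attending to
    visual features extracted from the input image."""
    target = image_features["decoded_chars"]  # toy oracle, for illustration only
    n_reveal = int(len(target) * (total_steps - step) / total_steps)
    return [target[i] if i < n_reveal else MASK for i in range(len(target))]

def recognize(image_features, seq_len=5, total_steps=5):
    tokens = [MASK] * seq_len                 # x_T: fully corrupted text
    for step in range(total_steps, 0, -1):    # reverse process t = T .. 1
        tokens = toy_denoiser(tokens, image_features, step - 1, total_steps)
    return "".join(tokens)

print(recognize({"decoded_chars": "HELLO"}))  # prints "HELLO"
```

The point of the sketch is the control flow: unlike autoregressive decoding, every position is refined in parallel at each step, and the number of refinement steps is fixed in advance rather than tied to sequence length.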
Related papers
- JSTR: Judgment Improves Scene Text Recognition [0.0]
We present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other.
This method boosts text recognition accuracy by providing explicit feedback on the data that the model is likely to misrecognize.
arXiv Detail & Related papers (2024-04-09T02:55:12Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Seek for Incantations: Towards Accurate Text-to-Image Diffusion Synthesis through Prompt Engineering [118.53208190209517]
We propose a framework to learn the proper textual descriptions for diffusion models through prompt learning.
Our method can effectively learn the prompts to improve the matches between the input text and the generated images.
arXiv Detail & Related papers (2024-01-12T03:46:29Z)
- UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models [25.219960711604728]
This paper proposes a novel approach for text image generation, utilizing a pre-trained diffusion model.
Our approach involves the design and training of a light-weight character-level text encoder, which replaces the original CLIP encoder.
By employing an inference stage refinement process, we achieve a notably high sequence accuracy when synthesizing text in arbitrarily given images.
arXiv Detail & Related papers (2023-12-08T07:47:46Z)
- Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models [63.99110667987318]
We present DiffText, a pipeline that seamlessly blends foreground text with the background's intrinsic features.
With fewer text instances, our produced text images consistently surpass other synthetic data in aiding text detectors.
arXiv Detail & Related papers (2023-11-28T06:51:28Z)
- On Manipulating Scene Text in the Wild with Diffusion Models [4.034781390227754]
We introduce DBEST, a Diffusion-BasEd Scene Text manipulation network.
Specifically, we design two adaptation strategies, namely one-shot style adaptation and text-recognition guidance.
Our method achieves 94.15% and 98.12% on datasets for character-level evaluation.
arXiv Detail & Related papers (2023-11-01T11:31:50Z)
- Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners [88.07317175639226]
We propose a novel approach, Discriminative Stable Diffusion (DSD), which turns pre-trained text-to-image diffusion models into few-shot discriminative learners.
Our approach mainly uses the cross-attention score of a Stable Diffusion model to capture the mutual influence between visual and textual information.
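The idea of collapsing a cross-attention map into a scalar image-text matching score can be sketched as follows. This is a hypothetical illustration under assumed shapes, not the DSD implementation: visual patch queries attend to text token keys, and the sharpness of that attention is averaged into one score.

```python
import numpy as np

# Hypothetical sketch (not the DSD code): score image-text alignment by how
# sharply visual queries attend to text keys under scaled-dot-product attention.
def cross_attention_score(visual_queries, text_keys):
    """visual_queries: (n_patches, d); text_keys: (n_tokens, d).
    Returns the mean, over patches, of each patch's peak attention weight."""
    d = visual_queries.shape[-1]
    logits = visual_queries @ text_keys.T / np.sqrt(d)   # (n_patches, n_tokens)
    attn = np.exp(logits - logits.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)             # row-wise softmax
    return float(attn.max(axis=-1).mean())               # in [1/n_tokens, 1]

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))                  # toy visual patch queries
score = cross_attention_score(q, rng.normal(size=(3, 8)))  # toy text keys
```

In a few-shot discriminative setting, such a score could be computed for each candidate caption and the highest-scoring one selected; the real method reads these attention maps out of a pre-trained Stable Diffusion model rather than from random features.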
arXiv Detail & Related papers (2023-05-18T05:41:36Z)
- WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for word-level generation of handwritten text images with specified textual content and writer style.
Our proposed method is able to generate realistic word image samples in different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with a similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.