JSTR: Judgment Improves Scene Text Recognition
- URL: http://arxiv.org/abs/2404.05967v1
- Date: Tue, 9 Apr 2024 02:55:12 GMT
- Title: JSTR: Judgment Improves Scene Text Recognition
- Authors: Masato Fujitake
- Abstract summary: We present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other.
This method boosts text recognition accuracy by providing explicit feedback on the data that the model is likely to misrecognize.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other. While previous studies focused on generating recognition results from input images, our approach also considers the model's misrecognition results to understand its error tendencies, thus improving the text recognition pipeline. This method boosts text recognition accuracy by providing explicit feedback on data that the model is likely to misrecognize, predicting whether each image-text pair is correct or incorrect. The experimental results on publicly available datasets demonstrate that our proposed method outperforms the baseline and state-of-the-art methods in scene text recognition.
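The judgment idea described in the abstract can be sketched as a re-ranking step: a recognizer proposes candidate transcriptions, and a judge scores whether each (image, text) pair matches, so likely misrecognitions can be overridden. This is a minimal illustrative sketch only; the function names, the dictionary-based image stand-in, and the toy judge are assumptions, not the authors' implementation.

```python
# Minimal sketch of judgment-based re-ranking (illustrative only;
# JSTR's actual model is a learned image-text matching network).

def rerank_with_judge(image, candidates, judge):
    """Return the candidate text the judge deems most likely to
    match the image. `judge(image, text)` is assumed to return a
    match score in [0, 1]."""
    return max(candidates, key=lambda text: judge(image, text))

def toy_judge(image, text):
    # Hypothetical judge that has learned what this image reads.
    return 1.0 if text == image["ground_truth"] else 0.3

# Typical recognizer confusions: O vs 0, T vs I.
image = {"ground_truth": "STOP"}
candidates = ["ST0P", "STOP", "SIOP"]
print(rerank_with_judge(image, candidates, toy_judge))  # → STOP
```

The key design point, as the abstract describes it, is that the match/no-match signal gives the recognizer explicit feedback on exactly the samples it tends to get wrong.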
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Orientation-Independent Chinese Text Recognition in Scene Images [61.34060587461462]
We make the first attempt to extract orientation-independent visual features by disentangling the content and orientation information of text images.
Specifically, we introduce a Character Image Reconstruction Network (CIRN) to recover corresponding printed character images with disentangled content and orientation information.
arXiv Detail & Related papers (2023-09-03T05:30:21Z)
- DiffusionSTR: Diffusion Model for Scene Text Recognition [0.0]
Diffusion Model for Scene Text Recognition (DiffusionSTR) is an end-to-end text recognition framework.
We show for the first time that the diffusion model can be applied to text recognition.
arXiv Detail & Related papers (2023-06-29T06:09:32Z)
- CLIPTER: Looking at the Bigger Picture in Scene Text Recognition [10.561377899703238]
We harness the capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer.
We achieve this by fusing a rich representation of the entire image, obtained from the vision-language model, with the recognizer word-level features via a cross-attention gated mechanism.
arXiv Detail & Related papers (2023-01-18T12:16:19Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with a similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
- Knowledge Mining with Scene Text for Fine-Grained Recognition [53.74297368412834]
We propose an end-to-end trainable network that mines the implicit contextual knowledge behind scene text images.
We employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification.
Our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP on the two evaluated benchmarks, respectively.
arXiv Detail & Related papers (2022-03-27T05:54:00Z)
- IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition [20.741958198581173]
We propose an Iterative Fusion based Recognizer (IFR) for low quality scene text recognition.
IFR contains two branches that focus on scene text recognition and low-quality scene text image recovery, respectively.
A feature fusion module is proposed to strengthen the feature representation of the two branches.
arXiv Detail & Related papers (2021-08-13T10:45:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all information) and is not responsible for any consequences of its use.