IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text
Recognition
- URL: http://arxiv.org/abs/2108.06166v1
- Date: Fri, 13 Aug 2021 10:45:01 GMT
- Title: IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text
Recognition
- Authors: Zhiwei Jia and Shugong Xu and Shiyi Mu and Yue Tao and Shan Cao and
Zhiyong Chen
- Abstract summary: We propose an Iterative Fusion based Recognizer (IFR) for low quality scene text recognition.
IFR contains two branches which focus on scene text recognition and low quality scene text image recovery respectively.
A feature fusion module is proposed to strengthen the feature representation of the two branches.
- Score: 20.741958198581173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although recent works based on deep learning have made progress in improving
recognition accuracy on scene text recognition, how to handle low-quality text
images in end-to-end deep networks remains a research challenge. In this paper,
we propose an Iterative Fusion based Recognizer (IFR) for low quality scene
text recognition, taking advantage of refined text images input and robust
feature representation. IFR contains two branches which focus on scene text
recognition and low quality scene text image recovery respectively. We utilize
an iterative collaboration between the two branches, which can effectively
alleviate the impact of low quality input. A feature fusion module is proposed
to strengthen the feature representation of the two branches, where the
features from the Recognizer are Fused with the image Restoration branch,
referred to as RRF. Extensive quantitative and qualitative experimental results
show that, without changing the recognition network structure, the proposed
method significantly outperforms the baseline methods in recognition accuracy
on benchmark datasets and on low resolution images in the TextZoom dataset.
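The iterative collaboration described in the abstract can be pictured with a small control-flow sketch. This is not the authors' implementation: the function names (`iterative_fusion`, `toy_recognize`, `toy_restore`), the list-of-floats "image", and the number of iterations are all assumptions made for illustration. The only idea taken from the abstract is that the restoration branch receives features from the recognizer and that the recognizer then re-reads the refined image.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def iterative_fusion(
    image: Vector,
    recognize: Callable[[Vector], Tuple[Vector, List[int]]],
    restore: Callable[[Vector, Vector], Vector],
    num_iters: int = 2,
) -> Tuple[Vector, List[int]]:
    """Alternate between a recognition branch and a restoration branch.

    recognize(image) returns (features, prediction).
    restore(image, features) returns a refined image; passing the
    recognizer features in stands for the fusion step (hypothetical API).
    """
    current = image
    features, prediction = recognize(current)
    for _ in range(num_iters):
        # Restoration branch refines the image using recognizer features.
        current = restore(current, features)
        # Recognition branch runs again on the refined image.
        features, prediction = recognize(current)
    return current, prediction


def toy_recognize(image: Vector) -> Tuple[Vector, List[int]]:
    # Toy stand-in: features are a copy of the pixels, and the
    # "prediction" thresholds each pixel at 0.5.
    return list(image), [1 if x >= 0.5 else 0 for x in image]


def toy_restore(image: Vector, features: Vector) -> Vector:
    # Toy stand-in: move each pixel halfway toward the binary value
    # suggested by the corresponding recognizer feature.
    return [
        x + 0.5 * ((1.0 if f >= 0.5 else 0.0) - x)
        for x, f in zip(image, features)
    ]


refined, prediction = iterative_fusion([0.6, 0.4, 0.9], toy_recognize, toy_restore)
```

In this toy setting each pass pushes the blurry pixel values closer to clean binary values while the prediction stays stable. A real system would replace both stand-ins with learned networks.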
Related papers
- UNIT: Unifying Image and Text Recognition in One Vision Encoder [51.140564856352825]
UNIT is a novel training framework aimed at UNifying Image and Text recognition within a single model.
We show that UNIT significantly outperforms existing methods on document-related tasks.
Notably, UNIT retains the original vision encoder architecture, making it cost-free in terms of inference and deployment.
arXiv Detail & Related papers (2024-09-06T08:02:43Z)
- JSTR: Judgment Improves Scene Text Recognition [0.0]
We present a method for enhancing the accuracy of scene text recognition tasks by judging whether the image and text match each other.
This method boosts text recognition accuracy by providing explicit feedback on the data that the model is likely to misrecognize.
arXiv Detail & Related papers (2024-04-09T02:55:12Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting [126.01629300244001]
We propose a new end-to-end scene text spotting framework termed SwinTextSpotter v2.
We enhance the relationship between two tasks using novel Recognition Conversion and Recognition Alignment modules.
SwinTextSpotter v2 achieved state-of-the-art performance on various multilingual (English, Chinese, and Vietnamese) benchmarks.
arXiv Detail & Related papers (2024-01-15T12:33:00Z)
- Recognition-Guided Diffusion Model for Scene Text Image Super-Resolution [15.391125077873745]
Scene Text Image Super-Resolution (STISR) aims to enhance the resolution and legibility of text within low-resolution (LR) images.
Previous methods predominantly employ discriminative Convolutional Neural Networks (CNNs) augmented with diverse forms of text guidance.
We introduce RGDiffSR, a Recognition-Guided Diffusion model for scene text image Super-Resolution, which exhibits great generative diversity and fidelity even in challenging scenarios.
arXiv Detail & Related papers (2023-11-22T11:10:45Z)
- One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer [53.02254290682613]
Current solutions for low-resolution text recognition typically rely on a two-stage pipeline.
We propose an efficient and effective knowledge distillation framework to achieve multi-level knowledge transfer.
Experiments show that the proposed one-stage pipeline significantly outperforms super-resolution based two-stage frameworks.
arXiv Detail & Related papers (2023-08-05T02:33:45Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- Scene Text Image Super-Resolution via Content Perceptual Loss and Criss-Cross Transformer Blocks [48.81850740907517]
We present TATSR, a Text-Aware Text Super-Resolution framework.
It effectively learns the unique text characteristics using Criss-Cross Transformer Blocks (CCTBs) and a novel Content Perceptual (CP) Loss.
It outperforms state-of-the-art methods in terms of both recognition accuracy and human perception.
arXiv Detail & Related papers (2022-10-13T11:48:45Z)
- Optimal Boxes: Boosting End-to-End Scene Text Recognition by Adjusting Annotated Bounding Boxes via Reinforcement Learning [41.56134008044702]
Box Adjuster is a reinforcement learning-based method for adjusting the shape of each text bounding box to make it more compatible with text recognition models.
Experiments demonstrate that the performance of end-to-end text recognition systems can be improved when using the adjusted bounding boxes as the ground truths for training.
arXiv Detail & Related papers (2022-07-25T06:58:45Z)
- Primitive Representation Learning for Scene Text Recognition [7.818765015637802]
We propose a primitive representation learning method that aims to exploit intrinsic representations of scene text images.
A Primitive REpresentation learning Network (PREN) is constructed to use the visual text representations for parallel decoding.
We also propose a framework called PREN2D to alleviate the misalignment problem in attention-based methods.
arXiv Detail & Related papers (2021-05-10T11:54:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.