Sequential Visual and Semantic Consistency for Semi-supervised Text
Recognition
- URL: http://arxiv.org/abs/2402.15806v1
- Date: Sat, 24 Feb 2024 13:00:54 GMT
- Title: Sequential Visual and Semantic Consistency for Semi-supervised Text
Recognition
- Authors: Mingkun Yang, Biao Yang, Minghui Liao, Yingying Zhu, Xiang Bai
- Abstract summary: Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects.
- Score: 56.968108142307976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scene text recognition (STR) is a challenging task that requires large-scale
annotated data for training. However, collecting and labeling real text images
is expensive and time-consuming, which limits the availability of real data.
Therefore, most existing STR methods resort to synthetic data, which may
introduce domain discrepancy and degrade the performance of STR models. To
alleviate this problem, recent semi-supervised STR methods exploit unlabeled
real data by enforcing character-level consistency regularization between
weakly and strongly augmented views of the same image. However, these methods
neglect word-level consistency, which is crucial for sequence recognition
tasks. This paper proposes a novel semi-supervised learning method for STR that
incorporates word-level consistency regularization from both visual and
semantic aspects. Specifically, we devise a shortest path alignment module to
align the sequential visual features of different views and minimize their
distance. Moreover, we adopt a reinforcement learning framework to optimize the
semantic similarity of the predicted strings in the embedding space. We conduct
extensive experiments on several standard and challenging STR benchmarks and
demonstrate the superiority of our proposed method over existing
semi-supervised STR methods.
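The abstract's two word-level consistency ideas can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: the shortest path alignment is rendered as a dynamic-time-warping-style cumulative cost between the feature sequences of two augmented views, and the semantic term as a cosine similarity between string embeddings that could serve as a reinforcement learning reward. All function names, shapes, and details are hypothetical.

```python
import numpy as np

def shortest_path_alignment_cost(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """DTW-style shortest-path cost between two visual feature sequences.

    feats_a: (T1, D) frame features from the weakly augmented view.
    feats_b: (T2, D) frame features from the strongly augmented view.
    Returns the minimal cumulative Euclidean distance along a monotonic
    alignment path; in training, such a cost could act as a word-level
    visual consistency loss.
    """
    t1, t2 = len(feats_a), len(feats_b)
    # Pairwise Euclidean distances between all frame pairs.
    dist = np.linalg.norm(feats_a[:, None, :] - feats_b[None, :, :], axis=-1)
    # acc[i, j] = cheapest alignment path cost ending at frame pair (i, j).
    acc = np.full((t1, t2), np.inf)
    acc[0, 0] = dist[0, 0]
    for i in range(t1):
        for j in range(t2):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j] if i > 0 else np.inf,                 # advance in sequence A
                acc[i, j - 1] if j > 0 else np.inf,                 # advance in sequence B
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,   # advance in both
            )
            acc[i, j] = dist[i, j] + best_prev
    return float(acc[-1, -1])

def semantic_similarity_reward(emb_pred: np.ndarray, emb_ref: np.ndarray) -> float:
    """Cosine similarity between embeddings of two predicted strings.

    A scalar like this could be maximized as a sequence-level reward in a
    reinforcement learning framework, since it is non-differentiable with
    respect to the discrete decoding that produced the strings.
    """
    num = float(np.dot(emb_pred, emb_ref))
    den = float(np.linalg.norm(emb_pred) * np.linalg.norm(emb_ref) + 1e-8)
    return num / den
```

Identical feature sequences yield an alignment cost of zero, and the cost grows as the two views' features drift apart, which is the behavior a consistency loss needs.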
Related papers
- Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition [36.59116507158687]
We introduce RCMSTR, a unified framework of Relational Contrastive Learning and Masked Image Modeling for STR.
The proposed RCMSTR demonstrates superior performance in various STR-related downstream tasks, outperforming the existing state-of-the-art self-supervised STR techniques.
arXiv Detail & Related papers (2024-11-18T01:11:47Z)
- Pointer-Guided Pre-Training: Infusing Large Language Models with Paragraph-Level Contextual Awareness [3.2925222641796554]
Pointer-guided segment ordering (SO) is a novel pre-training technique aimed at enhancing the contextual understanding of paragraph-level text representations.
Our experiments show that pointer-guided pre-training significantly enhances the model's ability to understand complex document structures.
arXiv Detail & Related papers (2024-06-06T15:17:51Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- Pushing the Performance Limit of Scene Text Recognizer without Human Annotation [17.092815629040388]
We aim to boost STR models by leveraging both synthetic data and numerous unlabeled real images.
A character-level consistency regularization is designed to mitigate the misalignment between characters in sequence recognition.
arXiv Detail & Related papers (2022-04-16T04:42:02Z)
- Real-Time Scene Text Detection with Differentiable Binarization and Adaptive Scale Fusion [62.269219152425556]
Segmentation-based methods have drawn extensive attention in the scene text detection field.
We propose a Differentiable Binarization (DB) module that integrates the binarization process into a segmentation network.
An efficient Adaptive Scale Fusion (ASF) module is proposed to improve the scale robustness by fusing features of different scales adaptively.
arXiv Detail & Related papers (2022-02-21T15:30:14Z)
- Text Recognition in Real Scenarios with a Few Labeled Samples [55.07859517380136]
Scene text recognition (STR) is still a hot research topic in the computer vision field.
This paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation.
Our approach can maximize the character-level confusion between the source domain and the target domain.
arXiv Detail & Related papers (2020-06-22T13:03:01Z)
- Towards Accurate Scene Text Recognition with Semantic Reasoning Networks [52.86058031919856]
We propose a novel end-to-end trainable framework named semantic reasoning network (SRN) for accurate scene text recognition.
A global semantic reasoning module (GSRM) is introduced to capture global semantic context through multi-way parallel transmission.
Results on 7 public benchmarks, including regular text, irregular text and non-Latin long text, verify the effectiveness and robustness of the proposed method.
arXiv Detail & Related papers (2020-03-27T09:19:25Z)
- AutoSTR: Efficient Backbone Search for Scene Text Recognition [80.7290173000068]
Scene text recognition (STR) is very challenging due to the diversity of text instances and the complexity of scenes.
We propose automated STR (AutoSTR) to search data-dependent backbones to boost text recognition performance.
Experiments demonstrate that, by searching data-dependent backbones, AutoSTR can outperform the state-of-the-art approaches on standard benchmarks.
arXiv Detail & Related papers (2020-03-14T06:51:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.