What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
- URL: http://arxiv.org/abs/2103.04400v1
- Date: Sun, 7 Mar 2021 17:05:54 GMT
- Title: What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
- Authors: Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa
- Abstract summary: The scene text recognition (STR) task has a common practice: all state-of-the-art STR models are trained on large synthetic data.
It has been widely believed that training STR models on real data is nearly impossible because real data is insufficient.
We show that STR models can be trained satisfactorily with only real labeled data.
- Score: 53.51264148594141
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The scene text recognition (STR) task has a common practice: all
state-of-the-art STR models are trained on large synthetic data. In contrast
to this practice, training STR models only on fewer real labels (STR with
fewer labels) is important when we must train STR models without synthetic
data: for handwritten or artistic texts that are difficult to generate
synthetically, and for languages other than English for which synthetic data
is not always available. However, it has been implicit common knowledge that
training STR models on real data is nearly impossible because real data is
insufficient. We argue that this belief has obstructed the study of STR with
fewer labels. In this work, we aim to reactivate STR with fewer labels by
disproving this belief. We consolidate recently accumulated public real data
and show that STR models can be trained satisfactorily with only real labeled
data. Subsequently, we identify simple data augmentations that fully exploit
the real data. Furthermore, we improve the models by collecting unlabeled
data and introducing semi- and self-supervised methods. As a result, we
obtain a model competitive with state-of-the-art methods. To the best of our
knowledge, this is the first study that 1) shows sufficient performance using
only real labels and 2) introduces semi- and self-supervised methods into STR
with fewer labels. Our code and data are available at:
https://github.com/ku21fan/STR-Fewer-Labels
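The semi-supervised part of the abstract is easiest to picture as pseudo-labeling: train on the labeled real data, let the model label unlabeled images, and also train on its confident predictions. Below is a minimal sketch of one such training step, assuming a PyTorch recognizer that outputs per-character logits of shape (batch, length, classes); the model, the batches, and the 0.9 confidence threshold are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of one semi-supervised training step with pseudo-labeling
# for an STR model (illustrative; not the authors' exact implementation).
import torch
import torch.nn.functional as F

def semi_supervised_step(model, optimizer, labeled_batch, unlabeled_images,
                         threshold=0.9):
    images, targets = labeled_batch            # targets: (B, T) character ids
    logits = model(images)                     # (B, T, num_classes)
    sup_loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

    # Pseudo-label unlabeled images with the current model's predictions.
    with torch.no_grad():
        probs = model(unlabeled_images).softmax(dim=-1)
        conf, pseudo = probs.max(dim=-1)       # per-character confidence
        keep = conf.min(dim=1).values > threshold   # confident words only

    unsup_loss = torch.tensor(0.0, device=images.device)
    if keep.any():
        u_logits = model(unlabeled_images[keep])
        unsup_loss = F.cross_entropy(u_logits.flatten(0, 1),
                                     pseudo[keep].flatten())

    loss = sup_loss + unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```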
Related papers
- Sequential Visual and Semantic Consistency for Semi-supervised Text Recognition [56.968108142307976]
Scene text recognition (STR) is a challenging task that requires large-scale annotated data for training.
Most existing STR methods resort to synthetic data, which may introduce domain discrepancy and degrade the performance of STR models.
This paper proposes a novel semi-supervised learning method for STR that incorporates word-level consistency regularization from both visual and semantic aspects (a minimal sketch of the consistency idea follows this entry).
arXiv Detail & Related papers (2024-02-24T13:00:54Z)
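Consistency regularization, in its generic form, asks the model to make similar predictions for two differently augmented views of the same unlabeled image. A minimal sketch of that generic idea, assuming weak_aug and strong_aug callables; the paper's word-level visual/semantic formulation is more elaborate:

```python
# Generic consistency regularization on unlabeled images (illustrative only;
# the paper's word-level visual/semantic variant is more elaborate).
import torch
import torch.nn.functional as F

def consistency_loss(model, weak_aug, strong_aug, unlabeled_images):
    with torch.no_grad():                      # soft targets from the weak view
        teacher_probs = model(weak_aug(unlabeled_images)).softmax(dim=-1)
    student_log_probs = F.log_softmax(model(strong_aug(unlabeled_images)),
                                      dim=-1)
    # Penalize divergence between predictions on the two views.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```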
- Improving Text Embeddings with Large Language Models [59.930513259982725]
We introduce a novel and simple method for obtaining high-quality text embeddings using only synthetic data and less than 1k training steps.
We leverage proprietary LLMs to generate diverse synthetic data for hundreds of thousands of text embedding tasks across 93 languages.
Experiments demonstrate that our method achieves strong performance on highly competitive text embedding benchmarks without using any labeled data.
arXiv Detail & Related papers (2023-12-31T02:13:18Z)
- FlatMatch: Bridging Labeled Data and Unlabeled Data with Cross-Sharpness for Semi-Supervised Learning [73.13448439554497]
Semi-Supervised Learning (SSL) has been an effective way to leverage abundant unlabeled data with extremely scarce labeled data.
Most SSL methods are based on instance-wise consistency between different data transformations.
We propose FlatMatch, which minimizes a cross-sharpness measure to ensure consistent learning performance between the two datasets (a generic sharpness sketch follows this entry).
arXiv Detail & Related papers (2023-10-25T06:57:59Z)
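FlatMatch relates the flatness (sharpness) of the loss landscape to consistent behavior across the labeled and unlabeled sets. The sketch below shows only the generic SAM-style sharpness ingredient, under assumed names (loss_fn, rho); it is not FlatMatch's cross-sharpness measure itself.

```python
# Generic SAM-style sharpness probe (illustrative; FlatMatch's cross-sharpness
# couples the labeled and unlabeled losses rather than using one batch).
import torch

def sharpness_gap(model, loss_fn, batch, rho=0.05):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model, batch)
    grads = torch.autograd.grad(loss, params)

    # Step to the (approximate) worst-case weights in an L2 ball of radius rho.
    scale = rho / (torch.norm(torch.stack([g.norm() for g in grads])) + 1e-12)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p.add_(g * scale)
    perturbed_loss = loss_fn(model, batch)
    with torch.no_grad():                      # restore the original weights
        for p, g in zip(params, grads):
            p.sub_(g * scale)
    # Large gap => sharp minimum; flatness-seeking methods drive it down.
    return (perturbed_loss - loss).item()
```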
- An Efficient Active Learning Pipeline for Legal Text Classification [2.462514989381979]
We propose a pipeline for effectively using active learning with pre-trained language models in the legal domain.
We use knowledge distillation to guide the model's embeddings toward a semantically meaningful space (a minimal distillation sketch follows this entry).
Our experiments on Contract-NLI (adapted to the classification task) and the LEDGAR benchmark show that our approach outperforms standard active learning strategies.
arXiv Detail & Related papers (2022-11-15T13:07:02Z)
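As a rough picture of the distillation step, one can pull a student encoder's embeddings toward a frozen teacher's embedding space. A hypothetical sketch; the encoders and batch format are assumptions, not the paper's exact pipeline:

```python
# Hypothetical embedding distillation: pull a student encoder's embeddings
# toward a frozen teacher's embedding space (assumed setup).
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, batch):
    with torch.no_grad():
        target = teacher(batch)                # (B, D) teacher embeddings
    pred = student(batch)                      # (B, D) student embeddings
    # Cosine loss aligns directions regardless of embedding scale.
    return 1.0 - F.cosine_similarity(pred, target, dim=-1).mean()
```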
- Learning Instructions with Unlabeled Data for Zero-Shot Cross-Task Generalization [68.91386402390403]
We propose Unlabeled Data Augmented Instruction Tuning (UDIT) to take better advantage of the instructions during instruction learning.
We conduct extensive experiments to show UDIT's effectiveness in various scenarios of tasks and datasets.
arXiv Detail & Related papers (2022-10-17T15:25:24Z)
- Learned Label Aggregation for Weak Supervision [8.819582879892762]
We propose a data programming approach that aggregates weak supervision signals to generate labeled data easily.
The quality of the generated labels depends on a label aggregation model that combines the noisy labels from all labeling functions (LFs) to infer the ground-truth labels (a baseline aggregation sketch follows this entry).
We show that the model can be trained using synthetically generated data, and we design an effective architecture for it.
arXiv Detail & Related papers (2022-07-27T14:36:35Z)
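For contrast with the learned aggregation model, the simplest baseline aggregator is a majority vote over labeling-function outputs. A small self-contained sketch, with an assumed data layout (one row per example, one column per LF, -1 meaning the LF abstains):

```python
# Baseline aggregation by majority vote over labeling-function (LF) outputs;
# the paper replaces this with a learned aggregation model.
import numpy as np

def majority_vote(lf_votes: np.ndarray) -> np.ndarray:
    labels = np.empty(lf_votes.shape[0], dtype=int)
    for i, votes in enumerate(lf_votes):
        votes = votes[votes >= 0]              # drop abstentions
        labels[i] = np.bincount(votes).argmax() if votes.size else -1
    return labels

print(majority_vote(np.array([[0, 0, 1], [-1, 1, 1], [-1, -1, -1]])))
# -> [ 0  1 -1]
```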
- Pushing the Performance Limit of Scene Text Recognizer without Human Annotation [17.092815629040388]
We aim to boost STR models by leveraging both synthetic data and the numerous real unlabeled images.
A character-level consistency regularization is designed to mitigate the misalignment between characters in sequence recognition.
arXiv Detail & Related papers (2022-04-16T04:42:02Z)
- Data Augmentation for Scene Text Recognition [19.286766429954174]
Scene text recognition (STR) is a challenging task in computer vision due to the large variety of possible text appearances in natural scenes.
Most STR models rely on synthetic datasets for training since sufficiently large, publicly available labeled real datasets do not exist.
In this paper, we introduce STRAug, which comprises 36 image augmentation functions designed for STR (a toy augmentation sketch follows this entry).
arXiv Detail & Related papers (2021-08-16T07:53:30Z)
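As a flavor of what such augmentation functions do, here is a toy geometric/photometric pipeline using PIL; it is illustrative only and does not reproduce STRAug's actual API or its 36 functions:

```python
# Toy geometric/photometric augmentations for word images using PIL
# (illustrative only; see the STRAug repository for the real functions).
import random
from PIL import Image, ImageEnhance, ImageFilter

def augment_word_image(img: Image.Image) -> Image.Image:
    if random.random() < 0.5:                  # small random rotation
        img = img.rotate(random.uniform(-5, 5), expand=True)
    if random.random() < 0.5:                  # mild defocus blur
        img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0, 1)))
    if random.random() < 0.5:                  # contrast jitter
        img = ImageEnhance.Contrast(img).enhance(random.uniform(0.7, 1.3))
    return img
```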
- Self-Tuning for Data-Efficient Deep Learning [75.34320911480008]
Self-Tuning is a novel approach to enable data-efficient deep learning.
It unifies the exploration of labeled and unlabeled data and the transfer of a pre-trained model.
It outperforms its SSL and TL counterparts on five tasks by sharp margins.
arXiv Detail & Related papers (2021-02-25T14:56:19Z)
- Pseudo-Representation Labeling Semi-Supervised Learning [0.0]
In recent years, semi-supervised learning has shown tremendous success in leveraging unlabeled data to improve the performance of deep learning models.
This work proposes pseudo-representation labeling, a simple and flexible framework that uses pseudo-labeling techniques to iteratively label a small amount of unlabeled data and add it to the training data (a minimal selection sketch follows this entry).
Compared with existing approaches, pseudo-representation labeling is more intuitive and can effectively solve practical problems in the real world.
arXiv Detail & Related papers (2020-05-31T03:55:41Z)
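The iterative-labeling step can be pictured as harvesting only high-confidence predictions and appending them to the labeled pool. A minimal sketch under assumed names (model, unlabeled_loader, the 0.95 threshold); this is only the generic pseudo-labeling ingredient, not the paper's full framework:

```python
# Generic pseudo-label harvesting: keep only high-confidence predictions and
# return them for use as extra training data (assumed classifier outputting
# (batch, classes) logits).
import torch

@torch.no_grad()
def harvest_pseudo_labels(model, unlabeled_loader, threshold=0.95):
    kept_images, kept_labels = [], []
    for images in unlabeled_loader:
        conf, preds = model(images).softmax(dim=-1).max(dim=-1)
        mask = conf > threshold
        kept_images.append(images[mask])
        kept_labels.append(preds[mask])
    return torch.cat(kept_images), torch.cat(kept_labels)
```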