Text Recognition -- Real World Data and Where to Find Them
- URL: http://arxiv.org/abs/2007.03098v2
- Date: Fri, 17 Jul 2020 15:07:40 GMT
- Title: Text Recognition -- Real World Data and Where to Find Them
- Authors: Klára Janoušková, Jiri Matas, Lluis Gomez, Dimosthenis Karatzas
- Abstract summary: We present a method for exploiting weakly annotated images to improve text extraction pipelines.
The approach uses an arbitrary end-to-end text recognition system to obtain text region proposals and their, possibly erroneous, transcriptions.
It produces nearly error-free, localised instances of scene text, which we treat as "pseudo ground truth" (PGT).
- Score: 36.10220484561196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method for exploiting weakly annotated images to improve text
extraction pipelines. The approach uses an arbitrary end-to-end text
recognition system to obtain text region proposals and their, possibly
erroneous, transcriptions. The proposed method includes matching of imprecise
transcriptions to weak annotations and an edit-distance-guided neighbourhood
search. It produces nearly error-free, localised instances of scene text, which
we treat as "pseudo ground truth" (PGT).
We apply the method to two weakly annotated datasets. Training with the
extracted PGT consistently improves the accuracy of a state-of-the-art
recognition model, by 3.7% on average across different benchmark datasets
(image domains), and by 24.5% on one of the weakly annotated datasets.
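The core of the approach, matching a recognizer's possibly erroneous transcriptions against weak annotations under an edit-distance criterion, can be illustrated compactly. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the function names, the lower-casing, and the 0.2 normalized-distance threshold are all hypothetical choices.

```python
# Minimal sketch (not the paper's implementation): match possibly erroneous
# transcriptions from an end-to-end recognizer to weak (image-level) word
# annotations using Levenshtein edit distance.

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def match_to_weak_annotations(transcriptions, annotation_words, max_rel_dist=0.2):
    """For each proposal transcription, find the closest annotation word.

    Keep the pair as pseudo ground truth (PGT) only if the edit distance,
    normalized by the annotation length, is below the threshold.
    """
    pgt = []
    for t in transcriptions:
        best_word, best_dist = None, None
        for w in annotation_words:
            d = edit_distance(t.lower(), w.lower())
            if best_dist is None or d < best_dist:
                best_word, best_dist = w, d
        if best_word is not None and best_dist / max(len(best_word), 1) <= max_rel_dist:
            pgt.append((t, best_word))
    return pgt

# Example: erroneous OCR outputs matched against weak image-level labels.
# "c0ffee" is within the threshold of "coffee" and is kept; "shqp" is not.
print(match_to_weak_annotations(["c0ffee", "shqp"], ["coffee", "shop"]))
```

In this sketch, a proposal is kept as PGT only when its closest annotation word lies within the relative edit-distance threshold; the paper's edit-distance-guided neighbourhood search refines such matches further.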
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- LRANet: Towards Accurate and Efficient Scene Text Detection with Low-Rank Approximation Network [63.554061288184165]
We propose a novel parameterized text shape method based on low-rank approximation.
By exploring the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation.
We implement an accurate and efficient arbitrary-shaped text detector named LRANet; a minimal sketch of the low-rank idea appears after this list.
arXiv Detail & Related papers (2023-06-27T02:03:46Z)
- TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z)
- SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control.
In addition to a global text prompt that describes the entire scene, the user provides a segmentation map.
We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
- Towards Weakly-Supervised Text Spotting using a Multi-Task Transformer [21.479222207347238]
We introduce TextTranSpotter (TTS), a transformer-based approach for text spotting.
TTS is trained in both fully- and weakly-supervised settings.
When trained in a fully-supervised manner, TextTranSpotter achieves state-of-the-art results on multiple benchmarks.
arXiv Detail & Related papers (2022-02-11T08:50:09Z)
- Bidirectional Regression for Arbitrary-Shaped Text Detection [16.30976392505236]
This paper presents a novel text instance expression which integrates both foreground and background information into the pipeline.
A corresponding post-processing algorithm is also designed to sequentially combine the four prediction results and reconstruct the text instance accurately.
We evaluate our method on several challenging scene text benchmarks, including both curved and multi-oriented text datasets.
arXiv Detail & Related papers (2021-07-13T14:29:09Z)
- Scene Text Detection with Scribble Lines [59.698806258671105]
We propose to annotate texts by scribble lines instead of polygons for text detection.
It is a general labeling method for texts with various shapes and requires low labeling costs.
Experiments show that the proposed method bridges the performance gap between the weak labeling method and the original polygon-based labeling methods.
arXiv Detail & Related papers (2020-12-09T13:14:53Z)
- Weakly-Supervised Arbitrary-Shaped Text Detection with Expectation-Maximization Algorithm [35.0126313032923]
We study weakly-supervised arbitrary-shaped text detection for combining various weak supervision forms.
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector.
Our method yields comparable performance to state-of-the-art methods on three benchmarks.
arXiv Detail & Related papers (2020-12-01T11:45:39Z)
- ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection [147.10751375922035]
We propose ContourNet, which effectively handles false positives and large scale variance of scene texts.
Our method suppresses these false positives by only outputting predictions with high response values in both directions.
arXiv Detail & Related papers (2020-04-10T08:15:23Z)
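As referenced in the LRANet entry above, the low-rank idea behind its text shape representation can be sketched in a few lines. The following is a minimal, hypothetical numpy illustration, not LRANet's code: the contour sampling, the synthetic data, and the rank of 4 are assumptions, and the point is only that a set of correlated contours can be encoded with a handful of coefficients over a shared shape basis.

```python
# Minimal sketch (not LRANet's implementation): low-rank approximation of
# text contour shapes via truncated SVD. Assumes each contour is sampled
# at a fixed number of points and flattened into one row of a data matrix.
import numpy as np

rng = np.random.default_rng(0)

n_contours, n_points, rank = 200, 16, 4

# Synthetic stand-in data: perturbed ellipses acting as "text contours".
t = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
base = np.stack([np.cos(t), 0.4 * np.sin(t)], axis=1)             # (16, 2)
contours = base[None] + 0.05 * rng.standard_normal((n_contours, n_points, 2))
X = contours.reshape(n_contours, -1)                              # (200, 32)

# Truncated SVD: keep only `rank` shape components as the shared basis.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
basis = Vt[:rank]                                                 # (4, 32)

# Each contour is now described by `rank` coefficients instead of 32 numbers.
coeffs = (X - mean) @ basis.T                                     # (200, 4)
X_hat = mean + coeffs @ basis

err = np.abs(X - X_hat).mean()
print(f"mean reconstruction error with rank {rank}: {err:.4f}")
```

Reconstructing contours from a few coefficients over a shared basis is what makes such a shape representation compact and robust, as the LRANet entry above claims.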
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.