OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page
Text Recognition by learning to unfold
- URL: http://arxiv.org/abs/2006.07491v1
- Date: Fri, 12 Jun 2020 22:18:02 GMT
- Authors: Mohamed Yousef, Tom E. Bishop
- Abstract summary: We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition.
We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single line text recognizer.
We achieve state-of-the-art character error rate on both IAM & ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text recognition is a major computer vision task with a big set of associated
challenges. One of those traditional challenges is the coupled nature of text
recognition and segmentation. This problem has been progressively solved over
the past decades, going from segmentation based recognition to segmentation
free approaches, which proved more accurate and much cheaper to annotate data
for. We take a step from segmentation-free single line recognition towards
segmentation-free multi-line / full page recognition. We propose a novel and
simple neural network module, termed \textbf{OrigamiNet}, that can augment any
CTC-trained, fully convolutional single line text recognizer, to convert it
into a multi-line version by providing the model with enough spatial capacity
to be able to properly collapse a 2D input signal into 1D without losing
information. Such modified networks can be trained using exactly the same
simple procedure as the original, and using only \textbf{unsegmented} image and
text pairs. We carry out a set of interpretability experiments that show that our
trained models learn an accurate implicit line segmentation. We achieve
state-of-the-art character error rate on both IAM \& ICDAR 2017 HTR benchmarks
for handwriting recognition, surpassing all other methods in the literature. On
IAM we even surpass single line methods that use accurate localization
information during training. Our code is available online at
\url{https://github.com/IntuitionMachines/OrigamiNet}.
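The core idea, giving the network enough spatial capacity to collapse a 2D page into a 1D sequence that a standard CTC recognizer can consume, can be illustrated with a minimal, hypothetical sketch (plain NumPy, not the authors' implementation; the actual module interleaves learned up-scaling layers before this collapse):

```python
import numpy as np

def unfold_2d_to_1d(feat):
    """Collapse a 2D feature map of shape (H, W, C) into a 1D sequence of
    shape (H*W, C) by concatenating rows left-to-right, top-to-bottom.
    This is the 'unfolding' step in sketch form: a multi-line page becomes
    one long virtual line that a single-line CTC recognizer can read,
    provided no information is lost in the collapse."""
    h, w, c = feat.shape
    # Row-major reshape: row 0 (top text line) comes first in the sequence.
    return feat.reshape(h * w, c)

# Toy 'page' feature map: 2 text rows, 4 horizontal steps, 3 channels.
page = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
seq = unfold_2d_to_1d(page)
assert seq.shape == (8, 3)
# The first text row precedes the second in the unfolded sequence.
assert np.array_equal(seq[:4], page[0])
assert np.array_equal(seq[4:], page[1])
```

Because the collapse is a lossless reshape, the CTC loss downstream sees an ordinary 1D sequence, which is why the modified network can keep its original single-line training procedure.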
Related papers
- General Detection-based Text Line Recognition [15.761142324480165]
We introduce a general detection-based approach to text line recognition, be it printed (OCR) or handwritten (HTR).
Our approach builds on a completely different paradigm than state-of-the-art HTR methods, which rely on autoregressive decoding.
We improve state-of-the-art performances for Chinese script recognition on the CASIA v2 dataset, and for cipher recognition on the Borg and Copiale datasets.
arXiv Detail & Related papers (2024-09-25T17:05:55Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting [129.73247700864385]
DeepSolo is a simple detection transformer baseline that lets a single Decoder with Explicit Points Solo for text detection and recognition simultaneously.
We introduce a text-matching criterion to deliver more accurate supervisory signals, thus enabling more efficient training.
arXiv Detail & Related papers (2022-11-19T19:06:22Z)
- DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition [1.7875811547963403]
We propose an end-to-end segmentation-free architecture for handwritten document recognition.
The model is trained to label text parts using begin and end tags in an XML-like fashion.
We achieve competitive results on the READ dataset at page level, as well as double-page level with a CER of 3.53% and 3.69%, respectively.
arXiv Detail & Related papers (2022-03-23T08:40:42Z)
- Unsupervised learning of text line segmentation by differentiating coarse patterns [0.0]
We present an unsupervised deep learning method that embeds document image patches to a compact Euclidean space where distances correspond to a coarse text line pattern similarity.
Text line segmentation can be easily implemented using standard techniques with the embedded feature vectors.
We evaluate the method qualitatively and quantitatively on several variants of text line segmentation datasets to demonstrate its effectiveness.
arXiv Detail & Related papers (2021-05-19T21:21:30Z)
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
- SOLD2: Self-supervised Occlusion-aware Line Description and Detection [95.8719432775724]
We introduce the first joint detection and description of line segments in a single deep network.
Our method does not require any annotated line labels and can therefore generalize to any dataset.
We evaluate our approach against previous line detection and description methods on several multi-view datasets.
arXiv Detail & Related papers (2021-04-07T19:27:17Z)
- One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation [78.36781565047656]
We propose "One Thing One Click," meaning that the annotator only needs to label one point per object.
We iteratively conduct the training and label propagation, facilitated by a graph propagation module.
Our results are also comparable to those of the fully supervised counterparts.
arXiv Detail & Related papers (2021-04-06T02:27:25Z)
- SPAN: a Simple Predict & Align Network for Handwritten Paragraph Recognition [2.277447144331876]
We propose an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage.
The framework is as simple as the one used for the recognition of isolated lines and we achieve competitive results on three popular datasets.
arXiv Detail & Related papers (2021-02-17T13:12:45Z)
- End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network [2.277447144331876]
We propose a unified end-to-end model using hybrid attention to tackle this task.
We achieve state-of-the-art character error rate at line and paragraph levels on three popular datasets.
arXiv Detail & Related papers (2020-12-07T17:31:20Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.