SPAN: a Simple Predict & Align Network for Handwritten Paragraph
Recognition
- URL: http://arxiv.org/abs/2102.08742v1
- Date: Wed, 17 Feb 2021 13:12:45 GMT
- Title: SPAN: a Simple Predict & Align Network for Handwritten Paragraph
Recognition
- Authors: Denis Coquenet, Clément Chatelain, Thierry Paquet
- Abstract summary: We propose an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage.
The framework is as simple as the one used for the recognition of isolated lines and we achieve competitive results on three popular datasets.
- Score: 2.277447144331876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unconstrained handwriting recognition is an essential task in document
analysis. It is usually carried out in two steps. First, the document is
segmented into text lines. Second, an Optical Character Recognition model is
applied on these line images. We propose the Simple Predict & Align Network: an
end-to-end recurrence-free Fully Convolutional Network performing OCR at
paragraph level without any prior segmentation stage. The framework is as
simple as the one used for the recognition of isolated lines and we achieve
competitive results on three popular datasets: RIMES, IAM and READ 2016. The
proposed model does not require any dataset adaptation; it can be trained from
scratch, without segmentation labels, and it does not require line breaks in
the transcription labels. Our code and trained model weights are available at
https://github.com/FactoDeepLearning/SPAN.
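To make the "predict & align" idea concrete, here is a minimal sketch, assuming a PyTorch-style implementation: a recurrence-free fully convolutional encoder produces a 2D character-probability map over the paragraph image, the rows of that map are concatenated into one 1D sequence, and CTC aligns that sequence with the line-break-free transcription. The layer sizes, module names, and the row-flattening detail below are illustrative assumptions, not the released architecture; see the repository above for the actual model.

```python
# Minimal, illustrative sketch of paragraph-level "predict & align" recognition.
# All layer sizes and names are assumptions for illustration only.
import torch
import torch.nn as nn

class ParagraphPredictAlign(nn.Module):
    def __init__(self, num_chars: int):
        super().__init__()
        # Recurrence-free fully convolutional encoder (illustrative depth/widths).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 1x1 convolution predicts per-position character scores (+ CTC blank).
        self.head = nn.Conv2d(128, num_chars + 1, kernel_size=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 1, H, W) grayscale paragraph images
        feats = self.encoder(images)              # (batch, 128, H', W')
        logits = self.head(feats)                 # (batch, C+1, H', W')
        b, c, h, w = logits.shape
        # Concatenate the rows of the 2D prediction map into one long 1D sequence
        # so standard CTC can align it with the paragraph transcription.
        seq = logits.permute(0, 2, 3, 1).reshape(b, h * w, c)
        return seq.log_softmax(dim=-1)            # (batch, T = H'*W', C+1)

# Training would then use nn.CTCLoss on the flattened sequence, e.g.:
# ctc = nn.CTCLoss(blank=num_chars)
# loss = ctc(log_probs.permute(1, 0, 2), targets, input_lengths, target_lengths)
```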
Related papers
- LESS: Label-Efficient and Single-Stage Referring 3D Segmentation [55.06002976797879]
Referring 3D segmentation is a vision-language task that segments all points of an object specified by a query sentence from a 3D point cloud.
We propose a novel Referring 3D pipeline, Label-Efficient and Single-Stage, dubbed LESS, which is supervised only by efficient binary masks.
We achieve state-of-the-art performance on the ScanRefer dataset, surpassing previous methods by about 3.7% mIoU using only binary labels.
arXiv Detail & Related papers (2024-10-17T07:47:41Z)
- Handwritten and Printed Text Segmentation: A Signature Case Study [0.0]
We develop novel approaches to address the challenges of handwritten and printed text segmentation.
Our objective is to recover text from different classes in their entirety, especially enhancing the segmentation performance on overlapping sections.
Our best configuration outperforms prior work on two different datasets by 17.9% and 7.3% in IoU score.
arXiv Detail & Related papers (2023-07-15T21:49:22Z)
- DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition [1.7875811547963403]
We propose an end-to-end segmentation-free architecture for handwritten document recognition.
The model is trained to label text parts using begin and end tags in an XML-like fashion (a minimal sketch of such a tagging scheme appears after this list).
We achieve competitive results on the READ dataset at both page level and double-page level, with CERs of 3.53% and 3.69%, respectively.
arXiv Detail & Related papers (2022-03-23T08:40:42Z)
- Document Domain Randomization for Deep Learning Document Layout Extraction [37.97092983885967]
We present document domain randomization (DDR), the first successful transfer of convolutional neural networks (CNNs) trained only on graphically rendered pseudo-paper pages to document layout extraction on real pages.
DDR renders pseudo-document pages by modeling randomized textual and non-textual contents of interest.
We show that high-fidelity semantic information is not necessary to label semantic classes, but that a style mismatch between training and test data can lower model accuracy.
arXiv Detail & Related papers (2021-05-20T19:16:04Z)
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a character-level distribution matching method that performs fine-grained adaptation between corresponding characters in the two domains.
Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reductions on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
- SOLD2: Self-supervised Occlusion-aware Line Description and Detection [95.8719432775724]
We introduce the first joint detection and description of line segments in a single deep network.
Our method does not require any annotated line labels and can therefore generalize to any dataset.
We evaluate our approach against previous line detection and description methods on several multi-view datasets.
arXiv Detail & Related papers (2021-04-07T19:27:17Z)
- One Thing One Click: A Self-Training Approach for Weakly Supervised 3D Semantic Segmentation [78.36781565047656]
We propose "One Thing One Click," meaning that the annotator only needs to label one point per object.
We iteratively conduct training and label propagation, facilitated by a graph propagation module.
Our results are also comparable to those of the fully supervised counterparts.
arXiv Detail & Related papers (2021-04-06T02:27:25Z)
- End-to-end Handwritten Paragraph Text Recognition Using a Vertical Attention Network [2.277447144331876]
We propose a unified end-to-end model using hybrid attention to tackle this task.
We achieve state-of-the-art character error rate at line and paragraph levels on three popular datasets.
arXiv Detail & Related papers (2020-12-07T17:31:20Z)
- Predicting What You Already Know Helps: Provable Self-Supervised Learning [60.27658820909876]
Self-supervised representation learning solves auxiliary prediction tasks (known as pretext tasks) without requiring labeled data.
We show a mechanism exploiting the statistical connections between certain reconstruction-based pretext tasks that guarantees learning a good representation.
We prove that the linear layer yields a small approximation error even for complex ground-truth function classes.
arXiv Detail & Related papers (2020-08-03T17:56:13Z)
- OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold [6.09170287691728]
We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition.
We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single-line text recognizer.
We achieve state-of-the-art character error rate on both IAM & ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature.
arXiv Detail & Related papers (2020-06-12T22:18:02Z)
- Pre-training for Abstractive Document Summarization by Reinstating Source Text [105.77348528847337]
This paper presents three pre-training objectives which allow us to pre-train a Seq2Seq based abstractive summarization model on unlabeled text.
Experiments on two benchmark summarization datasets show that all three objectives improve performance over baselines.
arXiv Detail & Related papers (2020-04-04T05:06:26Z)
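The DAN entry above mentions training targets that mark text parts with begin and end tags in an XML-like fashion. The following sketch shows what such a tagged target sequence could look like; the tag names and serialization are hypothetical placeholders chosen for illustration, not necessarily the tokens used by the actual model.

```python
# Illustrative only: build a flat target string with XML-like begin/end tags
# around each text part of a page, as described (at a high level) in the DAN
# entry above. Tag names (<page>, <paragraph>, <line>) are hypothetical.
from typing import List

def tag_page(paragraphs: List[List[str]]) -> str:
    """paragraphs: list of paragraphs, each given as a list of line transcriptions."""
    parts = ["<page>"]
    for lines in paragraphs:
        parts.append("<paragraph>")
        for line in lines:
            parts.append(f"<line>{line}</line>")
        parts.append("</paragraph>")
    parts.append("</page>")
    return "".join(parts)

if __name__ == "__main__":
    page = [["the quick brown fox", "jumps over the lazy dog"],
            ["a second paragraph on the same page"]]
    print(tag_page(page))
    # <page><paragraph><line>the quick brown fox</line>...</paragraph>...</page>
```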
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.