End-to-end Handwritten Paragraph Text Recognition Using a Vertical
Attention Network
- URL: http://arxiv.org/abs/2012.03868v1
- Date: Mon, 7 Dec 2020 17:31:20 GMT
- Title: End-to-end Handwritten Paragraph Text Recognition Using a Vertical
Attention Network
- Authors: Denis Coquenet, Clément Chatelain, Thierry Paquet
- Abstract summary: We propose a unified end-to-end model using hybrid attention to tackle this task.
We achieve state-of-the-art character error rate at line and paragraph levels on three popular datasets.
- Score: 2.277447144331876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unconstrained handwritten text recognition remains challenging for computer
vision systems. Paragraph text recognition is traditionally achieved by two
models: the first one for line segmentation and the second one for text line
recognition. We propose a unified end-to-end model using hybrid attention to
tackle this task. We achieve state-of-the-art character error rate at line and
paragraph levels on three popular datasets: 1.90% for RIMES, 4.32% for IAM and
3.63% for READ 2016. The proposed model can be trained from scratch, without
using any segmentation label contrary to the standard approach. Our code and
trained model weights are available at
https://github.com/FactoDeepLearning/VerticalAttentionOCR.
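The released code is PyTorch-based; the following is a minimal, hypothetical sketch of the vertical-attention idea only: attention weights computed along the vertical axis of the encoder's 2D feature map select one soft text line, which a line-level decoder can then transcribe. A single attention pass is shown, whereas the actual model iterates line by line with a recurrent state; the module name, the horizontal pooling, and all dimensions are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class VerticalAttentionSketch(nn.Module):
    def __init__(self, channels: int, hidden: int = 256):
        super().__init__()
        # scores one value per feature-map row
        self.score = nn.Sequential(nn.Linear(channels, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (B, C, H, W) encoder output for a whole paragraph image
        rows = features.mean(dim=3).transpose(1, 2)       # (B, H, C) after horizontal pooling
        weights = torch.softmax(self.score(rows), dim=1)  # (B, H, 1) vertical attention weights
        line = (features * weights.transpose(1, 2).unsqueeze(3)).sum(dim=2)  # (B, C, W)
        return line  # soft "text line" features, to be fed to a line-level decoder with CTC

# line_feats = VerticalAttentionSketch(channels=256)(encoder_output)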
Related papers
- Retrieval Enhanced Zero-Shot Video Captioning [69.96136689829778]
We bridge video and text using three key models: a general video understanding model XCLIP, a general image understanding model CLIP, and a text generation model GPT-2.
To address this problem, we propose using learnable tokens as a communication medium between frozen GPT-2 and frozen XCLIP.
Experiments show 4% to 20% improvements in terms of the main metric CIDEr compared to the existing state-of-the-art methods.
arXiv Detail & Related papers (2024-05-11T16:22:00Z)
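A rough sketch of the learnable-token idea in the entry above, under the assumption that the tokens are conditioned on pooled video features and prepended to the frozen language model's input embeddings; the class and variable names are invented for illustration, and both backbones stay frozen.

import torch
import torch.nn as nn

class PromptBridgeSketch(nn.Module):
    def __init__(self, video_dim: int, gpt2_dim: int, num_tokens: int = 8):
        super().__init__()
        # the only trainable parameters: a small set of prompt tokens and a projection
        self.tokens = nn.Parameter(torch.randn(num_tokens, gpt2_dim) * 0.02)
        self.project = nn.Linear(video_dim, gpt2_dim)

    def forward(self, video_feat: torch.Tensor) -> torch.Tensor:
        # video_feat: (B, video_dim) pooled feature from a frozen video encoder (e.g. XCLIP)
        cond = self.project(video_feat).unsqueeze(1)   # (B, 1, gpt2_dim)
        prompts = self.tokens.unsqueeze(0) + cond      # (B, num_tokens, gpt2_dim)
        return prompts  # prepended to the frozen GPT-2 caption token embeddings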
- Towards End-to-end Handwritten Document Recognition [0.0]
Handwritten text recognition has been widely studied over the last decades for its numerous applications.
In this thesis, we propose to tackle these issues by performing handwritten text recognition on whole documents in an end-to-end way.
We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets.
arXiv Detail & Related papers (2022-09-30T10:31:22Z)
- Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition [101.60244147302197]
We introduce contrastive learning and masked image modeling to learn discrimination and generation of text images.
Our method outperforms previous self-supervised text recognition methods by 10.2%-20.2% on irregular scene text recognition datasets.
Our proposed text recognizer exceeds previous state-of-the-art text recognition methods by an average of 5.3% on 11 benchmarks, with similar model size.
arXiv Detail & Related papers (2022-07-01T03:50:26Z)
- DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition [1.7875811547963403]
We propose an end-to-end segmentation-free architecture for handwritten document recognition.
The model is trained to label text parts using begin and end tags in an XML-like fashion.
We achieve competitive results on the READ dataset at page level as well as at double-page level, with CERs of 3.53% and 3.69%, respectively.
arXiv Detail & Related papers (2022-03-23T08:40:42Z)
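To illustrate the tag-based targets mentioned in the DAN entry above, here is a hypothetical example of a transcription target in which layout is encoded with begin/end tags; the tag names below are placeholders, not the tag vocabulary actually used by the authors.

# Illustrative target sequence for a two-paragraph page (tag names are assumptions):
target = (
    "<page>"
    "<paragraph>first paragraph transcription ...</paragraph>"
    "<paragraph>second paragraph transcription ...</paragraph>"
    "</page>"
)
# The network is trained to emit this character-plus-tag sequence directly from the
# document image, so no physical segmentation labels are required at training time.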
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models [47.48019831416665]
We propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR.
TrOCR is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets.
Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.
arXiv Detail & Related papers (2021-09-21T16:01:56Z)
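As a usage illustration for the TrOCR entry above, the following sketch runs a publicly released checkpoint through the Hugging Face transformers library on a single cropped text-line image; it assumes that library is installed and is not code from the paper itself.

from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

image = Image.open("line.png").convert("RGB")          # a cropped text-line image (placeholder path)
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)           # image Transformer encodes, text Transformer decodes
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])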
- Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z)
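A minimal sketch of the CTC decoder family compared in the entry above: the encoder emits one character distribution per horizontal position, and the CTC loss aligns it to the target string without any per-character position labels. All shapes below are made up for illustration.

import torch
import torch.nn as nn

T, B, num_classes = 120, 4, 80   # time steps (input width), batch size, charset size incl. blank
log_probs = torch.randn(T, B, num_classes, requires_grad=True).log_softmax(2)  # per-column predictions
targets = torch.randint(1, num_classes, (B, 25))          # target character indices (0 = blank)
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.full((B,), 25, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
loss.backward()  # no alignment supervision needed; inputs of arbitrary length are handled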
- SPAN: a Simple Predict & Align Network for Handwritten Paragraph Recognition [2.277447144331876]
We propose an end-to-end recurrence-free Fully Convolutional Network performing OCR at paragraph level without any prior segmentation stage.
The framework is as simple as the one used for the recognition of isolated lines and we achieve competitive results on three popular datasets.
arXiv Detail & Related papers (2021-02-17T13:12:45Z)
- TAP: Text-Aware Pre-training for Text-VQA and Text-Caption [75.44716665758415]
We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks.
TAP explicitly incorporates scene text (generated from OCR engines) in pre-training.
Our approach outperforms the state of the art by large margins on multiple tasks.
arXiv Detail & Related papers (2020-12-08T18:55:21Z)
- OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page Text Recognition by learning to unfold [6.09170287691728]
We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition.
We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single line text recognizer.
We achieve state-of-the-art character error rate on both IAM & ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature.
arXiv Detail & Related papers (2020-06-12T22:18:02Z)
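A toy sketch of the "unfolding" intuition from the OrigamiNet entry above: a multi-line 2D feature map is resized into one very wide single-line sequence that a standard CTC head can transcribe. The real module interleaves such resizing steps inside the CNN; the single interpolation and all dimensions below are assumptions for illustration only.

import torch
import torch.nn.functional as F

feats = torch.randn(1, 128, 32, 64)                 # (B, C, H, W) paragraph-level features
unfolded = F.interpolate(feats, size=(1, 32 * 64),  # squash height to 1, stretch width
                         mode="bilinear", align_corners=False)
seq = unfolded.squeeze(2).permute(2, 0, 1)          # (T, B, C) sequence for a CTC classifier
print(seq.shape)                                    # torch.Size([2048, 1, 128])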
- SCATTER: Selective Context Attentional Scene Text Recognizer [16.311256552979835]
Scene Text Recognition (STR) is the task of recognizing text against complex image backgrounds.
Current state-of-the-art (SOTA) methods still struggle to recognize text written in arbitrary shapes.
We introduce a novel architecture for STR, named Selective Context ATtentional Text Recognizer (SCATTER).
arXiv Detail & Related papers (2020-03-25T09:20:28Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
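To make the TextScanner entry above concrete, here is a rough sketch of prediction heads producing pixel-wise maps for character class, reading order, and localization from a shared feature map; the head names, number of maps, and activations are assumptions rather than the paper's exact design.

import torch
import torch.nn as nn

class TextScannerHeadsSketch(nn.Module):
    def __init__(self, channels: int, num_classes: int, max_chars: int = 32):
        super().__init__()
        self.class_head = nn.Conv2d(channels, num_classes, kernel_size=1)  # which character
        self.order_head = nn.Conv2d(channels, max_chars, kernel_size=1)    # which reading position
        self.loc_head = nn.Conv2d(channels, 1, kernel_size=1)              # character localization

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) shared backbone features; all heads predict in parallel
        return (self.class_head(feats).softmax(1),
                self.order_head(feats).softmax(1),
                self.loc_head(feats).sigmoid())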
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.