Towards End-to-end Handwritten Document Recognition
- URL: http://arxiv.org/abs/2209.15362v1
- Date: Fri, 30 Sep 2022 10:31:22 GMT
- Title: Towards End-to-end Handwritten Document Recognition
- Authors: Denis Coquenet
- Abstract summary: Handwritten text recognition has been widely studied in the last decades for its numerous applications.
In this thesis, we propose to tackle these issues by performing the handwritten text recognition of whole documents in an end-to-end way.
We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten text recognition has been widely studied in the last decades for
its numerous applications. Nowadays, the state-of-the-art approach consists of
a three-step process. The document is segmented into text lines, which are then
ordered and recognized. However, this three-step approach has many drawbacks.
The three steps are treated independently whereas they are closely related.
Errors accumulate from one step to the other. The ordering step is based on
heuristic rules that prevent its use for documents with complex layouts or
for heterogeneous documents. The need for additional physical segmentation
annotations for training the segmentation stage is inherent to this approach.
In this thesis, we propose to tackle these issues by performing the handwritten
text recognition of whole documents in an end-to-end way. To this end, we
gradually increase the difficulty of the recognition task, moving from isolated
lines to paragraphs, and then to whole documents. We proposed an approach at
the line level, based on a fully convolutional network, in order to design a
first generic feature extraction step for the handwriting recognition task.
Based on this preliminary work, we studied two different approaches to
recognize handwritten paragraphs. We reached state-of-the-art results at
paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed
the line-level state of the art on these datasets. We finally proposed the
first end-to-end approach dedicated to the recognition of both text and layout,
at document level. Characters and layout tokens are sequentially predicted
following a learned reading order. We proposed two new metrics, which we used to
evaluate this task on the RIMES 2009 and READ 2016 datasets at page level and
double-page level.
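The abstract states that characters and layout tokens are predicted as one sequence following a learned reading order, and the related DAN entry describes labelling text parts with begin and end tags "in an XML-like fashion". The sketch below shows what such a target sequence can look like. It is an illustration under assumptions, not the thesis's implementation: the `Region` class, the tag names (`page`, `margin`, `body`) and all helper functions are hypothetical, and the reading order here is simply the order of each node's children, whereas the thesis learns it.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Region:
    """Hypothetical layout node: a label (e.g. 'page', 'body') and ordered children."""
    label: str
    children: List[Union["Region", str]] = field(default_factory=list)


def serialize(node: Union[Region, str]) -> List[str]:
    """Flatten a layout tree into a single target sequence.

    Text is emitted character by character; each region is wrapped in a
    begin token and an end token. Children are visited in list order, which
    stands in for the reading order (an assumption of this sketch).
    """
    if isinstance(node, str):
        return list(node)
    tokens = [f"<{node.label}>"]
    for child in node.children:
        tokens.extend(serialize(child))
    tokens.append(f"</{node.label}>")
    return tokens


def is_well_formed(tokens: List[str]) -> bool:
    """Check that every begin token is closed by the matching end token, properly nested."""
    stack: List[str] = []
    for tok in tokens:
        if len(tok) > 1 and tok.startswith("</") and tok.endswith(">"):
            if not stack or stack.pop() != tok[2:-1]:
                return False
        elif len(tok) > 1 and tok.startswith("<") and tok.endswith(">"):
            stack.append(tok[1:-1])
    return not stack


def text_only(tokens: List[str]) -> str:
    """Drop layout tokens (multi-character) and keep the character tokens."""
    return "".join(t for t in tokens if len(t) == 1)


if __name__ == "__main__":
    page = Region("page", [Region("margin", ["12"]), Region("body", ["Dear Sir"])])
    seq = serialize(page)
    print(" ".join(seq))
    print(is_well_formed(seq), text_only(seq))  # True 12Dear Sir
```

A flat sequence of this kind lets a single autoregressive decoder emit text and layout together. A check like `is_well_formed` is the sort of post-processing one might apply to predicted sequences, since nothing forces a decoder to close every tag it opens.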
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z)
- The Power of Summary-Source Alignments [62.76959473193149]
Multi-document summarization (MDS) is a challenging task, often decomposed to subtasks of salience and redundancy detection.
Alignment of corresponding sentences between a reference summary and its source documents has been leveraged to generate training data.
This paper proposes extending the summary-source alignment framework by applying it at the more fine-grained proposition span level.
arXiv Detail & Related papers (2024-06-02T19:35:19Z)
- Handwritten and Printed Text Segmentation: A Signature Case Study [0.0]
We develop novel approaches to address the challenges of handwritten and printed text segmentation.
Our objective is to recover text from different classes in their entirety, especially enhancing the segmentation performance on overlapping sections.
Our best configuration outperforms prior work on two different datasets by 17.9% and 7.3% on IoU scores.
arXiv Detail & Related papers (2023-07-15T21:49:22Z)
- Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents [61.63208012250885]
We formulate recognizing semantic differences as a token-level regression task.
We study three unsupervised approaches that rely on a masked language model.
Our results show that an approach based on word alignment and sentence-level contrastive learning has a robust correlation to gold labels.
arXiv Detail & Related papers (2023-05-22T17:58:04Z)
- An end-to-end, interactive Deep Learning based Annotation system for cursive and print English handwritten text [0.0]
We present an innovative, complete end-to-end pipeline that annotates offline handwritten manuscripts written in both print and cursive English.
This novel method involves an architectural combination of a detection system built upon a state-of-the-art text detection model, and a custom-made Deep Learning model for the recognition system.
arXiv Detail & Related papers (2023-04-18T00:24:07Z)
- Robust Text Line Detection in Historical Documents: Learning and Evaluation Methods [1.9938405188113029]
We present a study conducted using three state-of-the-art systems: Doc-UFCN, dhSegment and ARU-Net.
We show that it is possible to build generic models trained on a wide variety of historical document datasets that can correctly segment diverse unseen pages.
arXiv Detail & Related papers (2022-03-23T11:56:25Z)
- DAN: a Segmentation-free Document Attention Network for Handwritten Document Recognition [1.7875811547963403]
We propose an end-to-end segmentation-free architecture for handwritten document recognition.
The model is trained to label text parts using begin and end tags in an XML-like fashion.
We achieve competitive results on the READ dataset at page level, as well as double-page level with a CER of 3.53% and 3.69%, respectively.
arXiv Detail & Related papers (2022-03-23T08:40:42Z)
- Transformer-Based Approach for Joint Handwriting and Named Entity Recognition in Historical documents [1.7491858164568674]
This work presents the first approach that adopts transformer networks for named entity recognition in handwritten documents.
We achieve new state-of-the-art performance in the ICDAR 2017 Information Extraction competition using the Esposalles database.
arXiv Detail & Related papers (2021-12-08T09:26:21Z)
- Multilevel Text Alignment with Cross-Document Attention [59.76351805607481]
Existing alignment methods operate at a single, predefined level.
We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component.
arXiv Detail & Related papers (2020-10-03T02:52:28Z)
- Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
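The DAN entry above reports its page-level and double-page-level results as a character error rate (CER). A common definition of CER is the Levenshtein edit distance (character insertions, deletions and substitutions) between the predicted and reference transcriptions, divided by the number of reference characters; over a dataset, distances and lengths are usually summed before dividing. The sketch below implements that common definition only. It does not reproduce the two new metrics proposed in the thesis, which the abstract does not describe, and a given benchmark's evaluation script may normalize text differently.

```python
from typing import Iterable, Tuple


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    between `ref` and `hyp` (classic dynamic programming, two rows)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to hyp[:j]
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # ref char r missing from hyp (deletion)
                curr[j - 1] + 1,         # extra hyp char h (insertion)
                prev[j - 1] + (r != h),  # substitution, or free match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of a single transcription."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Dataset-level CER: total edit distance over total reference length."""
    errors = total = 0
    for ref, hyp in pairs:
        errors += levenshtein(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("references must not all be empty")
    return errors / total


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(cer("abcd", "abxd"))               # 0.25
    print(corpus_cer([("abcd", "abxd"), ("ef", "ef")]))
```

Summing before dividing weights each line by its length, so a short line with one error does not dominate the dataset score the way a per-line average would.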