Whole page recognition of historical handwriting
- URL: http://arxiv.org/abs/2009.10634v1
- Date: Tue, 22 Sep 2020 15:46:33 GMT
- Title: Whole page recognition of historical handwriting
- Authors: Hans J.G.A. Dolfing
- Abstract summary: We investigate an end-to-end inference approach without text localization, which takes a handwritten page and transcribes its full text.
No explicit character, word, or line segmentation is involved in inference, which is why we call this approach "segmentation free".
We conclude that a whole page inference approach without text localization and segmentation is competitive.
- Score: 1.2183405753834562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Historical handwritten documents guard an important part of human knowledge
that is within reach of only a few scholars and experts. Recent developments in machine
learning and handwriting research have the potential of rendering this
information accessible and searchable to a larger audience. To this end, we
investigate an end-to-end inference approach without text localization, which
takes a handwritten page and transcribes its full text. No explicit character,
word, or line segmentation is involved in inference, which is why we call this
approach "segmentation free". We explore its robustness and accuracy compared
to a line-by-line segmented approach based on the IAM, RODRIGO and ScribbleLens
corpora, in three languages with handwriting styles spanning 400 years. We
concentrate on model types and sizes which can be deployed on a hand-held or
embedded device. We conclude that a whole page inference approach without text
localization and segmentation is competitive.
Related papers
- An end-to-end, interactive Deep Learning based Annotation system for
cursive and print English handwritten text [0.0]
We present an innovative, complete end-to-end pipeline, that annotates offline handwritten manuscripts written in both print and cursive English.
This novel method involves an architectural combination of a detection system built upon a state-of-the-art text detection model and a custom-made deep learning model for the recognition system.
arXiv Detail & Related papers (2023-04-18T00:24:07Z) - The Learnable Typewriter: A Generative Approach to Text Analysis [17.355857281085164]
We present a generative document-specific approach to character analysis and recognition in text lines.
Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
arXiv Detail & Related papers (2023-02-03T11:17:59Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, register, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Towards End-to-end Handwritten Document Recognition [0.0]
Handwritten text recognition has been widely studied in recent decades for its numerous applications.
In this thesis, we propose to tackle these issues by performing handwritten text recognition of whole documents in an end-to-end way.
We reached state-of-the-art results at paragraph level on the RIMES 2011, IAM and READ 2016 datasets and outperformed the line-level state of the art on these datasets.
arXiv Detail & Related papers (2022-09-30T10:31:22Z) - Robust Text Line Detection in Historical Documents: Learning and
Evaluation Methods [1.9938405188113029]
We present a study conducted using three state-of-the-art systems: Doc-UFCN, dhSegment, and ARU-Net.
We show that it is possible to build generic models trained on a wide variety of historical document datasets that can correctly segment diverse unseen pages.
arXiv Detail & Related papers (2022-03-23T11:56:25Z) - Digital Editions as Distant Supervision for Layout Analysis of Printed
Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z) - Rethinking Text Line Recognition Models [57.47147190119394]
We consider two decoder families (Connectionist Temporal Classification and Transformer) and three encoder modules (Bidirectional LSTMs, Self-Attention, and GRCLs).
We compare their accuracy and performance on widely used public datasets of scene and handwritten text.
Unlike the more common Transformer-based models, this architecture can handle inputs of arbitrary length.
arXiv Detail & Related papers (2021-04-15T21:43:13Z) - OrigamiNet: Weakly-Supervised, Segmentation-Free, One-Step, Full Page
Text Recognition by learning to unfold [6.09170287691728]
We take a step from segmentation-free single line recognition towards segmentation-free multi-line / full page recognition.
We propose a novel and simple neural network module, termed OrigamiNet, that can augment any CTC-trained, fully convolutional single-line text recognizer.
We achieve state-of-the-art character error rate on both IAM & ICDAR 2017 HTR benchmarks for handwriting recognition, surpassing all other methods in the literature.
arXiv Detail & Related papers (2020-06-12T22:18:02Z) - Enabling Language Models to Fill in the Blanks [81.59381915581892]
We present a simple approach for text infilling, the task of predicting missing spans of text at any position in a document.
We train (or fine-tune) off-the-shelf language models on sequences containing the concatenation of artificially-masked text and the text which was masked.
We show that this approach, which we call infilling by language modeling, can enable LMs to infill entire sentences effectively on three different domains: short stories, scientific abstracts, and lyrics.
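A minimal sketch of the sequence construction this entails, assuming special tokens along the lines of the paper's `[blank]`/`[sep]`/`[answer]` scheme (the exact formatting details here are an illustrative assumption, not the paper's verbatim specification):

```python
def make_infilling_example(tokens, blank_idxs):
    """Build one training sequence for infilling by language modeling:
    replace the chosen tokens with [blank], then append the masked-out
    answers so an ordinary left-to-right LM learns to predict them.
    """
    tokens = list(tokens)
    answers = [tokens[i] for i in sorted(blank_idxs)]
    for i in blank_idxs:
        tokens[i] = "[blank]"
    tail = []
    for ans in answers:
        tail += [ans, "[answer]"]  # each answer is terminated by [answer]
    return tokens + ["[sep]"] + tail

example = make_infilling_example(
    "she ate cereal for breakfast".split(), blank_idxs=[2, 4])
print(" ".join(example))
# -> she ate [blank] for [blank] [sep] cereal [answer] breakfast [answer]
```

Because the masked text and its answers are concatenated into one sequence, an off-the-shelf language model can be fine-tuned on such examples with no architectural changes.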
arXiv Detail & Related papers (2020-05-11T18:00:03Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset, written in the same style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z) - TextScanner: Reading Characters in Order for Robust Scene Text
Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
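A toy sketch of the order-map idea: once each detected character carries a predicted class and reading-order index, the transcript follows from sorting on the order index rather than on spatial position (the predictions here are invented for illustration):

```python
def read_in_order(char_preds):
    """Assemble text from (character, order-index) predictions by sorting
    on the predicted reading order instead of spatial position -- a toy
    view of how order maps keep reading robust to curved or warped layouts.
    """
    return "".join(c for c, _ in sorted(char_preds, key=lambda p: p[1]))

# Characters detected in arbitrary spatial order, with predicted reading order:
preds = [("t", 0), ("x", 2), ("e", 1), ("t", 3)]
print(read_in_order(preds))  # -> text
```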
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.