Handwritten Text Recognition from Crowdsourced Annotations
- URL: http://arxiv.org/abs/2306.10878v1
- Date: Mon, 19 Jun 2023 12:11:13 GMT
- Title: Handwritten Text Recognition from Crowdsourced Annotations
- Authors: Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère,
Christopher Kermorvant
- Abstract summary: We consider different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available.
Our experiments are carried out on municipal registers of the city of Belfort (France) written between 1790 and 1946.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore different ways of training a model for handwritten
text recognition when multiple imperfect or noisy transcriptions are available.
We consider various training configurations, such as selecting a single
transcription, retaining all transcriptions, or computing an aggregated
transcription from all available annotations. In addition, we evaluate the
impact of quality-based data selection, where samples with low agreement are
removed from the training set. Our experiments are carried out on municipal
registers of the city of Belfort (France) written between 1790 and 1946.
The results show that computing a consensus transcription or training
on multiple transcriptions are good alternatives. However, selecting training
samples based on the degree of agreement between annotators introduces a bias
in the training data and does not improve the results. Our dataset is publicly
available on Zenodo: https://zenodo.org/record/8041668.
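The aggregation and quality-based selection strategies described in the abstract can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's actual method: it approximates inter-annotator agreement with `difflib`'s character-level similarity and takes the medoid transcription (the one most similar to all others) as a stand-in for a consensus; the paper may use a different aggregation scheme. All function names and the sample data are hypothetical.

```python
from difflib import SequenceMatcher

def agreement(a: str, b: str) -> float:
    """Normalized character-level similarity between two transcriptions (1.0 = identical)."""
    return SequenceMatcher(None, a, b).ratio()

def mean_agreement(candidates):
    """For each transcription of one sample, its average agreement with the others."""
    scores = []
    for i, t in enumerate(candidates):
        others = [agreement(t, u) for j, u in enumerate(candidates) if j != i]
        scores.append(sum(others) / len(others))
    return scores

def consensus(candidates):
    """Pick the medoid: the transcription most similar to all the others."""
    scores = mean_agreement(candidates)
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

def filter_by_agreement(samples, threshold=0.8):
    """Keep only samples whose annotators agree above the threshold.
    (Note: the paper found this kind of selection biases the training data.)"""
    return [c for c in samples if max(mean_agreement(c)) >= threshold]

# Three hypothetical noisy annotations of the same register line:
anns = ["ne le 12 mars 1821", "né le 12 mars 1821", "ne le 12 mai 1821"]
print(consensus(anns))  # the medoid agrees with the majority on "mars"
```

The medoid is a cheap proxy for a true aligned vote (e.g., character-level majority voting after multiple sequence alignment), but it already captures the idea of letting annotator agreement drive both aggregation and filtering.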
Related papers
- Curriculum Direct Preference Optimization for Diffusion and Consistency Models [110.08057135882356]
We propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation.
Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks.
arXiv Detail & Related papers (2024-05-22T13:36:48Z)
- Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition [4.67385883375784]
This paper addresses the Automatic Speech Recognition (ASR) challenge, focusing on the Tunisian dialect.
First, textual and audio data is collected and in some cases annotated.
Second, we explore self-supervision, semi-supervision and few-shot code-switching approaches to push the state-of-the-art on different Tunisian test sets.
Third, and given the absence of conventional spelling, we produce a human evaluation of our transcripts to avoid the noise coming from spelling in our testing references.
arXiv Detail & Related papers (2023-09-20T13:56:27Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR [10.261890123213622]
We propose an on-the-fly data augmentation method for automatic speech recognition (ASR).
Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate training pairs.
arXiv Detail & Related papers (2021-04-03T13:00:00Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with on average 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- TS-Net: OCR Trained to Switch Between Text Transcription Styles [0.0]
We propose to extend existing text recognition networks with a Transcription Style Block (TSB).
TSB can learn from data to switch between multiple transcription styles without any explicit knowledge of transcription rules.
We show that TSB is able to learn completely different transcription styles in controlled experiments on artificial data.
arXiv Detail & Related papers (2021-03-09T15:21:40Z)
- Textual Supervision for Visually Grounded Spoken Language Understanding [51.93744335044475]
Visually-grounded models of spoken language understanding extract semantic information directly from speech.
This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.
Recent work showed that these models can be improved if transcriptions are available at training time.
arXiv Detail & Related papers (2020-10-06T15:16:23Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment [0.5076419064097732]
We propose an approach that utilises transcripts without bounding box annotations to train word spotting models.
This is done through a training-free alignment procedure based on hidden Markov models.
We believe that this will be a significant advance towards a more general use of word spotting, since digital transcription data will already exist for parts of many collections of interest.
arXiv Detail & Related papers (2020-03-24T19:41:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.