Handwritten Text Recognition from Crowdsourced Annotations
- URL: http://arxiv.org/abs/2306.10878v1
- Date: Mon, 19 Jun 2023 12:11:13 GMT
- Title: Handwritten Text Recognition from Crowdsourced Annotations
- Authors: Solène Tarride, Tristan Faine, Mélodie Boillet, Harold Mouchère,
Christopher Kermorvant
- Abstract summary: We consider different ways of training a model for handwritten text recognition when multiple imperfect or noisy transcriptions are available.
Our experiments are carried out on municipal registers of the city of Belfort (France) written between 1790 and 1946.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we explore different ways of training a model for handwritten
text recognition when multiple imperfect or noisy transcriptions are available.
We consider various training configurations, such as selecting a single
transcription, retaining all transcriptions, or computing an aggregated
transcription from all available annotations. In addition, we evaluate the
impact of quality-based data selection, where samples with low agreement are
removed from the training set. Our experiments are carried out on municipal
registers of the city of Belfort (France) written between 1790 and 1946.
The results show that computing a consensus transcription or training
on multiple transcriptions are good alternatives. However, selecting training
samples based on the degree of agreement between annotators introduces a bias
in the training data and does not improve the results. Our dataset is publicly
available on Zenodo: https://zenodo.org/record/8041668.
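The aggregation and quality-based selection strategies described in the abstract can be sketched in a few lines. The snippet below is a simplified illustration, not the paper's actual method: it approximates inter-annotator agreement with `difflib`'s character-level similarity and takes the medoid transcription (the one most similar to all others) as a stand-in for a consensus; the paper may use a different aggregation scheme. All function names and the sample data are hypothetical.

```python
from difflib import SequenceMatcher

def agreement(a: str, b: str) -> float:
    """Normalized character-level similarity between two transcriptions (1.0 = identical)."""
    return SequenceMatcher(None, a, b).ratio()

def mean_agreement(candidates):
    """For each transcription of one sample, its average agreement with the others."""
    scores = []
    for i, t in enumerate(candidates):
        others = [agreement(t, u) for j, u in enumerate(candidates) if j != i]
        scores.append(sum(others) / len(others))
    return scores

def consensus(candidates):
    """Pick the medoid: the transcription most similar to all the others."""
    scores = mean_agreement(candidates)
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]

def filter_by_agreement(samples, threshold=0.8):
    """Keep only samples whose annotators agree above the threshold.
    (Note: the paper found this kind of selection biases the training data.)"""
    return [c for c in samples if max(mean_agreement(c)) >= threshold]

# Three hypothetical noisy annotations of the same register line:
anns = ["ne le 12 mars 1821", "né le 12 mars 1821", "ne le 12 mai 1821"]
print(consensus(anns))  # the medoid agrees with the majority on "mars"
```

The medoid is a cheap proxy for a true aligned vote (e.g., character-level majority voting after multiple sequence alignment), but it already captures the idea of letting annotator agreement drive both aggregation and filtering.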
Related papers
- Curriculum Direct Preference Optimization for Diffusion and Consistency Models [110.08057135882356]
We propose a novel and enhanced version of DPO based on curriculum learning for text-to-image generation.
Our approach, Curriculum DPO, is compared against state-of-the-art fine-tuning approaches on three benchmarks.
arXiv Detail & Related papers (2024-05-22T13:36:48Z)
- Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition [4.67385883375784]
This paper addresses the Automatic Speech Recognition (ASR) challenge, focusing on the Tunisian dialect.
First, textual and audio data is collected and in some cases annotated.
Second, we explore self-supervision, semi-supervision and few-shot code-switching approaches to push the state-of-the-art on different Tunisian test sets.
Third, and given the absence of conventional spelling, we produce a human evaluation of our transcripts to avoid the noise coming from spelling in our testing references.
arXiv Detail & Related papers (2023-09-20T13:56:27Z)
- PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model that learns authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z)
- On-the-Fly Aligned Data Augmentation for Sequence-to-Sequence ASR [10.261890123213622]
We propose an on-the-fly data augmentation method for automatic speech recognition (ASR).
Our method, called Aligned Data Augmentation (ADA) for ASR, replaces transcribed tokens and the speech representations in an aligned manner to generate training pairs.
arXiv Detail & Related papers (2021-04-03T13:00:00Z)
- Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with on average 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z)
- TS-Net: OCR Trained to Switch Between Text Transcription Styles [0.0]
We propose to extend existing text recognition networks with a Transcription Style Block (TSB).
TSB can learn from data to switch between multiple transcription styles without any explicit knowledge of transcription rules.
We show that TSB is able to learn completely different transcription styles in controlled experiments on artificial data.
arXiv Detail & Related papers (2021-03-09T15:21:40Z)
- Textual Supervision for Visually Grounded Spoken Language Understanding [51.93744335044475]
Visually-grounded models of spoken language understanding extract semantic information directly from speech.
This is useful for low-resource languages, where transcriptions can be expensive or impossible to obtain.
Recent work showed that these models can be improved if transcriptions are available at training time.
arXiv Detail & Related papers (2020-10-06T15:16:23Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment [0.5076419064097732]
We propose an approach that utilises transcripts without bounding box annotations to train word spotting models.
This is done through a training-free alignment procedure based on hidden Markov models.
We believe that this will be a significant advance towards a more general use of word spotting, since digital transcription data will already exist for parts of many collections of interest.
arXiv Detail & Related papers (2020-03-24T19:41:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.