A Few Shot Multi-Representation Approach for N-gram Spotting in
Historical Manuscripts
- URL: http://arxiv.org/abs/2209.10441v1
- Date: Wed, 21 Sep 2022 15:35:02 GMT
- Title: A Few Shot Multi-Representation Approach for N-gram Spotting in
Historical Manuscripts
- Authors: Giuseppe De Gregorio, Sanket Biswas, Mohamed Ali Souibgui, Asma
Bensalah, Josep Lladós, Alicia Fornés, Angelo Marcelli
- Abstract summary: We propose a few-shot learning paradigm for spotting sequences of a few characters (N-grams).
We show that recognition of important n-grams can reduce the system's dependency on the vocabulary.
- Score: 1.2930503923129213
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite recent advances in automatic text recognition, the performance
remains moderate when it comes to historical manuscripts. This is mainly
because of the scarcity of available labelled data to train the data-hungry
Handwritten Text Recognition (HTR) models. Keyword Spotting Systems (KWS)
provide a valid alternative to HTR because they achieve lower error rates, but
they are usually limited to a closed reference vocabulary. In this paper, we propose
a few-shot learning paradigm for spotting sequences of a few characters
(N-grams) that requires only a small amount of labelled training data. We show
that recognition of important n-grams can reduce the system's dependency on the
vocabulary. In this case, an out-of-vocabulary (OOV) word in an input
handwritten line image could be a sequence of n-grams that belong to the
lexicon. An extensive experimental evaluation of our proposed
multi-representation approach was carried out on a subset of Bentham's
historical manuscript collection, yielding promising results in this direction.
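As a rough illustration of this decomposition idea (not the paper's actual spotting pipeline), the sketch below covers a hypothetical out-of-vocabulary word with character n-grams drawn from a small n-gram lexicon; the lexicon contents and the greedy covering strategy are assumptions made only for the example.

```python
# Minimal sketch: cover an out-of-vocabulary word with known character n-grams.
# The lexicon below and the greedy left-to-right covering are illustrative
# assumptions, not the method described in the paper.

def cover_with_ngrams(word: str, ngram_lexicon: set[str], n_max: int = 3) -> list[str]:
    """Greedily split `word` into the longest n-grams found in the lexicon."""
    parts, i = [], 0
    while i < len(word):
        for n in range(min(n_max, len(word) - i), 0, -1):
            piece = word[i:i + n]
            if piece in ngram_lexicon:
                parts.append(piece)
                i += n
                break
        else:  # no known n-gram starts here; emit a single unknown character
            parts.append(word[i])
            i += 1
    return parts

if __name__ == "__main__":
    # Hypothetical n-gram lexicon learned from a few labelled samples.
    lexicon = {"th", "ing", "er", "con", "st", "a", "n", "t"}
    print(cover_with_ngrams("constant", lexicon))  # ['con', 'st', 'a', 'n', 't']
```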
Related papers
- Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching [67.98016412551245]
We introduce a novel ASR system, ESPUM.
This system harnesses the power of lower-order N-skipgrams (up to N=3) combined with positional unigram statistics gathered from a small batch of samples.
Our model showcases competitive performance in ASR and phoneme segmentation tasks.
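For intuition only, here is a minimal sketch of the two kinds of statistics mentioned above, lower-order N-skipgrams and positional unigram counts, computed from a toy phoneme sequence; the skip definition and position binning are assumptions, and this is not the ESPUM implementation.

```python
# Illustrative sketch (not ESPUM): lower-order N-skipgram counts and
# positional unigram counts over a toy phoneme sequence.
from collections import Counter
from itertools import combinations

def skipgrams(seq: list[str], n: int, max_skip: int = 1) -> Counter:
    """Count n-grams that may skip up to `max_skip` tokens between adjacent items."""
    counts = Counter()
    window = n + (n - 1) * max_skip          # widest span an n-skipgram can cover
    for start in range(len(seq)):
        span = seq[start:start + window]
        for idx in combinations(range(1, len(span)), n - 1):
            picks = (0, *idx)
            gaps = [b - a - 1 for a, b in zip(picks, picks[1:])]
            if all(g <= max_skip for g in gaps):
                counts[tuple(span[i] for i in picks)] += 1
    return counts

def positional_unigrams(seq: list[str], n_bins: int = 4) -> Counter:
    """Count each token together with a coarse position bin within the sequence."""
    counts = Counter()
    for i, tok in enumerate(seq):
        counts[(tok, min(n_bins - 1, i * n_bins // len(seq)))] += 1
    return counts

phones = ["DH", "AH", "K", "AE", "T"]        # toy phoneme sequence
print(skipgrams(phones, n=2))                # bigrams and 1-skip bigrams
print(positional_unigrams(phones))
```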
arXiv Detail & Related papers (2023-10-03T19:05:32Z)
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions.
LLMs with a reasonable prompt and their generative capability can even correct tokens that are missing from the N-best list.
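As a hedged illustration of how an N-best list might be handed to an LLM for correction, the snippet below builds a simple prompt from hypothetical hypotheses; the prompt wording and the query_llm stub are placeholders, not the benchmark's actual interface.

```python
# Hypothetical sketch: build a correction prompt from an ASR N-best list.
# The prompt wording and the `query_llm` stub are illustrative placeholders.

def build_correction_prompt(nbest: list[str]) -> str:
    hypotheses = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return (
        "The following are N-best hypotheses from a speech recogniser.\n"
        f"{hypotheses}\n"
        "Produce the most likely correct transcription, fixing errors even if "
        "the correct word appears in none of the hypotheses."
    )

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any external large language model."""
    raise NotImplementedError

nbest = ["i red the book yesterday", "i read the brook yesterday"]
prompt = build_correction_prompt(nbest)
# corrected = query_llm(prompt)   # e.g. "I read the book yesterday"
```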
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels [0.0]
We introduce the task of comparing a handwriting image to text.
Our model's classification head is trained entirely on synthetic data created using a state-of-the-art generative adversarial network.
The reported performance gains can lead to significant productivity increases in applications that rely on human-in-the-loop automation.
arXiv Detail & Related papers (2023-09-18T21:13:42Z)
- Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition [0.840835093659811]
This work presents an end-to-end framework for automatic detection and recognition of handwritten marginalia.
It uses data augmentation and transfer learning to overcome training data scarcity.
The effectiveness of the proposed framework has been empirically evaluated on the data from early book collections found in the Uppsala University Library in Sweden.
arXiv Detail & Related papers (2023-03-10T14:00:53Z)
- Recognizing Handwriting Styles in a Historical Scanned Document Using Unsupervised Fuzzy Clustering [0.0]
Unique handwriting styles may be dissimilar in a blend of several factors including character size, stroke width, loops, ductus, slant angles, and cursive ligatures.
Previous work on labeled data with Hidden Markov models, support vector machines, and semi-supervised recurrent neural networks has provided moderate to high success.
In this study, we successfully detect hand shifts in a historical manuscript through fuzzy soft clustering in combination with linear principal component analysis.
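The following is a minimal sketch, not the paper's pipeline, of the general recipe the summary describes: project per-line handwriting features with linear PCA and soft-cluster them with a basic fuzzy c-means loop; the synthetic features and hyper-parameters are assumptions made for the example.

```python
# Illustrative sketch: linear PCA on per-line handwriting features followed by
# a basic fuzzy c-means clustering loop. Feature values are synthetic.
import numpy as np

def fuzzy_cmeans(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Return soft memberships (n_samples, n_clusters) and cluster centroids."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(n_clusters), size=len(X))   # random soft memberships
    for _ in range(n_iter):
        um = u ** m
        centroids = um.T @ X / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
        u = 1.0 / (dist ** (2 / (m - 1)))
        u /= u.sum(axis=1, keepdims=True)
    return u, centroids

# Synthetic per-line features (e.g. stroke width, slant angle, character size).
rng = np.random.default_rng(1)
features = np.vstack([rng.normal(0.0, 0.3, (40, 6)),     # hypothetical hand A
                      rng.normal(1.0, 0.3, (40, 6))])    # hypothetical hand B

# Linear PCA via SVD of the centred feature matrix, keeping two components.
centred = features - features.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[:2].T

memberships, _ = fuzzy_cmeans(projected, n_clusters=2)
# A change of dominant cluster along the line order suggests a hand shift.
print(memberships.argmax(axis=1))
```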
arXiv Detail & Related papers (2022-10-30T09:07:51Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Towards Document-Level Paraphrase Generation with Sentence Rewriting and Reordering [88.08581016329398]
We propose CoRPG (Coherence Relationship guided Paraphrase Generation) for document-level paraphrase generation.
We use a graph GRU to encode the coherence relationship graph and obtain a coherence-aware representation for each sentence.
Our model can generate document paraphrases with more diversity and semantic preservation.
arXiv Detail & Related papers (2021-09-15T05:53:40Z)
- One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition [10.473427493876422]
Low-resource Handwritten Text Recognition is a hard problem due to the scarcity of annotated data and the very limited linguistic information.
In this paper we address this problem through a data generation technique based on Bayesian Program Learning.
Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z)
- A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation [50.55448707570669]
We propose a novel token-level, reference-free hallucination detection task and an associated annotated dataset named HaDes.
To create this dataset, we first perturb a large number of text segments extracted from English language Wikipedia, and then verify these with crowd-sourced annotations.
arXiv Detail & Related papers (2021-04-18T04:09:48Z)
- Controlling Hallucinations at Word Level in Data-to-Text Generation [10.59137381324694]
State-of-the-art neural models include misleading statements in their outputs.
We propose a Multi-Branch Decoder which is able to leverage word-level labels to learn the relevant parts of each training instance.
Our model is able to reduce and control hallucinations, while keeping fluency and coherence in generated texts.
arXiv Detail & Related papers (2021-02-04T18:58:28Z)
- Blind Face Restoration via Deep Multi-scale Component Dictionaries [75.02640809505277]
We propose a deep face dictionary network (termed as DFDNet) to guide the restoration process of degraded observations.
DFDNet generates deep dictionaries for perceptually significant face components from high-quality images.
Component AdaIN is leveraged to eliminate the style diversity between the input and dictionary features.
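For reference, adaptive instance normalisation (AdaIN), as commonly defined, re-styles input features with the channel-wise statistics of the dictionary features; the sketch below shows that operation only, with illustrative tensor shapes, and is not DFDNet's actual module.

```python
# Minimal sketch of adaptive instance normalisation (AdaIN): re-normalise the
# degraded input features with the channel-wise statistics of the matched
# high-quality dictionary features. Shapes are illustrative assumptions.
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """content, style: feature maps of shape (batch, channels, height, width)."""
    b, c = content.shape[:2]
    c_flat = content.view(b, c, -1)
    s_flat = style.view(b, c, -1)
    c_mean, c_std = c_flat.mean(-1, keepdim=True), c_flat.std(-1, keepdim=True) + eps
    s_mean, s_std = s_flat.mean(-1, keepdim=True), s_flat.std(-1, keepdim=True) + eps
    out = s_std * (c_flat - c_mean) / c_std + s_mean
    return out.view_as(content)

degraded = torch.randn(1, 64, 32, 32)    # features of a degraded face component
dictionary = torch.randn(1, 64, 32, 32)  # matched high-quality dictionary features
restyled = adain(degraded, dictionary)   # same shape, dictionary-style statistics
```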
arXiv Detail & Related papers (2020-08-02T07:02:07Z)