Few Shots Is All You Need: A Progressive Few Shot Learning Approach for
Low Resource Handwriting Recognition
- URL: http://arxiv.org/abs/2107.10064v1
- Date: Wed, 21 Jul 2021 13:18:21 GMT
- Title: Few Shots Is All You Need: A Progressive Few Shot Learning Approach for
Low Resource Handwriting Recognition
- Authors: Mohamed Ali Souibgui, Alicia Fornés, Yousri Kessentini, Beáta Megyesi
- Abstract summary: We propose a few-shot learning-based handwriting recognition approach that significantly reduces the human annotation effort.
Our model detects all symbols of a given alphabet in a textline image, then a decoding step maps the symbol similarity scores to the final sequence of transcribed symbols.
Since retraining on the target data would require annotating thousands of handwritten symbols together with their bounding boxes, we propose to avoid such human effort through an unsupervised progressive learning approach.
- Score: 1.7491858164568674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handwritten text recognition in low resource scenarios, such as manuscripts
with rare alphabets, is a challenging problem. The main difficulty comes from
the scarcity of annotated data and the limited linguistic information (e.g.
dictionaries and language models). Thus, we propose a few-shot learning-based
handwriting recognition approach that significantly reduces the human
annotation effort, requiring only a few images of each alphabet symbol. First,
our model detects all symbols of a given alphabet in a textline image, then a
decoding step maps the symbol similarity scores to the final sequence of
transcribed symbols. Our model is first pretrained on synthetic line images
generated from any alphabet, even one that differs from the target domain. A
second training step is then applied to diminish the gap between the source and
target data. Since this retraining would require annotation of thousands of
handwritten symbols together with their bounding boxes, we propose to avoid
such human effort through an unsupervised progressive learning approach that
automatically assigns pseudo-labels to the non-annotated data. The evaluation
on different manuscript datasets shows that our model can lead to competitive
results with a significant reduction in human effort.
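The progressive pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed confidence threshold, and the toy scoring and retraining callbacks below are all assumptions made for illustration. The sketch only shows the general loop in which confident predictions on non-annotated regions are accepted as pseudo-labels, the model is retrained, and the remaining regions are scored again.

```python
from typing import Callable, Dict, List

Scores = Dict[str, Dict[str, float]]  # region id -> {symbol: similarity score}


def select_confident(scores: Scores, threshold: float) -> Dict[str, str]:
    """Keep regions whose best-scoring symbol reaches the threshold."""
    selected = {}
    for region_id, symbol_scores in scores.items():
        if not symbol_scores:
            continue
        best_symbol = max(symbol_scores, key=symbol_scores.get)
        if symbol_scores[best_symbol] >= threshold:
            selected[region_id] = best_symbol
    return selected


def progressive_pseudo_labeling(
    unlabeled: List[str],
    score_fn: Callable[[List[str]], Scores],        # hypothetical: scores regions with the current model
    retrain_fn: Callable[[Dict[str, str]], None],   # hypothetical: retrains on all pseudo-labels so far
    threshold: float = 0.85,                        # assumed value, not taken from the paper
    max_rounds: int = 5,
) -> Dict[str, str]:
    """Alternate between pseudo-labeling confident regions and retraining."""
    labeled: Dict[str, str] = {}
    for _ in range(max_rounds):
        remaining = [r for r in unlabeled if r not in labeled]
        if not remaining:
            break
        new_labels = select_confident(score_fn(remaining), threshold)
        if not new_labels:
            break  # nothing confident enough; stop early
        labeled.update(new_labels)
        retrain_fn(labeled)
    return labeled


# Toy stand-ins: each "retraining" raises all similarity scores by 0.2.
_state = {"boost": 0.0}
_base = {
    "r1": {"a": 0.95, "b": 0.10},
    "r2": {"a": 0.20, "b": 0.70},
    "r3": {"a": 0.30, "b": 0.35},
}


def toy_score_fn(regions: List[str]) -> Scores:
    return {r: {s: min(1.0, v + _state["boost"]) for s, v in _base[r].items()}
            for r in regions}


def toy_retrain_fn(labeled: Dict[str, str]) -> None:
    _state["boost"] += 0.2


result = progressive_pseudo_labeling(["r1", "r2", "r3"], toy_score_fn, toy_retrain_fn)
print(result)
```

In this toy run, region r1 is accepted in the first round, r2 only becomes confident after one retraining step, and r3 never reaches the threshold, so it stays unlabeled.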
Related papers
- Sign Stitching: A Novel Approach to Sign Language Production [35.35777909051466]
We propose using dictionary examples to create expressive sign language sequences.
We present a 7-step approach to effectively stitch the signs together.
We leverage the SignGAN model to map the output to a photo-realistic signer.
arXiv Detail & Related papers (2024-05-13T11:44:57Z)
- Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation [82.5217996570387]
We adapt a pre-trained language model for auto-regressive text-to-image generation.
We find that pre-trained language models offer limited help.
arXiv Detail & Related papers (2023-11-27T07:19:26Z)
- CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics.
CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z)
- Offline Detection of Misspelled Handwritten Words by Convolving Recognition Model Features with Text Labels [0.0]
We introduce the task of comparing a handwriting image to text.
Our model's classification head is trained entirely on synthetic data created using a state-of-the-art generative adversarial network.
Such massive performance gains can lead to significant productivity increases in applications utilizing human-in-the-loop automation.
arXiv Detail & Related papers (2023-09-18T21:13:42Z)
- Discriminative Class Tokens for Text-to-Image Diffusion Models [107.98436819341592]
We propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text.
Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images.
We evaluate our method extensively, showing that the generated images are: (i) more accurate and of higher quality than standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier.
arXiv Detail & Related papers (2023-03-30T05:25:20Z)
- Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment [81.73717488887938]
Language-Quantized AutoEncoder (LQAE) learns to align text-image data in an unsupervised manner by leveraging pretrained language models.
LQAE learns to represent similar images with similar clusters of text tokens, thereby aligning these two modalities without the use of aligned text-image pairs.
This enables few-shot image classification with large language models (e.g., GPT-3) as well as linear classification of images based on BERT text features.
arXiv Detail & Related papers (2023-02-02T06:38:44Z)
- A Few Shot Multi-Representation Approach for N-gram Spotting in Historical Manuscripts [1.2930503923129213]
We propose a few-shot learning paradigm for spotting sequences of a few characters (N-grams).
We show that recognizing important n-grams could reduce the system's dependency on vocabulary.
arXiv Detail & Related papers (2022-09-21T15:35:02Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition [10.473427493876422]
Low resource Handwritten Text Recognition is a hard problem due to the scarce annotated data and the very limited linguistic information.
In this paper we address this problem through a data generation technique based on Bayesian Program Learning.
Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z)
- A Few-shot Learning Approach for Historical Ciphered Manuscript Recognition [3.0682439731292592]
We propose a novel method for handwritten cipher recognition based on few-shot object detection.
By training on synthetic data, we show that the proposed architecture is able to recognize handwritten ciphers with unseen alphabets.
arXiv Detail & Related papers (2020-09-26T11:49:18Z)