AttentionHTR: Handwritten Text Recognition Based on Attention
Encoder-Decoder Networks
- URL: http://arxiv.org/abs/2201.09390v1
- Date: Sun, 23 Jan 2022 22:48:36 GMT
- Title: AttentionHTR: Handwritten Text Recognition Based on Attention
Encoder-Decoder Networks
- Authors: Dmitrijs Kass and Ekta Vats
- Abstract summary: This work proposes an attention-based sequence-to-sequence model for handwritten word recognition.
It exploits models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models.
The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work proposes an attention-based sequence-to-sequence model for
handwritten word recognition and explores transfer learning for data-efficient
training of HTR systems. To overcome training data scarcity, this work
leverages models pre-trained on scene text images as a starting point towards
tailoring the handwriting recognition models. ResNet feature extraction and
bidirectional LSTM-based sequence modeling stages together form an encoder. The
prediction stage consists of a decoder and a content-based attention mechanism.
The effectiveness of the proposed end-to-end HTR system has been empirically
evaluated on a novel multi-writer dataset Imgur5K and the IAM dataset. The
experimental results evaluate the performance of the HTR framework, further
supported by an in-depth analysis of the error cases. Source code and
pre-trained models are available at https://github.com/dmitrijsk/AttentionHTR.
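The abstract describes a pipeline of ResNet feature extraction, bidirectional LSTM sequence modeling, and a decoder with content-based attention. The following is a minimal PyTorch sketch of that kind of attention encoder-decoder for word images; the simplified CNN backbone, layer sizes, and all names are illustrative assumptions rather than the repository's actual implementation (see https://github.com/dmitrijsk/AttentionHTR for the real code). In the proposed system, transfer learning amounts to initializing such a network from weights pre-trained on scene-text images and fine-tuning it on handwritten words.
```python
# Minimal sketch of an attention-based encoder-decoder for word recognition.
# Shapes, layer sizes, and names are illustrative assumptions; the actual
# AttentionHTR code differs.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """CNN feature extractor followed by a bidirectional LSTM."""

    def __init__(self, hidden=256):
        super().__init__()
        # Stand-in for the ResNet backbone: collapse the height dimension so
        # the word image becomes a left-to-right sequence of column features.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),
        )
        self.rnn = nn.LSTM(128, hidden, bidirectional=True, batch_first=True)

    def forward(self, img):                       # img: (B, 1, H, W)
        f = self.cnn(img).squeeze(2)              # (B, 128, W')
        out, _ = self.rnn(f.permute(0, 2, 1))     # (B, W', 2*hidden)
        return out


class AttentionDecoder(nn.Module):
    """One decoding step with content-based (additive) attention."""

    def __init__(self, enc_dim=512, hidden=256, vocab_size=80):
        super().__init__()
        self.attn = nn.Linear(enc_dim + hidden, hidden)
        self.score = nn.Linear(hidden, 1)
        self.rnn = nn.LSTMCell(enc_dim + vocab_size, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, enc, prev_char, state):
        h, c = state
        # Content-based attention: compare the decoder state with every
        # encoder time step and form a weighted sum (the context vector).
        q = h.unsqueeze(1).expand(-1, enc.size(1), -1)
        e = self.score(torch.tanh(self.attn(torch.cat([enc, q], -1)))).squeeze(-1)
        alpha = F.softmax(e, dim=1)               # attention weights over W'
        context = (alpha.unsqueeze(-1) * enc).sum(1)
        h, c = self.rnn(torch.cat([context, prev_char], -1), (h, c))
        return self.out(h), (h, c)


# Greedy decoding for a batch of 32x100 grayscale word images (dummy data).
encoder, decoder = Encoder(), AttentionDecoder()
feats = encoder(torch.randn(2, 1, 32, 100))
h = c = torch.zeros(2, 256)
prev = torch.zeros(2, 80)                         # one-hot of a <sos> token
for _ in range(25):                               # maximum word length
    logits, (h, c) = decoder(feats, prev, (h, c))
    prev = F.one_hot(logits.argmax(-1), 80).float()
```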
Related papers
- Efficient Sample-Specific Encoder Perturbations [37.84914870036184]
We show that a small proxy network can be used to find a sample-by-sample perturbation of the encoder output of a frozen foundation model.
Results show consistent improvements in performance, evaluated through COMET and WER.
arXiv Detail & Related papers (2024-05-01T08:55:16Z)
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning [47.96387857237473]
We devise a network which can perform attention over activations obtained while processing other training samples.
Our memory models the distribution of past keys and values through the definition of prototype vectors.
We demonstrate that our proposal can increase the performance of an encoder-decoder Transformer by 3.7 CIDEr points both when training in cross-entropy only and when fine-tuning with self-critical sequence training.
arXiv Detail & Related papers (2023-08-23T18:53:00Z)
- Uncovering the Handwritten Text in the Margins: End-to-end Handwritten Text Detection and Recognition [0.840835093659811]
This work presents an end-to-end framework for automatic detection and recognition of handwritten marginalia.
It uses data augmentation and transfer learning to overcome training data scarcity.
The effectiveness of the proposed framework has been empirically evaluated on the data from early book collections found in the Uppsala University Library in Sweden.
arXiv Detail & Related papers (2023-03-10T14:00:53Z)
- ST-KeyS: Self-Supervised Transformer for Keyword Spotting in Historical Handwritten Documents [3.9688530261646653]
Keyword spotting (KWS) in historical documents is an important tool for the initial exploration of digitized collections.
We propose ST-KeyS, a masked auto-encoder model based on vision transformers where the pretraining stage is based on the mask-and-predict paradigm.
In the fine-tuning stage, the pre-trained encoder is integrated into a siamese neural network model that is fine-tuned to improve feature embedding from the input images.
arXiv Detail & Related papers (2023-03-06T13:39:41Z)
- Adaptive Convolutional Dictionary Network for CT Metal Artifact Reduction [62.691996239590125]
We propose an adaptive convolutional dictionary network (ACDNet) for metal artifact reduction.
Our ACDNet can automatically learn the prior for artifact-free CT images via training data and adaptively adjust the representation kernels for each input CT image.
Our method inherits the clear interpretability of model-based methods and maintains the powerful representation ability of learning-based methods.
arXiv Detail & Related papers (2022-05-16T06:49:36Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Automatic Feature Extraction for Heartbeat Anomaly Detection [7.054093620465401]
We focus on automatic feature extraction for raw audio heartbeat sounds, aimed at anomaly detection applications in healthcare.
We learn features with the help of an autoencoder composed of a 1D non-causal convolutional encoder and a WaveNet decoder (a simplified sketch of this kind of autoencoder appears after this list).
arXiv Detail & Related papers (2021-02-24T13:55:24Z)
- Autoencoding Variational Autoencoder [56.05008520271406]
We study the implications of this behaviour on the learned representations and also the consequences of fixing it by introducing a notion of self consistency.
We show that encoders trained with our self-consistency approach lead to representations that are robust (insensitive) to perturbations in the input introduced by adversarial attacks.
arXiv Detail & Related papers (2020-12-07T14:16:14Z)
- Cross-Thought for Sentence Encoder Pre-training [89.32270059777025]
Cross-Thought is a novel approach to pre-training a sequence encoder.
We train a Transformer-based sequence encoder over a large set of short sequences.
Experiments on question answering and textual entailment tasks demonstrate that our pre-trained encoder can outperform state-of-the-art encoders.
arXiv Detail & Related papers (2020-10-07T21:02:41Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words" (a minimal sketch of this idea appears after this list).
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
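One of the related entries above ("Automatic Feature Extraction for Heartbeat Anomaly Detection") learns features with a 1D non-causal convolutional encoder and a WaveNet decoder. Below is a minimal PyTorch sketch of that kind of audio autoencoder under simplifying assumptions: a plain transposed-convolution decoder stands in for the WaveNet decoder, and all layer sizes are illustrative.
```python
# Sketch of a 1D convolutional autoencoder for raw audio. The bottleneck
# activations serve as learned features; large reconstruction error on a
# new recording can flag an anomalous heartbeat.
import torch
import torch.nn as nn


class AudioAutoencoder(nn.Module):
    def __init__(self, channels=32, bottleneck=16):
        super().__init__()
        # Non-causal (symmetrically padded) 1D convolutions downsample the waveform.
        self.encoder = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(channels, bottleneck, kernel_size=9, stride=4, padding=4),
        )
        # Simplified stand-in for the WaveNet decoder used in the paper.
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(bottleneck, channels, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(channels, channels, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(channels, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, wav):                 # wav: (B, 1, T)
        z = self.encoder(wav)               # learned features, (B, bottleneck, T/64)
        return self.decoder(z), z


# Train to reconstruct normal heartbeats; anomalies reconstruct poorly.
model = AudioAutoencoder()
wav = torch.randn(4, 1, 16384)              # a batch of raw audio snippets
recon, feats = model(wav)
loss = nn.functional.mse_loss(recon, wav)
```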
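The last entry ("Document Ranking with a Pretrained Sequence-to-Sequence Model") trains a sequence-to-sequence model to emit relevance labels as target words. The sketch below illustrates the general idea with an off-the-shelf T5 checkpoint from Hugging Face Transformers; the prompt format, model name, and label tokens are illustrative assumptions, not the paper's exact configuration.
```python
# Sketch: score a query-document pair by asking a seq2seq model to generate a
# relevance label ("true"/"false") as its target word. Prompt, model name, and
# label tokens are assumptions for illustration only.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()


def relevance_score(query: str, document: str) -> float:
    prompt = f"Query: {query} Document: {document} Relevant:"
    inputs = tok(prompt, return_tensors="pt", truncation=True)
    # Look at the logits of the first generated token only.
    decoder_input_ids = torch.full((1, 1), model.config.decoder_start_token_id)
    with torch.no_grad():
        logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits[0, -1]
    true_id = tok("true", add_special_tokens=False).input_ids[0]
    false_id = tok("false", add_special_tokens=False).input_ids[0]
    probs = torch.softmax(logits[[true_id, false_id]], dim=0)
    return probs[0].item()                  # higher = more relevant
```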
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.