Enhancing Handwritten Text Recognition with N-gram sequence
decomposition and Multitask Learning
- URL: http://arxiv.org/abs/2012.14459v1
- Date: Mon, 28 Dec 2020 19:35:40 GMT
- Title: Enhancing Handwritten Text Recognition with N-gram sequence
decomposition and Multitask Learning
- Authors: Vasiliki Tassopoulou, George Retsinas, Petros Maragos
- Abstract summary: Current approaches in the field of Handwritten Text Recognition are predominantly single-task with unigram, character-level target units.
In our work, we utilize a Multi-task Learning scheme, training the model to perform decompositions of the target sequence with target units of different granularity.
Our proposed model, even though evaluated only on the unigram task, outperforms its single-task counterpart by an absolute 2.52% WER and 1.02% CER.
- Score: 36.69114677635806
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current state-of-the-art approaches in the field of Handwritten Text
Recognition are predominantly single-task with unigram, character-level target
units. In our work, we utilize a Multi-task Learning scheme, training the model
to perform decompositions of the target sequence with target units of different
granularity, from fine to coarse. We consider this method as a way to utilize
n-gram information, implicitly, in the training process, while the final
recognition is performed using only the unigram output. Unigram decoding of
such a multi-task approach highlights the capability of the learned internal
representations, imposed by the different n-grams at the training step. We
select n-grams as our target
units, experimenting from unigrams up to fourgrams, i.e., subword-level
granularities. These multiple decompositions are learned by the network with
task-specific CTC losses. Concerning network architectures, we propose two
alternatives, namely the Hierarchical and the Block Multi-task. Overall, our
proposed model, even though evaluated only on the unigram task, outperforms its
single-task counterpart by an absolute 2.52% WER and 1.02% CER under greedy
decoding, without any computational overhead during inference, hinting at a
successfully imposed implicit language model.
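
To make the training scheme concrete, below is a minimal PyTorch sketch of the core idea: the target transcription is decomposed into n-grams of several granularities, and a shared encoder feeds one CTC head per granularity, with the per-task losses summed. This is a hedged illustration, not the paper's exact method: the CNN+BiLSTM backbone, the non-overlapping decomposition, and all names (`ngram_decompose`, `BlockMultiTaskHTR`, `multitask_ctc_loss`) are assumptions for this sketch.

```python
import torch
import torch.nn as nn

def ngram_decompose(text, n):
    """Decompose a transcription into consecutive, non-overlapping n-grams
    (an assumed decomposition; the paper may define its units differently).
    E.g., ngram_decompose("hello", 2) -> ["he", "ll", "o"]."""
    return [text[i:i + n] for i in range(0, len(text), n)]

class BlockMultiTaskHTR(nn.Module):
    """Hypothetical Block-style multi-task model: a shared CNN+BiLSTM encoder
    with one CTC classification head per n-gram granularity."""

    def __init__(self, num_classes_per_task, feat_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, feat_dim, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse image height -> 1D sequence
        )
        self.rnn = nn.LSTM(feat_dim, feat_dim // 2, bidirectional=True,
                           batch_first=True)
        # One linear head per granularity; +1 output for the CTC blank symbol.
        self.heads = nn.ModuleList(nn.Linear(feat_dim, c + 1)
                                   for c in num_classes_per_task)

    def forward(self, images):                            # images: (B, 1, H, W)
        f = self.cnn(images).squeeze(2).permute(0, 2, 1)  # (B, T, feat_dim)
        f, _ = self.rnn(f)
        # Per-task log-probs shaped (T, B, classes), as nn.CTCLoss expects.
        return [h(f).log_softmax(-1).permute(1, 0, 2) for h in self.heads]

def multitask_ctc_loss(logps, targets, in_lens, tgt_lens):
    """Sum of task-specific CTC losses, one per target-sequence decomposition."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    return sum(ctc(lp, t, in_lens, tl)
               for lp, t, tl in zip(logps, targets, tgt_lens))
```

At inference time, only the unigram head would be greedy-decoded (collapse repeats, drop blanks); the extra heads are simply unused, which is consistent with the abstract's claim of no computational overhead during inference.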
Related papers
- Multi-Task Consistency for Active Learning [18.794331424921946]
Inconsistency-based active learning has proven to be effective in selecting informative samples for annotation.
We propose a novel multi-task active learning strategy for two coupled vision tasks: object detection and semantic segmentation.
Our approach achieves 95% of the fully-trained performance using only 67% of the available data.
arXiv Detail & Related papers (2023-06-21T17:34:31Z)
- Neural Coreference Resolution based on Reinforcement Learning [53.73316523766183]
Coreference resolution systems need to solve two subtasks.
One task is to detect all potential mentions; the other is to link each mention to its antecedent.
We propose a reinforcement learning actor-critic-based neural coreference resolution system.
arXiv Detail & Related papers (2022-12-18T07:36:35Z)
- Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations [63.04466647849211]
Methods typically encode task information with a simple dataset name as a prefix to the encoder.
We propose compositional task configurations, a set of prompts prepended to the encoder to improve cross-task generalization.
We show this not only allows the model to better learn shared knowledge across different tasks at training, but also allows us to control the model by composing new configurations.
arXiv Detail & Related papers (2022-12-17T02:20:14Z)
- Effective Cross-Task Transfer Learning for Explainable Natural Language Inference with T5 [50.574918785575655]
We compare sequential fine-tuning with a model for multi-task learning in the context of boosting performance on two tasks.
Our results show that while sequential multi-task learning can be tuned to be good at the first of two target tasks, it performs less well on the second and additionally struggles with overfitting.
arXiv Detail & Related papers (2022-10-31T13:26:08Z)
- Word Sense Induction with Hierarchical Clustering and Mutual Information Maximization [14.997937028599255]
Word sense induction is a difficult problem in natural language processing.
We propose a novel unsupervised method based on hierarchical clustering and invariant information clustering.
We empirically demonstrate that, in certain cases, our approach outperforms prior WSI state-of-the-art methods.
arXiv Detail & Related papers (2022-10-11T13:04:06Z)
- Histogram of Oriented Gradients Meet Deep Learning: A Novel Multi-task Deep Network for Medical Image Semantic Segmentation [18.066680957993494]
We present our novel deep multi-task learning method for medical image segmentation.
We generate the pseudo-labels of an auxiliary task in an unsupervised manner.
Our method consistently improves performance compared to its counterpart method.
arXiv Detail & Related papers (2022-04-02T23:50:29Z)
- Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation [24.488427641442694]
We propose a novel conditional neural process-based approach for few-shot text classification.
Our key idea is to represent each task using gradient information from a base model.
Our approach outperforms traditional fine-tuning, sequential transfer learning, and state-of-the-art meta learning approaches.
arXiv Detail & Related papers (2022-01-27T15:29:30Z)
- Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document.
The recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used for the KE task and has obtained competitive performance on various benchmarks.
In this paper, we propose to adopt Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
- BURT: BERT-inspired Universal Representation from Twin Structure [89.82415322763475]
BURT (BERT inspired Universal Representation from Twin Structure) is capable of generating universal, fixed-size representations for input sequences of any granularity.
Our proposed BURT adopts a Siamese network, learning sentence-level representations from a natural language inference dataset and word/phrase-level representations from a paraphrasing dataset (a generic twin-encoder sketch follows this entry).
We evaluate BURT across different granularities of text similarity tasks, including STS tasks, SemEval2013 Task 5(a) and some commonly used word similarity tasks.
arXiv Detail & Related papers (2020-04-29T04:01:52Z)
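
Since the BURT entry hinges on the twin (Siamese) structure, here is a generic, hedged sketch of that idea: one shared encoder maps both text spans to fixed-size vectors whose similarity drives training. BURT itself is BERT-based; the GRU encoder and the names used here (`TwinEncoder`, `encode`) are illustrative assumptions only, not BURT's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwinEncoder(nn.Module):
    """Generic twin (Siamese) setup: the SAME encoder weights embed two text
    spans of any granularity (word, phrase, or sentence) as fixed-size vectors."""

    def __init__(self, vocab_size=30000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)

    def encode(self, token_ids):                  # token_ids: (B, T)
        _, h = self.encoder(self.embed(token_ids))
        return F.normalize(h[-1], dim=-1)         # unit-length (B, dim) vectors

    def forward(self, ids_a, ids_b):
        # Weight sharing is what makes this "twin": both sides use self.encode.
        # Pair labels (e.g., NLI or paraphrase pairs) would supervise this
        # similarity score during training.
        return (self.encode(ids_a) * self.encode(ids_b)).sum(-1)  # cosine sim
```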
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.