Enhancing Indic Handwritten Text Recognition Using Global Semantic
Information
- URL: http://arxiv.org/abs/2212.07776v1
- Date: Thu, 15 Dec 2022 12:53:26 GMT
- Title: Enhancing Indic Handwritten Text Recognition Using Global Semantic
Information
- Authors: Ajoy Mondal and C. V. Jawahar
- Abstract summary: We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize the Indic handwritten texts.
The proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages.
- Score: 36.01828106385858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Handwritten Text Recognition (HTR) is more interesting and challenging than
printed text recognition due to uneven variations in handwriting style across writers,
content, and time. HTR becomes more challenging for the Indic languages because
of (i) multiple characters combining to form conjuncts, which increases the number
of characters in the respective languages, and (ii) nearly 100 unique basic
Unicode characters in each Indic script. Recently, many recognition methods
based on the encoder-decoder framework have been proposed to handle such
problems. They still face many challenges, such as image blur and incomplete
characters due to varying writing styles and ink density. We argue that most
encoder-decoder methods are based on local visual features without explicit
global semantic information.
In this work, we enhance the performance of Indic handwritten text
recognizers using global semantic information. We use a semantic module in an
encoder-decoder framework for extracting global semantic information to
recognize the Indic handwritten texts. The semantic information is used in both
the encoder for supervision and the decoder for initialization. The semantic
information is predicted from the word embedding of a pre-trained language
model. Extensive experiments demonstrate that the proposed framework achieves
state-of-the-art results on handwritten texts of ten Indic languages.
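
The data flow described in the abstract (a semantic vector predicted from encoder features, supervised by a pre-trained word embedding, and reused to initialize the decoder) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the mean pooling, the single linear layers, the cosine-distance loss, the tanh initialization, all dimensions, and the random stand-in for the language-model embedding are hypothetical choices.

```python
import math
import random

FEATURE_DIM = 4  # hypothetical encoder feature size
EMBED_DIM = 3    # hypothetical word-embedding size
STATE_DIM = 5    # hypothetical decoder hidden-state size


def mean_pool(features):
    """Average per-timestep feature vectors into one global vector."""
    dim = len(features[0])
    return [sum(f[i] for f in features) / len(features) for i in range(dim)]


def linear(vec, weights, bias):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def semantic_module(encoder_features, weights, bias):
    """Predict a global semantic vector from the encoder output (assumed form)."""
    return linear(mean_pool(encoder_features), weights, bias)


def cosine_loss(pred, target):
    """1 - cosine similarity: an assumed supervision signal for the encoder side."""
    dot = sum(p * t for p, t in zip(pred, target))
    norm = math.sqrt(sum(p * p for p in pred)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


def init_decoder_state(semantic, weights, bias):
    """Map the semantic vector to an initial decoder hidden state (assumed form)."""
    return [math.tanh(v) for v in linear(semantic, weights, bias)]


def rand_matrix(rows, cols):
    """Random weights standing in for trained parameters."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
encoder_features = rand_matrix(6, FEATURE_DIM)  # 6 time steps of visual features
w_sem, b_sem = rand_matrix(EMBED_DIM, FEATURE_DIM), [0.0] * EMBED_DIM
w_init, b_init = rand_matrix(STATE_DIM, EMBED_DIM), [0.0] * STATE_DIM
# Stand-in for the ground-truth word's embedding from a pre-trained language model.
word_embedding = [random.uniform(-1, 1) for _ in range(EMBED_DIM)]

semantic = semantic_module(encoder_features, w_sem, b_sem)
loss = cosine_loss(semantic, word_embedding)       # supervision term
h0 = init_decoder_state(semantic, w_init, b_init)  # decoder initialization
print("semantic loss:", loss)
print("initial decoder state:", h0)
```

In a trained system the semantic loss would be added to the recognition loss so that the encoder learns features carrying word-level meaning; here the weights are random, so the printed values only show the shapes and the flow of information.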
Related papers
- TREND: A Whitespace Replacement Information Hiding Method [0.0]
We introduce a novel method for information hiding termed TREND.
It is able to conceal any byte-encoded sequence within a cover text.
By substituting conventional whitespace characters with visually similar Unicode whitespace characters, our proposed scheme preserves the semantics of the cover text.
arXiv Detail & Related papers (2025-02-18T10:21:27Z)
- HierCode: A Lightweight Hierarchical Codebook for Zero-shot Chinese Text Recognition [47.86479271322264]
We propose HierCode, a novel and lightweight codebook that exploits the innate hierarchical nature of Chinese characters.
HierCode employs a multi-hot encoding strategy, leveraging hierarchical binary tree encoding and prototype learning to create distinctive, informative representations for each character.
This approach not only facilitates zero-shot recognition of OOV characters by utilizing shared radicals and structures but also excels in line-level recognition tasks by computing similarity with visual features.
arXiv Detail & Related papers (2024-03-20T17:20:48Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z)
- SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique increasing the performance of current state-of-the-art methods.
We combine the well-known patch loss with information gathered from the parallel trained handwritten text recognition system.
This leads to a more enhanced local discriminator and results in more realistic and higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition [17.191496890376197]
We propose a semantics enhanced encoder-decoder framework to robustly recognize low-quality scene texts.
The proposed framework is more robust for low-quality text images, and achieves state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2020-05-22T03:02:46Z)
- Spying on your neighbors: Fine-grained probing of contextual embeddings for information about surrounding words [12.394077144994617]
We introduce a suite of probing tasks that enable fine-grained testing of contextual embeddings for encoding of information about surrounding words.
We examine the popular BERT, ELMo and GPT contextual encoders and find that each of our tested information types is indeed encoded as contextual information across tokens.
We discuss implications of these results for how different types of models break down and prioritize word-level context information when constructing token embeddings.
arXiv Detail & Related papers (2020-05-04T19:34:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.