GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer
- URL: http://arxiv.org/abs/2509.18081v1
- Date: Mon, 22 Sep 2025 17:56:17 GMT
- Title: GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer
- Authors: Md. Mahmudul Hasan, Ahmed Nesar Tahsin Choudhury, Mahmudul Hasan, Md. Mosaddek Khan
- Abstract summary: Despite being the sixth most spoken language in the world, handwritten text recognition systems for Bengali remain severely underdeveloped. We present GraDeT-HTR, a resource-efficient Bengali handwritten text recognition system based on a grapheme-aware decoder-only Transformer architecture.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Despite Bengali being the sixth most spoken language in the world, handwritten text recognition (HTR) systems for Bengali remain severely underdeveloped. The complexity of Bengali script--featuring conjuncts, diacritics, and highly variable handwriting styles--combined with a scarcity of annotated datasets makes this task particularly challenging. We present GraDeT-HTR, a resource-efficient Bengali handwritten text recognition system based on a Grapheme-aware Decoder-only Transformer architecture. To address the unique challenges of Bengali script, we augment the performance of a decoder-only transformer by integrating a grapheme-based tokenizer and demonstrate that it significantly improves recognition accuracy compared to conventional subword tokenizers. Our model is pretrained on large-scale synthetic data and fine-tuned on real human-annotated samples, achieving state-of-the-art performance on multiple benchmark datasets.
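The grapheme-based tokenizer is the paper's key ingredient: instead of subword units, the output vocabulary is built from Bengali grapheme clusters, i.e. a base character plus its dependent vowel signs and any virama-joined conjunct consonants. The sketch below is a simplified illustration of such segmentation, not the authors' implementation; real Bengali clustering has further edge cases (ya-phala, reph, khanda ta) that are ignored here, and the code-point sets are my own assumption.

```python
# Simplified Bengali grapheme segmentation: a grapheme cluster is a base
# character plus trailing dependent signs, with the virama (hasanta)
# joining the following consonant into a conjunct.
VIRAMA = "\u09cd"  # Bengali sign virama (hasanta)

# Dependent vowel signs and other combining marks that attach to a base.
DEPENDENT = (
    {chr(c) for c in (0x0981, 0x0982, 0x0983, 0x09BC, 0x09D7)}
    | {chr(c) for c in range(0x09BE, 0x09C5)}  # vowel signs aa .. vocalic r
    | {chr(c) for c in (0x09C7, 0x09C8, 0x09CB, 0x09CC)}
)

def graphemes(text: str) -> list[str]:
    """Split Bengali text into approximate grapheme clusters."""
    out, i = [], 0
    while i < len(text):
        j = i + 1
        while j < len(text):
            if text[j] in DEPENDENT:
                j += 1       # combining mark stays attached to its base
            elif text[j] == VIRAMA and j + 1 < len(text):
                j += 2       # virama + next consonant forms a conjunct
            else:
                break
        out.append(text[i:j])
        i = j
    return out

print(graphemes("বাংলা"))  # ['বাং', 'লা']
print(graphemes("ক্ত"))    # ['ক্ত'] (a conjunct kept as one unit)
```

Under this scheme, each cluster becomes one token in the decoder's output vocabulary, so a conjunct like ক্ত is predicted in a single step rather than as three separate code points, which is plausibly why it outperforms subword tokenizers trained on code-point sequences.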
Related papers
- BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks [0.2446672595462589]
We develop and use a self-collected dataset of Bengali handwriting samples. The dataset includes contributions from approximately five hundred individuals across different ages and genders. Our approach demonstrates the ability to produce diverse handwritten outputs from input plain text.
arXiv Detail & Related papers (2025-12-25T14:38:12Z) - Handwritten Text Recognition for Low Resource Languages [4.4322265742680305]
This paper introduces BharatOCR, a novel segmentation-free paragraph-level handwritten Hindi and Urdu text recognition system. We propose a ViT-Transformer Decoder-LM architecture for handwritten text recognition, where a Vision Transformer (ViT) extracts visual features, a Transformer decoder generates text sequences, and a pre-trained language model (LM) refines the output to improve accuracy, fluency, and coherence. The proposed model was evaluated on our custom datasets ('Parimal Urdu' and 'Parimal Hindi'), introduced in this research work, as well as on two public datasets.
arXiv Detail & Related papers (2025-12-01T07:01:52Z) - Learning Robust Named Entity Recognizers From Noisy Data With Retrieval Augmentation [67.89838237013078]
Named entity recognition (NER) models often struggle with noisy inputs.
We propose a more realistic setting in which only noisy text and its NER labels are available.
We employ a multi-view training framework that improves robust NER without retrieving text during inference.
arXiv Detail & Related papers (2024-07-26T07:30:41Z) - Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - Online Gesture Recognition using Transformer and Natural Language Processing [0.0]
The Transformer architecture is shown to provide a powerful framework for recognizing online gestures corresponding to glyph strokes of natural language sentences.
arXiv Detail & Related papers (2023-05-05T10:17:22Z) - Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of a commonly used industrial streaming model, the Transformer-Transducer (T-T).
We first propose a strategy to generate code-switching text data and then investigate injecting the generated text into the T-T model, either explicitly via Text-To-Speech (TTS) conversion or implicitly by tying speech and text latent spaces.
Experimental results on the T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to inject generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z) - Enhancing Indic Handwritten Text Recognition Using Global Semantic Information [36.01828106385858]
We use a semantic module in an encoder-decoder framework for extracting global semantic information to recognize the Indic handwritten texts.
The proposed framework achieves state-of-the-art results on handwritten texts of ten Indic languages.
arXiv Detail & Related papers (2022-12-15T12:53:26Z) - Transformer based Urdu Handwritten Text Optical Character Reader [0.0]
The Urdu script is particularly difficult to recognize because of its cursive nature and because character shapes change with their relative position. This motivates a model that can learn such complex features and generalize across handwriting styles. In this work, we propose a transformer-based Urdu handwritten text extraction model.
arXiv Detail & Related papers (2022-06-09T15:43:35Z) - Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z) - SmartPatch: Improving Handwritten Word Imitation with Patch Discriminators [67.54204685189255]
We propose SmartPatch, a new technique that increases the performance of current state-of-the-art methods.
We combine the well-known patch loss with information gathered from the parallel trained handwritten text recognition system.
This leads to a more enhanced local discriminator and results in more realistic and higher-quality generated handwritten words.
arXiv Detail & Related papers (2021-05-21T18:34:21Z) - One-shot Compositional Data Generation for Low Resource Handwritten Text Recognition [10.473427493876422]
Low-resource handwritten text recognition is a hard problem due to scarce annotated data and very limited linguistic information.
In this paper we address this problem through a data generation technique based on Bayesian Program Learning.
Contrary to traditional generation approaches, which require a huge amount of annotated images, our method is able to generate human-like handwriting using only one sample of each symbol from the desired alphabet.
arXiv Detail & Related papers (2021-05-11T18:53:01Z) - Text Compression-aided Transformer Encoding [77.16960983003271]
We propose explicit and implicit text compression approaches to enhance the Transformer encoding.
In standard Transformer encoders, backbone information, meaning the gist of the input text, is not specifically focused on.
Our evaluation on benchmark datasets shows that the proposed explicit and implicit text compression approaches improve results in comparison to strong baselines.
arXiv Detail & Related papers (2021-02-11T11:28:39Z) - A Large Multi-Target Dataset of Common Bengali Handwritten Graphemes [1.009810782568186]
We propose a labeling scheme that makes segmentation inside alpha-syllabary words linear.
The dataset contains 411k curated samples of 1295 unique commonly used Bengali graphemes.
The dataset is open-sourced as a part of a public Handwritten Grapheme Classification Challenge on Kaggle.
arXiv Detail & Related papers (2020-10-01T01:51:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.