Transformer based Urdu Handwritten Text Optical Character Reader
- URL: http://arxiv.org/abs/2206.04575v1
- Date: Thu, 9 Jun 2022 15:43:35 GMT
- Title: Transformer based Urdu Handwritten Text Optical Character Reader
- Authors: Mohammad Daniyal Shaiq, Musa Dildar Ahmed Cheema, Ali Kamal
- Abstract summary: The Urdu script is difficult to recognize because of its cursive nature and because its characters change shape based on their position within a word.
A model is therefore needed that can learn these complex features and generalize across every kind of handwriting style.
In this work, we propose a transformer-based Urdu handwritten text extraction model.
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Extracting handwritten text is one of the most important components of
digitizing information and making it available at scale. Handwriting Optical
Character Reader (OCR) is a research problem in computer vision and natural
language processing, and while a lot of work has been done for English,
unfortunately very little has been done for low-resource languages such as Urdu.
The Urdu script is difficult because of its cursive nature and because its
characters change shape depending on their relative position within a word;
a model is therefore needed that can learn these complex features and
generalize across every kind of handwriting style. In this work, we propose a
transformer-based Urdu handwritten text extraction model. As transformers have
been very successful in natural language understanding tasks, we explore them
further to understand complex Urdu handwriting.
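The positional shape change the abstract describes can be made concrete with a toy example. This sketch is not part of the paper; it only illustrates, for the single letter BEH (U+0628), how one abstract Urdu/Arabic-script letter maps to four positional glyphs (the codepoints are from the Unicode Arabic Presentation Forms-B block). Real shaping is more involved, since some letters do not join on one side.

```python
# Toy illustration of why Urdu/Arabic-script OCR is hard: one letter, four glyphs.
# Unicode encodes the abstract letter BEH as U+0628, with four presentation forms.
FORMS = {
    "\u0628": {              # BEH
        "isolated": "\uFE8F",
        "final":    "\uFE90",
        "initial":  "\uFE91",
        "medial":   "\uFE92",
    },
}

def contextual_form(letter: str, has_prev: bool, has_next: bool) -> str:
    """Pick the positional glyph for a joining letter.

    Simplification: assumes both neighbors (when present) join to this letter;
    real Arabic-script shaping must also check each neighbor's joining type.
    """
    forms = FORMS[letter]
    if has_prev and has_next:
        return forms["medial"]
    if has_next:
        return forms["initial"]
    if has_prev:
        return forms["final"]
    return forms["isolated"]
```

An OCR model therefore cannot memorize one shape per character; it must recognize all positional variants, which is part of what motivates learned feature extraction.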
Related papers
- Handwritten Text Recognition for Low Resource Languages [4.4322265742680305]
This paper introduces BharatOCR, a novel segmentation-free, paragraph-level handwritten Hindi and Urdu text recognition system.
We propose a ViT-Transformer Decoder-LM architecture for handwritten text recognition, where a Vision Transformer (ViT) extracts visual features, a Transformer decoder generates text sequences, and a pre-trained language model (LM) refines the output to improve accuracy, fluency, and coherence.
The proposed model was evaluated using our custom datasets ('Parimal Urdu' and 'Parimal Hindi'), introduced in this research work, as well as two public datasets.
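The "LM refines the output" step in such pipelines can be sketched with a toy rescorer. This is a hypothetical illustration, not the paper's code: a character-bigram model trained on a tiny corpus picks the most fluent candidate among decoder outputs.

```python
# Toy sketch of LM rescoring over OCR decoder candidates (illustration only).
import math
from collections import Counter

def bigram_counts(corpus):
    """Count character bigrams over a list of words."""
    counts = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    return counts

def score(word, counts):
    """Add-one smoothed log-probability of the word's bigram sequence."""
    total = sum(counts.values())
    return sum(
        math.log((counts[(a, b)] + 1) / (total + 1))
        for a, b in zip(word, word[1:])
    )

def rescore(candidates, counts):
    """Return the candidate the bigram LM finds most fluent."""
    return max(candidates, key=lambda w: score(w, counts))
```

A real system would use a pre-trained neural LM rather than bigram counts, but the selection principle is the same: the visual decoder proposes, the language model disposes.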
arXiv Detail & Related papers (2025-12-01T07:01:52Z)
- GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer [2.2550831568419456]
Although Bengali is the sixth most spoken language in the world, handwritten text recognition systems for it remain severely underdeveloped.
We present GraDeT-HTR, a resource-efficient Bengali handwritten text recognition system based on a grapheme-aware, decoder-only Transformer architecture.
arXiv Detail & Related papers (2025-09-22T17:56:17Z)
- WriteViT: Handwritten Text Generation with Vision Transformer [7.10052009802944]
We introduce WriteViT, a one-shot handwritten text synthesis framework that incorporates Vision Transformers (ViT).
WriteViT produces high-quality, style-consistent handwriting while maintaining strong recognition performance in low-resource scenarios.
These results highlight the promise of transformer-based designs for multilingual handwriting generation and efficient style adaptation.
arXiv Detail & Related papers (2025-05-19T15:17:53Z)
- Skeleton and Font Generation Network for Zero-shot Chinese Character Generation [53.08596064763731]
We propose a novel Skeleton and Font Generation Network (SFGN) to achieve a more robust Chinese character font generation.
We conduct experiments on misspelled characters, many of which differ only slightly from common ones.
Our approach visually demonstrates the efficacy of generated images and outperforms current state-of-the-art font generation methods.
arXiv Detail & Related papers (2025-01-14T12:15:49Z)
- Handwriting Recognition in Historical Documents with Multimodal LLM [0.0]
Multimodal language models have demonstrated effectiveness in performing OCR and computer vision tasks with few-shot prompting.
I evaluate the accuracy of handwritten document transcriptions generated by Gemini against current state-of-the-art Transformer-based methods.
arXiv Detail & Related papers (2024-10-31T15:32:14Z)
- VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text [80.24176572093512]
We introduce Visual Caption Restoration (VCR), a vision-language task that challenges models to accurately restore partially obscured texts using pixel-level hints within images.
This task stems from the observation that text embedded in images is intrinsically different from common visual elements and natural language due to the need to align the modalities of vision, text, and text embedded in images.
arXiv Detail & Related papers (2024-06-10T16:58:48Z)
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- MetaScript: Few-Shot Handwritten Chinese Content Generation via Generative Adversarial Networks [15.037121719502606]
We propose MetaScript, a novel content generation system designed to address the diminishing presence of personal handwriting styles in the digital representation of Chinese characters.
Our approach harnesses the power of few-shot learning to generate Chinese characters that retain the individual's unique handwriting style and maintain the efficiency of digital typing.
arXiv Detail & Related papers (2023-12-25T17:31:19Z)
- Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach [1.3778851745408136]
We have proposed an end-to-end system with deep learning-based models for detecting, recognizing, correcting, and parsing information from Bangla signboards.
We have created manually annotated and synthetic datasets to train the signboard detection, address text detection, address text recognition, and address text parsing models.
Finally, we have developed a Bangla address text parser using a state-of-the-art transformer-based pre-trained language model.
arXiv Detail & Related papers (2023-11-22T08:25:15Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- Online Gesture Recognition using Transformer and Natural Language Processing [0.0]
Transformer architecture is shown to provide a powerful machine framework for online gestures corresponding to glyph strokes of natural language sentences.
arXiv Detail & Related papers (2023-05-05T10:17:22Z)
- Beyond Arabic: Software for Perso-Arabic Script Manipulation [67.31374614549237]
We provide a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.
The library also provides simple FST-based romanization and transliteration.
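The FST-based romanization mentioned above can be pictured, in a highly simplified form, as a rewrite over a transliteration table. The sketch below is an illustration only, not the library's actual API, and the per-character mappings are a small hypothetical subset; a real FST also conditions rewrites on context.

```python
# Minimal sketch of table-driven romanization for Perso-Arabic text
# (illustration only; a real FST handles context-dependent rules).
RULES = {
    "\u0627": "a",   # ALEF
    "\u0628": "b",   # BEH
    "\u062A": "t",   # TEH
    "\u0633": "s",   # SEEN
    "\u0634": "sh",  # SHEEN
}

def romanize(text: str) -> str:
    """Map each Arabic-script character through the table; pass others through."""
    return "".join(RULES.get(ch, ch) for ch in text)
```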
arXiv Detail & Related papers (2023-01-26T20:37:03Z)
- StoryTrans: Non-Parallel Story Author-Style Transfer with Discourse Representations and Content Enhancing [73.81778485157234]
Long texts usually involve more complicated author linguistic preferences, such as discourse structures, than individual sentences do.
We formulate the task of non-parallel story author-style transfer, which requires transferring an input story into a specified author style.
We use an additional training objective to disentangle stylistic features from the learned discourse representation to prevent the model from degenerating to an auto-encoder.
arXiv Detail & Related papers (2022-08-29T08:47:49Z)
- WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models [57.557319372969495]
Large-scale auto-regressive language models pretrained on massive text have demonstrated their impressive ability to perform new natural language tasks.
Recent studies further show that such a few-shot learning ability can be extended to the text-image setting by training an encoder to encode the images into embeddings.
We propose a novel speech understanding framework, WavPrompt, where we finetune a wav2vec model to generate a sequence of audio embeddings understood by the language model.
arXiv Detail & Related papers (2022-03-29T19:08:55Z)
- Learning Chess Blindfolded: Evaluating Language Models on State Tracking [69.3794549747725]
We consider the task of language modeling for the game of chess.
Unlike natural language, chess notations describe a simple, constrained, and deterministic domain.
We find that transformer language models can learn to track pieces and predict legal moves with high accuracy when trained solely on move sequences.
arXiv Detail & Related papers (2021-02-26T01:16:23Z)
- Urdu Handwritten Text Recognition Using ResNet18 [0.0]
We propose a ResNet18 model for handwritten text recognition using the Urdu Nastaliq Handwritten Dataset (UNHD), which contains 312,000 words written by 500 candidates.
arXiv Detail & Related papers (2021-02-19T17:55:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.