Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach
- URL: http://arxiv.org/abs/2404.02375v1
- Date: Wed, 3 Apr 2024 00:21:14 GMT
- Title: Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach
- Authors: S M Rakib Hasan, Aakar Dhakal, Md Humaion Kabir Mehedi, Annajiat Alim Rasel
- Abstract summary: This paper discusses text recognition for two scripts: Bengali and Nepali.
There are about 300 and 40 million Bengali and Nepali speakers respectively.
The results signify that the suggested technique corresponds with current approaches.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Efforts on the research and development of OCR systems for low-resource languages are relatively new. Low-resource languages have little data available for training machine translation systems or other systems. Even though a vast amount of text has been digitized and made available on the internet, the text is still in PDF and image formats, which are not instantly accessible. This paper discusses text recognition for two scripts: Bengali and Nepali; there are about 300 and 40 million Bengali and Nepali speakers respectively. In this study, using encoder-decoder transformers, a model was developed, and its efficacy was assessed using a collection of optical text images, both handwritten and printed. The results signify that the suggested technique corresponds with current approaches and achieves high precision in recognizing text in Bengali and Nepali. This study can pave the way for the advanced and accessible study of linguistics in South East Asia.
Related papers
- Text Image Generation for Low-Resource Languages with Dual Translation Learning [0.0]
This study proposes a novel approach that generates text images in low-resource languages by emulating the style of real text images from high-resource languages.
The training of this model involves dual translation tasks, where it transforms plain text images into either synthetic or real text images.
To enhance the accuracy and variety of generated text images, we introduce two guidance techniques.
arXiv Detail & Related papers (2024-09-26T11:23:59Z)
- The First Swahili Language Scene Text Detection and Recognition Dataset [55.83178123785643]
There is a significant gap in low-resource languages, especially the Swahili Language.
Swahili is widely spoken in East African countries but is still an under-explored language in scene text recognition.
We propose a comprehensive dataset of Swahili scene text images and evaluate the dataset on different scene text detection and recognition models.
arXiv Detail & Related papers (2024-05-19T03:55:02Z)
- MENTOR: Multilingual tExt detectioN TOward leaRning by analogy [59.37382045577384]
We propose a framework to detect and identify both seen and unseen language regions inside scene images.
"MENTOR" is the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection.
arXiv Detail & Related papers (2024-03-12T03:35:17Z)
- TEXTRON: Weakly Supervised Multilingual Text Detection through Data Programming [21.88026116276415]
Text detection is a challenging problem in the field of computer vision (CV).
There is a scarcity of word-level labeled data for text detection, especially for multilingual settings and Indian scripts.
We propose TEXTRON, a Data Programming-based approach, where users can plug various text detection methods into a weak supervision-based learning framework.
arXiv Detail & Related papers (2024-02-15T09:18:18Z)
- Towards Detecting, Recognizing, and Parsing the Address Information from Bangla Signboard: A Deep Learning-based Approach [1.3778851745408136]
We have proposed an end-to-end system with deep learning-based models for detecting, recognizing, correcting, and parsing information from Bangla signboards.
We have created manually annotated and synthetic datasets to train signboard detection, address text detection, address text recognition, and address text correction and parsing models.
Finally, we have developed a Bangla address text parser using a state-of-the-art transformer-based pre-trained language model.
arXiv Detail & Related papers (2023-11-22T08:25:15Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- Bengali Handwritten Digit Recognition using CNN with Explainable AI [0.5156484100374058]
We have used various machine learning algorithms and CNN to recognize handwritten Bengali digits.
Grad-CAM was used as an XAI method on our CNN model, which gave us insights into the model.
arXiv Detail & Related papers (2022-12-23T04:40:20Z)
- Language Matters: A Weakly Supervised Pre-training Approach for Scene Text Detection and Spotting [69.77701325270047]
This paper presents a weakly supervised pre-training method that can acquire effective scene text representations.
Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features.
Experiments show that our pre-trained model improves F-score by +2.5% and +4.8% while transferring its weights to other text detection and spotting networks.
arXiv Detail & Related papers (2022-03-08T08:10:45Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Deep Learning for Hindi Text Classification: A Comparison [6.8629257716723]
Research on text classification for Hindi, a morphologically rich, low-resource language written in Devanagari script, has been limited by the absence of a large labeled corpus.
In this work, we used translated versions of English datasets to evaluate models based on CNN, LSTM and Attention.
The paper also serves as a tutorial for popular text classification techniques.
arXiv Detail & Related papers (2020-01-19T09:29:12Z)