Large Scale Font Independent Urdu Text Recognition System
- URL: http://arxiv.org/abs/2005.06752v1
- Date: Thu, 14 May 2020 06:57:24 GMT
- Title: Large Scale Font Independent Urdu Text Recognition System
- Authors: Atique Ur Rehman, Sibt Ul Hussain
- Abstract summary: There exists no automated system that can reliably recognize printed Urdu text in images and videos across different fonts.
We have developed Qaida, a large scale data set with 256 fonts, and a complete Urdu lexicon.
We have also developed a Convolutional Neural Network (CNN) based classification model which can recognize Urdu ligatures with 84.2% accuracy.
- Score: 1.5229257192293197
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performance of OCR algorithms has improved significantly in
recent years, mainly due to the increasing capabilities of artificial
intelligence algorithms. However, this progress is not evenly distributed
across languages. Urdu is among the languages that have not received much
attention, especially from the font-independent perspective. There exists no
automated system that can reliably recognize printed Urdu text in images and
videos across different fonts. To help bridge this gap, we have developed
Qaida, a large scale data set with 256 fonts, and a complete Urdu lexicon. We
have also developed a Convolutional Neural Network (CNN) based classification
model which can recognize Urdu ligatures with 84.2% accuracy. Moreover, we
demonstrate that our recognition network can not only recognize the text in the
fonts it is trained on but can also reliably recognize text in unseen (new)
fonts. To this end, this paper makes the following contributions: (i) we introduce
a large-scale, multi-font data set for printed Urdu text
recognition; (ii) we design, train, and evaluate a CNN-based model for
Urdu text recognition; (iii) we experiment with incremental learning methods to
produce state-of-the-art results for Urdu text recognition. All experimental
choices were thoroughly validated via detailed empirical analysis. We believe
that this study can serve as the basis for further improvement in the
performance of font independent Urdu OCR systems.
Related papers
- A Permuted Autoregressive Approach to Word-Level Recognition for Urdu Digital Text [2.2012643583422347]
This research paper introduces a novel word-level Optical Character Recognition (OCR) model specifically designed for digital Urdu text.
The model employs a permuted autoregressive sequence (PARSeq) architecture, which enhances its performance.
The model demonstrates a high level of accuracy in capturing the intricacies of Urdu script, achieving a CER of 0.178.
arXiv Detail & Related papers (2024-08-27T14:58:13Z)
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach [0.0]
This paper discusses text recognition for two scripts: Bengali and Nepali.
There are about 300 million Bengali speakers and about 40 million Nepali speakers.
The results signify that the suggested technique corresponds with current approaches.
arXiv Detail & Related papers (2024-04-03T00:21:14Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- NusaWrites: Constructing High-Quality Corpora for Underrepresented and Extremely Low-Resource Languages [54.808217147579036]
We conduct a case study on Indonesian local languages.
We compare the effectiveness of online scraping, human translation, and paragraph writing by native speakers in constructing datasets.
Our findings demonstrate that datasets generated through paragraph writing by native speakers exhibit superior quality in terms of lexical diversity and cultural content.
arXiv Detail & Related papers (2023-09-19T14:42:33Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- Kurdish Handwritten Character Recognition using Deep Learning Techniques [26.23274417985375]
This paper attempts to design and develop a model that can recognize handwritten characters for Kurdish alphabets using deep learning techniques.
A comprehensive dataset was created for handwritten Kurdish characters, which contains more than 40 thousand images.
The model reached 96% accuracy in testing and 97% accuracy in training.
arXiv Detail & Related papers (2022-10-18T16:48:28Z)
- Towards Boosting the Accuracy of Non-Latin Scene Text Recognition [27.609596088151644]
Scene-text recognition is remarkably better in Latin languages than in non-Latin languages.
This paper examines the possible reasons for low accuracy by comparing English datasets with non-Latin languages.
arXiv Detail & Related papers (2022-01-10T06:36:43Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Robust End-to-End Offline Chinese Handwriting Text Page Spotter with Text Kernel [4.028854207195064]
We propose a robust end-to-end Chinese text page spotter framework.
It unifies text detection and text recognition with text kernel.
Our method achieves state-of-the-art results on the CASIA-HWDB2.0-2.2 dataset and ICDAR-2013 competition dataset.
arXiv Detail & Related papers (2021-07-04T05:42:04Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
arXiv Detail & Related papers (2020-10-06T15:21:08Z)