Handwritten Script Identification from Text Lines
- URL: http://arxiv.org/abs/2009.07433v1
- Date: Wed, 16 Sep 2020 02:43:24 GMT
- Title: Handwritten Script Identification from Text Lines
- Authors: Pawan Kumar Singh, Iman Chatterjee, Ram Sarkar, Mita Nasipuri
- Abstract summary: We propose a robust method for identifying scripts in handwritten documents at the text-line level.
The recognition is based on features extracted using Chain Code Histogram (CCH) and Discrete Fourier Transform (DFT).
The proposed method was evaluated on 800 handwritten text lines written in seven Indic scripts, namely Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu, and Urdu, along with Roman script.
- Score: 38.1188690493442
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a multilingual country like India where 12 different official scripts are
in use, automatic identification of handwritten script facilitates many
important applications such as automatic transcription of multilingual
documents, searching the web or digital archives for documents containing a
particular script, and selecting a script-specific Optical Character
Recognition (OCR) system in a multilingual environment. In this paper, we
propose a robust method for identifying scripts in handwritten
documents at the text-line level. The recognition is based on features extracted
using Chain Code Histogram (CCH) and Discrete Fourier Transform (DFT). The
proposed method was evaluated on 800 handwritten text lines written in seven
Indic scripts, namely Gujarati, Kannada, Malayalam, Oriya, Tamil, Telugu, and Urdu,
along with Roman script, and yielded an average identification rate of 95.14%
using a Support Vector Machine (SVM) classifier.
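As a rough illustration of the two feature families named in the abstract, the sketch below computes a Freeman 8-direction chain code histogram and a few DFT magnitudes from a single closed contour. The abstract does not say how the paper extracts contours, which signal it feeds to the DFT, or how many coefficients it keeps, so the contour input, the use of complex contour coordinates as the DFT signal, the parameter `k`, and all function names here are assumptions made for illustration only. The SVM classification step is omitted.

```python
import cmath

# Freeman 8-direction codes: (dx, dy) unit step -> code, with y pointing up.
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}


def chain_code_histogram(contour):
    """Normalised 8-bin histogram of Freeman codes along a closed contour.

    `contour` is an ordered list of (x, y) points in which consecutive
    points (including last -> first) are 8-connected neighbours.
    """
    counts = [0] * 8
    n = len(contour)
    for i in range(n):
        (x0, y0), (x1, y1) = contour[i], contour[(i + 1) % n]
        counts[FREEMAN[(x1 - x0, y1 - y0)]] += 1
    total = sum(counts)
    return [c / total for c in counts]


def dft_magnitudes(signal, k):
    """Magnitudes of the first k DFT coefficients, scaled by 1/len(signal)."""
    n = len(signal)
    mags = []
    for u in range(k):
        coeff = sum(signal[t] * cmath.exp(-2j * cmath.pi * u * t / n)
                    for t in range(n))
        mags.append(abs(coeff) / n)
    return mags


def feature_vector(contour, k=4):
    """Concatenate the CCH (8 values) with k DFT magnitudes (assumed layout)."""
    # Assumption: treat each contour point as the complex number x + iy.
    signal = [complex(x, y) for x, y in contour]
    return chain_code_histogram(contour) + dft_magnitudes(signal, k)


# Toy input: the boundary of a 2x2 square traced counter-clockwise.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
features = feature_vector(square)
```

In a full pipeline, one such vector per text line would be passed to a classifier such as an SVM; that step depends on a library choice the abstract does not specify.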
Related papers
- Multi-Modal Multi-Granularity Tokenizer for Chu Bamboo Slip Scripts [65.10991154918737]
This study focuses on the Chu bamboo slip (CBS) script used during the Spring and Autumn and Warring States period (771-256 BCE) in Ancient China.
Our tokenizer first adopts character detection to locate character boundaries, and then conducts character recognition at both the character and sub-character levels.
To support the academic community, we have also assembled the first large-scale dataset of CBSs with over 100K annotated character image scans.
arXiv Detail & Related papers (2024-09-02T07:42:55Z)
- Exploring the Role of Transliteration in In-Context Learning for Low-resource Languages Written in Non-Latin Scripts [50.40191599304911]
We investigate whether transliteration is also effective in improving LLMs' performance for low-resource languages written in non-Latin scripts.
We propose three prompt templates, where the target-language text is represented in (1) its original script, (2) Latin script, or (3) both.
Our findings show that the effectiveness of transliteration varies by task type and model size.
arXiv Detail & Related papers (2024-07-02T14:51:20Z)
- Script-Agnostic Language Identification [21.19710835737713]
Many modern languages, such as Konkani, Kashmiri, Punjabi etc., are synchronically written in several scripts.
We propose learning script-agnostic representations using several different experimental strategies.
We find that word-level script randomization and exposure to a language written in multiple scripts is extremely valuable for downstream script-agnostic language identification.
arXiv Detail & Related papers (2024-06-25T19:23:42Z)
- MDIW-13: a New Multi-Lingual and Multi-Script Database and Benchmark for Script Identification [19.021909090693505]
This paper provides a new database for benchmarking script identification algorithms.
The dataset consists of 1,135 documents scanned from local newspapers and handwritten letters, as well as notes from different native writers.
Easy-to-go benchmarks are proposed with handcrafted and deep learning methods.
arXiv Detail & Related papers (2024-05-29T09:29:09Z)
- Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper [96.43501666278316]
This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages.
We employ a Whisper model which can conduct both language identification and audio-based speech recognition.
By comparing the performances of VSR models trained on automatic labels and the human-annotated labels, we show that we can achieve similar VSR performance to that of human-annotated labels.
arXiv Detail & Related papers (2023-09-15T16:53:01Z)
- Optical Script Identification for multi-lingual Indic-script [0.0]
The aim of this article is to discuss the advancement in the techniques for script pre-processing and text recognition.
In India there are twelve prominent Indic scripts; unlike the English language, these scripts have layers of characteristics.
arXiv Detail & Related papers (2023-08-10T14:02:05Z)
- DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting [112.45423990924283]
DeepSolo++ is a simple DETR-like baseline that lets a single decoder with explicit points solo for text detection, recognition, and script identification simultaneously.
Our method not only performs well in English scenes but also masters the transcription with complex font structure and a thousand-level character classes, such as Chinese.
arXiv Detail & Related papers (2023-05-31T15:44:00Z)
- Beyond Arabic: Software for Perso-Arabic Script Manipulation [67.31374614549237]
We provide a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.
The library also provides simple FST-based romanization and transliteration.
arXiv Detail & Related papers (2023-01-26T20:37:03Z)
- A New Approach for Texture based Script Identification At Block Level using Quad Tree Decomposition [38.20489458130109]
In a country like India, where a multi-script scenario is prevalent, identifying scripts beforehand becomes obligatory.
We present the significance of Gabor wavelet filters in extracting directional energy and entropy distributions for 11 official handwritten scripts.
arXiv Detail & Related papers (2020-09-16T02:50:03Z)