A complete character recognition and transliteration technique for
Devanagari script
- URL: http://arxiv.org/abs/2009.13460v1
- Date: Mon, 28 Sep 2020 16:43:18 GMT
- Title: A complete character recognition and transliteration technique for
Devanagari script
- Authors: Jasmine Kaur and Vinay Kumar
- Abstract summary: We present a novel technique for automatic transliteration of Devanagari script using character recognition.
One of the first tasks performed to isolate the constituent characters is segmentation.
Devanagari characters are mapped to corresponding Roman letters in such a way that the resulting Roman letters have a pronunciation similar to that of the source characters.
- Score: 12.208787849155048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transliteration transforms text from one script to another based on
phonetic similarities between the characters of the two scripts. In
this paper, we present a novel technique for automatic transliteration of
Devanagari script using character recognition. One of the first tasks performed
to isolate the constituent characters is segmentation. The line segmentation
methodology in this manuscript handles the case of overlapping lines.
The character segmentation algorithm is designed to segment conjuncts and separate
shadow characters. The presented shadow character segmentation scheme employs
the connected component method to isolate each character, keeping the constituent
characters intact. Statistical features, namely moments of different orders such as
area, variance, skewness and kurtosis, along with structural features of
characters, are employed in a two-phase recognition process. After recognition,
the constituent Devanagari characters are mapped to corresponding Roman letters
in such a way that the resulting Roman letters have a pronunciation similar to that
of the source characters.
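The statistical features named in the abstract can be illustrated with a minimal sketch. The abstract does not say which distribution the moments are computed over, so this example assumes a binary glyph image and takes the moments over the column coordinates of its foreground pixels. The function name `moment_features` and the input layout are hypothetical and are not taken from the paper.

```python
from math import sqrt


def moment_features(image):
    """Compute area, variance, skewness and kurtosis for a binary glyph.

    `image` is a list of rows, each a list of 0/1 pixel values.
    Assumption: moments are taken over the x-coordinates (column indices)
    of foreground pixels; the paper may define them differently.
    """
    # Collect the column index of every foreground pixel.
    xs = [x for row in image for x, pixel in enumerate(row) if pixel]

    area = len(xs)  # zeroth-order moment: number of foreground pixels
    features = {"area": area, "variance": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    if area == 0:
        return features

    mean = sum(xs) / area

    def central_moment(order):
        # k-th central moment of the x-coordinate distribution.
        return sum((x - mean) ** order for x in xs) / area

    variance = central_moment(2)
    features["variance"] = variance
    if variance == 0:
        # All pixels share one column; higher standardized moments are undefined.
        return features

    std = sqrt(variance)
    features["skewness"] = central_moment(3) / std ** 3
    features["kurtosis"] = central_moment(4) / variance ** 2
    return features


if __name__ == "__main__":
    # A small T-shaped glyph: a top bar with a vertical stem.
    glyph = [
        [1, 1, 1],
        [0, 1, 0],
        [0, 1, 0],
    ]
    print(moment_features(glyph))
```

A feature vector like this could be concatenated with structural features before classification. How the paper's two-phase recognition process actually combines them is not described in the abstract.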
Related papers
- CHIRON: Rich Character Representations in Long-Form Narratives [98.273323001781]
We propose CHIRON, a new 'character sheet' based representation that organizes and filters textual information about characters.
We validate CHIRON via the downstream task of masked-character prediction, where our experiments show CHIRON is better and more flexible than comparable summary-based baselines.
Metrics derived from CHIRON can be used to automatically infer character-centricity in stories, and these metrics align with human judgments.
arXiv Detail & Related papers (2024-06-14T17:23:57Z)
- Structural analysis of Hindi online handwritten characters for character recognition [0.0]
Direction properties of online strokes are used to analyze them in terms of homogeneous regions or sub-strokes.
These properties along with some geometrics are used to extract sub-units from Hindi online handwritten characters.
A method is developed to extract point stroke, clockwise curve stroke, counter-clockwise curve stroke and loop stroke segments as sub-units.
arXiv Detail & Related papers (2023-10-12T11:14:27Z)
- Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation [4.128716153761773]
We focus on the scenario where the transcription is known beforehand, in which case the character segmentation becomes an assignment problem.
Inspired by the $k$-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture.
In order to assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets.
arXiv Detail & Related papers (2023-09-06T15:19:04Z)
- Siamese based Neural Network for Offline Writer Identification on word level data [7.747239584541488]
We propose a novel scheme to identify the author of a document based on the input word image.
Our method is text independent and does not impose any constraint on the size of the input image under examination.
arXiv Detail & Related papers (2022-11-17T10:01:46Z)
- Getting the Most out of Simile Recognition [48.5838790615549]
Simile recognition involves two subtasks: simile sentence classification, which determines whether a sentence contains a simile, and simile component extraction, which locates the corresponding objects.
Recent work ignores features other than surface strings.
We study two types of features: 1) input-side features that include POS tags, dependency trees and word definitions, and 2) decoding features that capture the interdependence among various decoding decisions.
arXiv Detail & Related papers (2022-11-11T03:22:45Z)
- CDistNet: Perceiving Multi-Domain Character Distance for Robust Text Recognition [87.3894423816705]
We propose a novel module called Multi-Domain Character Distance Perception (MDCDP) to establish a visually and semantically related position embedding.
MDCDP uses the position embedding to query both visual and semantic features following the cross-attention mechanism.
We develop CDistNet, which stacks multiple MDCDPs to guide gradually more precise distance modeling.
arXiv Detail & Related papers (2021-11-22T06:27:29Z)
- I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition [68.95544645458882]
This paper presents I2C2W, a novel scene text recognizer that is accurate and tolerant to various noises in scenes.
I2C2W consists of an image-to-character module (I2C) and a character-to-word module (C2W) which are complementary and can be trained end-to-end.
arXiv Detail & Related papers (2021-05-18T09:20:58Z)
- 2kenize: Tying Subword Sequences for Chinese Script Conversion [54.33749520569979]
We propose a model that can disambiguate between mappings and convert between the two scripts.
Our proposed method outperforms previous Chinese Character conversion approaches by 6 points in accuracy.
arXiv Detail & Related papers (2020-05-07T10:53:05Z)
- Neural Computing for Online Arabic Handwriting Character Recognition using Hard Stroke Features Mining [0.0]
An enhanced method of detecting the desired critical points from vertical and horizontal direction-length of handwriting stroke features of online Arabic script recognition is proposed.
A minimum feature set is extracted from these tokens for classification of characters using a multilayer perceptron with a back-propagation learning algorithm and modified sigmoid function-based activation function.
The proposed method achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques.
arXiv Detail & Related papers (2020-05-02T23:17:08Z)
- A Skip-connected Multi-column Network for Isolated Handwritten Bangla Character and Digit recognition [12.551285203114723]
We have proposed a non-explicit feature extraction method using a multi-scale multi-column skip convolutional neural network.
Our method is evaluated on four publicly available datasets of isolated handwritten Bangla characters and digits.
arXiv Detail & Related papers (2020-04-27T13:18:58Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)