MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten
Compound Characters
- URL: http://arxiv.org/abs/2005.02155v2
- Date: Wed, 6 May 2020 07:59:45 GMT
- Title: MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten
Compound Characters
- Authors: Jannatul Ferdous, Suvrajit Karmaker, A K M Shahariar Azad Rabby, Syed
Akhter Hossain
- Abstract summary: MatriVasha is a project for recognizing handwritten Bangla compound characters.
The proposed dataset is so far the most extensive dataset for Bangla compound characters.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recognition of handwritten Bangla compound characters has been an
important problem for many years. In recent years, application-oriented
research in machine learning and deep learning has gained interest, most
notably handwriting recognition, because it has important applications such as
Bangla OCR. MatriVasha is a project for recognizing handwritten Bangla compound
characters. Compound character recognition is currently an important topic
because of its varied applications, and it helps to digitize old forms and
information reliably. Unfortunately, there is no comprehensive dataset that
covers all types of Bangla compound characters. MatriVasha is an attempt to
align compound characters, which is challenging because each person writes the
shapes in a unique style. MatriVasha proposes a dataset intended for
recognizing 120 (one hundred twenty) Bangla compound characters, consisting of
2552 (two thousand five hundred fifty-two) isolated handwritten characters
written by unique writers and collected within Bangladesh. The dataset
addresses problems in district-, age-, and gender-based handwriting research,
because the samples were collected from a variety of districts and age groups
and from an equal number of males and females. To date, the proposed dataset is
the most extensive dataset for Bangla compound characters. It is intended to
support the development of recognition techniques for handwritten Bangla
compound characters. In the future, the dataset will be made publicly available
to help widen research.
Related papers
- Bukva: Russian Sign Language Alphabet [75.42794328290088]
This paper investigates the recognition of the Russian fingerspelling alphabet, also known as the Russian Sign Language (RSL) dactyl.
Dactyl is a component of sign languages where distinct hand movements represent individual letters of a written language.
We provide Bukva, the first full-fledged open-source video dataset for RSL dactyl recognition.
arXiv Detail & Related papers (2024-10-11T09:59:48Z)
- Tails Tell Tales: Chapter-Wide Manga Transcriptions with Character Names [53.24414727354768]
This paper aims to generate a dialogue transcript of a complete manga chapter, entirely automatically.
It involves identifying (i) what is being said, detecting the texts on each page and classifying them into essential vs non-essential.
It also ensures the same characters are named consistently throughout the chapter.
arXiv Detail & Related papers (2024-08-01T05:47:04Z)
- BanglaNet: Bangla Handwritten Character Recognition using Ensembling of
Convolutional Neural Network [0.0]
This paper presents a classification model based on the ensembling of several Convolutional Neural Networks (CNNs).
Three different models based on the idea of state-of-the-art CNN models like Inception, ResNet, and DenseNet have been trained with both augmented and non-augmented inputs.
Rigorous experimentation on three benchmark Bangla handwritten characters datasets, namely, CMATERdb, BanglaLekha-Isolated, and Ekush has exhibited significant recognition accuracies.
arXiv Detail & Related papers (2024-01-16T01:08:19Z)
- Hi Sheldon! Creating Deep Personalized Characters from TV Shows [52.8086853239762]
We propose a novel task, named Deep Personalized Character Creation (DPCC), creating multimodal chat personalized characters from multimodal data such as TV shows.
Given a single- or multi-modality input (text, audio, video), the goal of DPCC is to generate a multi-modality (text, audio, video) response.
To support this novel task, we further collect a character centric multimodal dialogue dataset, named Deep Personalized Character dataset (DPCD), from TV shows.
DPCD contains character-specific multimodal dialogue data of 10k utterances and 6 hours of audio/
arXiv Detail & Related papers (2023-04-09T00:39:43Z)
- Comprehensive Benchmark Datasets for Amharic Scene Text Detection and
Recognition [56.048783994698425]
Ethiopic/Amharic script is one of the oldest African writing systems, which serves at least 23 languages in East Africa.
The Amharic writing system, Abugida, has 282 syllables, 15 punctuation marks, and 20 numerals.
We presented the first comprehensive public datasets named HUST-ART, HUST-AST, ABE, and Tana for Amharic script detection and recognition in the natural scene.
arXiv Detail & Related papers (2022-03-23T03:19:35Z)
- Writer Recognition Using Off-line Handwritten Single Block Characters [59.17685450892182]
We use personal identity numbers consisting of the six digits of the date of birth, DoB.
We evaluate two recognition approaches, one based on handcrafted features that compute directional measurements, and another based on deep features from a ResNet50 model.
Results show the presence of identity-related information in a piece of handwritten information as small as six digits with the DoB.
arXiv Detail & Related papers (2022-01-25T23:04:10Z)
- Bangla Handwritten Digit Recognition and Generation [0.0]
A Semi-Supervised Generative Adversarial Network or SGAN has been applied to generate Bangla handwritten numerals.
In this paper, an architecture has been implemented that achieved a validation accuracy of 99.44% on the BHAND dataset.
arXiv Detail & Related papers (2021-03-14T12:11:21Z)
- BanglaWriting: A multi-purpose offline Bangla handwriting dataset [0.0]
This article presents a Bangla handwriting dataset that contains single-page handwritings of 260 individuals of different personalities.
This dataset contains 21,234 and 450 characters in total, along with this page representation of 32,470 unique words.
The dataset can be used for complex optical character recognition, handwritten word identification, handwriting variation and writer word generation.
arXiv Detail & Related papers (2020-11-15T11:08:53Z)
- Deep Learning for Text Style Transfer: A Survey [71.8870854396927]
Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text.
We present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017.
We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data.
arXiv Detail & Related papers (2020-11-01T04:04:43Z)
- HKR For Handwritten Kazakh & Russian Database [1.7499351967216341]
We present a new Russian and Kazakh database (with about 95% of Russian and 5% of Kazakh words/sentences respectively) for offline handwriting recognition.
The database is written in Cyrillic and shares the same 33 characters.
It can serve researchers in the field of handwriting recognition tasks by using deep and machine learning.
arXiv Detail & Related papers (2020-07-07T15:57:41Z)
- Spectral Graph-based Features for Recognition of Handwritten Characters:
A Case Study on Handwritten Devanagari Numerals [0.0]
We propose an approach that exploits the robust graph representation and spectral graph embedding concept to represent handwritten characters.
For corroboration of the efficacy of the proposed method, extensive experiments were carried out on the standard handwritten numeral Computer Vision Pattern Recognition, Unit of Indian Statistical Institute Kolkata dataset.
arXiv Detail & Related papers (2020-07-07T08:40:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.