Bengali Handwritten Digit Recognition using CNN with Explainable AI
- URL: http://arxiv.org/abs/2212.12146v1
- Date: Fri, 23 Dec 2022 04:40:20 GMT
- Title: Bengali Handwritten Digit Recognition using CNN with Explainable AI
- Authors: Md Tanvir Rouf Shawon, Raihan Tanvir, Md. Golam Rabiul Alam
- Abstract summary: We have used various machine learning algorithms and CNN to recognize handwritten Bengali digits.
Grad-CAM was used as an XAI method on our CNN model, which gave us insights into the model.
- Score: 0.5156484100374058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwritten character recognition is a hot topic for research nowadays. If we
can convert a handwritten piece of paper into a text-searchable document using
the Optical Character Recognition (OCR) technique, we can easily understand the
content and do not need to read the handwritten document. OCR in the English
language is very common, but in the Bengali language, it is very hard to find a
good quality OCR application. If we can merge machine learning and deep
learning with OCR, it could be a huge contribution to this field. Various
researchers have proposed a number of strategies for recognizing Bengali
handwritten characters. A lot of ML algorithms and deep neural networks were
used in their work, but the explanations of their models are not available. In
our work, we have used various machine learning algorithms and CNN to recognize
handwritten Bengali digits. We have got acceptable accuracy from some ML
models, and CNN has given us great testing accuracy. Grad-CAM was used as an
XAI method on our CNN model, which gave us insights into the model and helped
us detect the region of interest for recognizing a digit from an image.
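The Grad-CAM step mentioned in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration of the general Grad-CAM computation, not the authors' code. The function name `grad_cam` and the array names are hypothetical, and the sketch assumes the last convolutional layer's activations and the gradients of the target class score with respect to them have already been extracted from the CNN by whatever deep learning framework is in use.

```python
import numpy as np


def grad_cam(feature_maps, gradients):
    """Compute a Grad-CAM heatmap from precomputed arrays.

    feature_maps: activations of a conv layer, shape (K, H, W).
    gradients:    d(class score)/d(feature_maps), same shape (K, H, W).
    Both inputs are assumed to come from the trained CNN (hypothetical here).
    """
    # Channel importance: global-average-pool the gradients over H and W.
    weights = gradients.mean(axis=(1, 2))                # shape (K,)
    # Weighted sum of the feature maps across channels.
    cam = np.tensordot(weights, feature_maps, axes=1)    # shape (H, W)
    # Keep only regions that push the class score up.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] for display; leave an all-zero map untouched.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: two 2x2 channels, one supporting the class and one opposing it.
toy_maps = np.zeros((2, 2, 2))
toy_maps[0, 0, 0] = 1.0   # channel 0 fires in the top-left corner
toy_maps[1, 1, 1] = 1.0   # channel 1 fires in the bottom-right corner
toy_grads = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
toy_cam = grad_cam(toy_maps, toy_grads)
```

In practice the resulting map would be upsampled to the input image size and overlaid on the digit image to show which strokes the model relied on.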
Related papers
- Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter [1.5236380958983642]
The study employed a convolutional neural network (CNN) with ensemble transfer learning and a multichannel attention network.
We evaluated the proposed model using the CMATERdb 3.1.2 dataset and achieved 92% accuracy for the raw dataset and 98.00% for the preprocessed dataset.
arXiv Detail & Related papers (2024-08-20T15:51:01Z)
- Optical Text Recognition in Nepali and Bengali: A Transformer-based Approach [0.0]
This paper discusses text recognition for two scripts: Bengali and Nepali.
There are about 300 and 40 million Bengali and Nepali speakers respectively.
The results signify that the suggested technique corresponds with current approaches.
arXiv Detail & Related papers (2024-04-03T00:21:14Z)
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images with Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
- Efficient approach of using CNN based pretrained model in Bangla handwritten digit recognition [0.0]
Handwritten digit recognition is essential for numerous applications in various industries.
Because Bengali writing varies widely in shape, size, and writing style, supervised machine learning algorithms have not yet achieved high accuracy on it.
We propose a novel CNN-based pre-trained handwritten digit recognition model, which includes ResNet-50, Inception-v3, and EfficientNetB0, on the NumtaDB dataset of 17 thousand instances with 10 classes.
arXiv Detail & Related papers (2022-09-19T15:58:53Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- Handwritten Digit Recognition using Machine and Deep Learning Algorithms [0.0]
We have performed handwritten digit recognition on the MNIST dataset using Support Vector Machine (SVM), Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN) models.
Our main objective is to compare the accuracy of the models stated above along with their execution time to get the best possible model for digit recognition.
arXiv Detail & Related papers (2021-06-23T18:23:01Z)
- End-to-End Optical Character Recognition for Bengali Handwritten Words [0.0]
This paper introduces an end-to-end OCR system for the Bengali language.
The proposed architecture implements an end-to-end strategy that recognises handwritten Bengali words from handwritten word images.
arXiv Detail & Related papers (2021-05-09T20:48:56Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)