End-to-End Optical Character Recognition for Bengali Handwritten Words
- URL: http://arxiv.org/abs/2105.04020v1
- Date: Sun, 9 May 2021 20:48:56 GMT
- Title: End-to-End Optical Character Recognition for Bengali Handwritten Words
- Authors: Farisa Benta Safir, Abu Quwsar Ohi, M.F. Mridha, Muhammad Mostafa Monowar, Md. Abdul Hamid
- Abstract summary: This paper introduces an end-to-end OCR system for the Bengali language.
The proposed architecture implements an end-to-end strategy that recognises handwritten Bengali words directly from word images.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optical character recognition (OCR) is the process of converting analogue
documents into digital form using document images. Many commercial and
non-commercial OCR systems currently exist for both handwritten and printed copies
in different languages. Despite this, very few works address the recognition of
Bengali words, and most of those focus on OCR of printed Bengali characters. This
paper introduces an end-to-end OCR system for the Bengali language. The proposed
architecture implements an end-to-end strategy that recognises handwritten Bengali
words from word images. We experiment with popular convolutional neural network
(CNN) architectures, including DenseNet, Xception, NASNet, and MobileNet, to build
the OCR architecture. Further, we experiment with two recurrent neural network (RNN)
methods, LSTM and GRU. We evaluate the proposed architecture on the BanglaWritting
dataset, a peer-reviewed Bengali handwritten image dataset. The proposed method
achieves a 0.091 character error rate and a 0.273 word error rate using the
DenseNet121 model with a GRU recurrent layer.
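
The best-performing configuration reported above pairs a DenseNet121 convolutional feature extractor with a GRU recurrent layer that reads whole word images. The sketch below (Keras/TensorFlow) shows one plausible way to wire such a recognizer; the input size, character-set size, layer widths, feature tap point, and the use of a CTC training loss are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the authors' released code) of a DenseNet121 + GRU word
# recognizer of the kind the abstract describes. Image size, character-set size,
# layer widths, the feature tap point, and the CTC loss are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

IMG_H, IMG_W = 32, 128     # assumed size of the handwritten word images
NUM_CHARS = 100            # hypothetical Bengali character/grapheme inventory
MAX_LABEL_LEN = 12         # assumed maximum characters per word label

def build_models():
    image = layers.Input(shape=(IMG_H, IMG_W, 3), name="image")

    # DenseNet121 backbone as the convolutional feature extractor (one of the CNNs
    # the paper compares; Xception, NASNet, or MobileNet would slot in the same way).
    backbone = tf.keras.applications.DenseNet121(
        include_top=False, weights=None, input_tensor=image)
    # Tap an intermediate block (stride 8) so the width axis keeps enough positions
    # to serve as the RNN's time axis; the exact tap point is an implementation choice.
    feat = backbone.get_layer("pool2_pool").output        # (batch, H/8, W/8, C)

    # Fold the height axis into the channels: each remaining width position
    # becomes one time step for the recurrent layer.
    x = layers.Permute((2, 1, 3))(feat)                   # (batch, W/8, H/8, C)
    x = layers.Reshape((IMG_W // 8, (IMG_H // 8) * int(feat.shape[-1])))(x)

    # GRU recurrent layer on top of the CNN features; the abstract reports
    # DenseNet121 + GRU as the best combination (CER 0.091, WER 0.273).
    x = layers.Bidirectional(layers.GRU(128, return_sequences=True))(x)

    # Per-time-step character probabilities; the extra class is the CTC blank.
    probs = layers.Dense(NUM_CHARS + 1, activation="softmax", name="char_probs")(x)
    predictor = Model(inputs=image, outputs=probs)

    # CTC training head (assumption: the abstract does not name the loss, but CTC is
    # the standard choice for unsegmented word transcription).
    labels = layers.Input(shape=(MAX_LABEL_LEN,), dtype="float32", name="labels")
    input_len = layers.Input(shape=(1,), dtype="int32", name="input_len")
    label_len = layers.Input(shape=(1,), dtype="int32", name="label_len")
    ctc = layers.Lambda(
        lambda a: tf.keras.backend.ctc_batch_cost(a[0], a[1], a[2], a[3]),
        name="ctc")([labels, probs, input_len, label_len])
    trainer = Model(inputs=[image, labels, input_len, label_len], outputs=ctc)
    trainer.compile(optimizer="adam", loss=lambda y_true, y_pred: y_pred)
    return predictor, trainer

predictor, trainer = build_models()
predictor.summary()
```

At inference time the per-time-step probabilities would typically be collapsed into a word with a CTC decoder (e.g. tf.keras.backend.ctc_decode); character and word error rates such as the 0.091/0.273 reported above are then edit distances computed at the character and word level, normalised by the reference length.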
Related papers
- Multichannel Attention Networks with Ensembled Transfer Learning to Recognize Bangla Handwritten Charecter [1.5236380958983642]
The study employed a convolutional neural network (CNN) with ensemble transfer learning and a multichannel attention network.
We evaluated the proposed model using the CAMTERdb 3.1.2 data set and achieved 92% accuracy for the raw dataset and 98.00% for the preprocessed dataset.
arXiv Detail & Related papers (2024-08-20T15:51:01Z)
- LOCR: Location-Guided Transformer for Optical Character Recognition [55.195165959662795]
We propose LOCR, a model that integrates location guiding into the transformer architecture during autoregression.
We train the model on a dataset comprising over 77M text-location pairs from 125K academic document pages, including bounding boxes for words, tables and mathematical symbols.
It outperforms all existing methods in our test set constructed from arXiv, as measured by edit distance, BLEU, METEOR and F-measure.
arXiv Detail & Related papers (2024-03-04T15:34:12Z)
- Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning [61.34060587461462]
We propose a two-stage framework for Chinese Text Recognition (CTR).
We pre-train a CLIP-like model by aligning printed character images and Ideographic Description Sequences (IDS).
This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character.
The learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition.
arXiv Detail & Related papers (2023-09-03T05:33:16Z)
- bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents [0.23639235997306196]
We introduce Bengali.AI-BRACU-OCR (bbOCR), an open-source scalable document OCR system that can reconstruct Bengali documents into a structured searchable digitized format.
Our proposed solution is preferable over the current state-of-the-art Bengali OCR systems.
arXiv Detail & Related papers (2023-08-21T11:35:28Z)
- Bengali Handwritten Digit Recognition using CNN with Explainable AI [0.5156484100374058]
We used various machine learning algorithms and a CNN to recognize handwritten Bengali digits.
Grad-CAM was applied as an XAI method to our CNN model, giving insights into the model.
arXiv Detail & Related papers (2022-12-23T04:40:20Z)
- Zero-Shot Video Captioning with Evolving Pseudo-Tokens [79.16706829968673]
We introduce a zero-shot video captioning method that employs two frozen networks: the GPT-2 language model and the CLIP image-text matching model.
The matching score is used to steer the language model toward generating a sentence that has a high average matching score to a subset of the video frames.
Our experiments show that the generated captions are coherent and display a broad range of real-world knowledge.
arXiv Detail & Related papers (2022-07-22T14:19:31Z)
- GIT: A Generative Image-to-text Transformer for Vision and Language [138.91581326369837]
We train a Generative Image-to-text Transformer, GIT, to unify vision-language tasks such as image/video captioning and question answering.
Our model surpasses human performance for the first time on TextCaps (138.2 vs. 125.5 in CIDEr).
arXiv Detail & Related papers (2022-05-27T17:03:38Z)
- An empirical study of CTC based models for OCR of Indian languages [31.5002680968116]
Modelling unsegmented sequences using Connectionist Temporal Classification (CTC) is the most commonly used approach for segmentation-free OCR.
We present a study of various neural network models that use CTC for transcribing step-wise predictions in the neural network output into a Unicode sequence (a minimal greedy decoding sketch is given after this list).
We also introduce a new public dataset called Mozhi for word and line recognition in Indian languages.
arXiv Detail & Related papers (2022-05-13T16:19:21Z)
- Lexically Aware Semi-Supervised Learning for OCR Post-Correction [90.54336622024299]
Much of the existing linguistic data in many languages of the world is locked away in non-digitized books and documents.
Previous work has demonstrated the utility of neural post-correction methods on recognition of less-well-resourced languages.
We present a semi-supervised learning method that makes it possible to utilize raw images to improve performance.
arXiv Detail & Related papers (2021-11-04T04:39:02Z)
- TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models [47.48019831416665]
We propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR.
TrOCR is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets.
Experiments show that the TrOCR model outperforms the current state-of-the-art models on both printed and handwritten text recognition tasks.
arXiv Detail & Related papers (2021-09-21T16:01:56Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character segmentation algorithm that outperforms state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
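
The CTC entry above describes turning step-wise network predictions into a Unicode sequence; as a concrete illustration, here is a minimal greedy CTC decoding sketch. The blank index and the tiny id-to-character table are hypothetical placeholders, and practical systems usually rely on a library decoder (greedy or beam search) rather than this simplified routine.

```python
# Minimal sketch of greedy CTC decoding: pick the most likely class at each time
# step, skip consecutive repeats, and drop the blank symbol to obtain the string.
# BLANK and ID_TO_CHAR are hypothetical placeholders, not values from any paper.
import numpy as np

BLANK = 0                                    # assumed index of the CTC blank class
ID_TO_CHAR = {1: "ক", 2: "খ", 3: "গ"}        # tiny illustrative id -> character table

def ctc_greedy_decode(probs: np.ndarray) -> str:
    """probs: (time_steps, num_classes) softmax output of a CTC-trained model."""
    best = probs.argmax(axis=-1)             # best class id per time step
    chars, prev = [], None
    for idx in best:
        if idx != BLANK and idx != prev:     # drop blanks and collapse repeats
            chars.append(ID_TO_CHAR.get(int(idx), ""))
        prev = idx
    return "".join(chars)

# Example: the frame-wise path [ক, ক, blank, খ, খ] collapses to the word "কখ".
print(ctc_greedy_decode(np.eye(4)[[1, 1, 0, 2, 2]]))
```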