Confronting the Constraints for Optical Character Segmentation from
Printed Bangla Text Image
- URL: http://arxiv.org/abs/2003.08384v5
- Date: Tue, 5 Jan 2021 18:11:50 GMT
- Title: Confronting the Constraints for Optical Character Segmentation from
Printed Bangla Text Image
- Authors: Abu Saleh Md. Abir, Sanjana Rahman, Samia Ellin, Maisha Farzana, Md
Hridoy Manik, Chowdhury Rafeed Rahman
- Abstract summary: An optical character recognition (OCR) system converts printed images into editable text for better storage and usability.
To be fully functional, the system must perform several crucial steps, such as pre-processing and segmentation.
Our proposed algorithm segments characters from both ideal and non-ideal scanned or captured images.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In an increasingly digitized world, optical character recognition
automates access to written history. An optical character recognition (OCR)
system converts printed images into editable text for better storage and
usability. To be fully functional, the system must perform several crucial
steps, such as pre-processing and segmentation. Pre-processing removes noise
and corrects skew in the printed data, while segmentation splits the image
precisely into lines, words, and characters for accurate conversion. Together,
these steps determine how accurately and consistently a printed image can be
converted. Our proposed algorithm segments characters from both ideal and
non-ideal scanned or captured images and yields consistent results. The
implementation of our work is provided here: https://cutt.ly/rgdfBIa
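To make the pre-processing and segmentation steps concrete, the sketch below shows one standard way to binarize a scanned page, correct its skew, and split it into lines and words using projection profiles. It is an illustrative outline only, not the authors' released implementation (see the repository link above); the input file name page.png, the brute-force deskew search, and all thresholds are assumptions.

```python
# Illustrative sketch only -- not the authors' released implementation.
# Assumes OpenCV (cv2), NumPy, and a scanned page saved as "page.png"
# (hypothetical file name).
import cv2
import numpy as np


def binarize(gray):
    """Global Otsu binarization; returns text as white (255) on black (0)."""
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    return binary


def deskew(binary, max_angle=5.0, step=0.25):
    """Brute-force skew correction: try small rotations and keep the one whose
    horizontal projection profile has the highest variance (well-aligned text
    lines give sharp alternation between dense rows and blank gaps)."""
    h, w = binary.shape
    best_angle, best_score = 0.0, -1.0
    for angle in np.arange(-max_angle, max_angle + step, step):
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        rotated = cv2.warpAffine(binary, m, (w, h), flags=cv2.INTER_NEAREST)
        score = np.var(rotated.sum(axis=1))
        if score > best_score:
            best_angle, best_score = angle, score
    m = cv2.getRotationMatrix2D((w / 2, h / 2), best_angle, 1.0)
    return cv2.warpAffine(binary, m, (w, h), flags=cv2.INTER_NEAREST)


def projection_runs(binary, axis):
    """Projection-profile segmentation: return (start, end) index pairs of
    consecutive non-blank rows (axis=1) or columns (axis=0)."""
    profile = binary.sum(axis=axis)
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


if __name__ == "__main__":
    gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
    page = deskew(binarize(gray))
    for top, bottom in projection_runs(page, axis=1):      # text lines
        line = page[top:bottom, :]
        for left, right in projection_runs(line, axis=0):  # words / glyph groups
            word = line[:, left:right]
            # further character-level splitting would go here
```

For Bangla in particular, a plain vertical projection usually separates words but not the characters joined by the matra (the headline stroke), so a practical character segmenter typically removes or accounts for the matra before splitting further; handling such non-ideal cases is part of what the paper addresses.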
Related papers
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only text for STR (DPTR).
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z)
- Optimization of Image Processing Algorithms for Character Recognition in Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning on Optical Character Recognition (OCR) performance.
The approach uses a multi-objective formulation that minimizes the Levenshtein edit distance and maximizes the number of correctly identified words, solved with the non-dominated sorting genetic algorithm NSGA-II.
Our findings suggest that image pre-processing in OCR is most beneficial for document typologies where recognition without pre-processing performs poorly.
arXiv Detail & Related papers (2023-11-27T11:44:46Z)
- Perceptual Image Compression with Cooperative Cross-Modal Side Information [53.356714177243745]
We propose a novel deep image compression method with text-guided side information to achieve a better rate-perception-distortion tradeoff.
Specifically, we employ the CLIP text encoder and an effective Semantic-Spatial Aware block to fuse the text and image features.
arXiv Detail & Related papers (2023-11-23T08:31:11Z)
- Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average gains of 1.38% in text recognition, 1.7% in text segmentation, and 0.24 dB (PSNR) / 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z)
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [95.02406834386814]
Parti treats text-to-image generation as a sequence-to-sequence modeling problem.
Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens.
PartiPrompts (P2) is a new holistic benchmark of over 1600 English prompts.
arXiv Detail & Related papers (2022-06-22T01:11:29Z)
- Image preprocessing and modified adaptive thresholding for improving OCR [0.0]
This paper proposes a method that finds the major pixel intensity inside the text and thresholds the image accordingly (a generic adaptive-thresholding sketch appears after this list).
The reported results indicate that the algorithm can be applied efficiently as an image-processing step for OCR.
arXiv Detail & Related papers (2021-11-28T08:13:20Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character segmentation algorithm that outperforms state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
- Word Segmentation from Unconstrained Handwritten Bangla Document Images using Distance Transform [34.89370782262938]
This paper addresses the automatic segmentation of text words directly from unconstrained Bangla handwritten document images.
The popular distance transform algorithm is applied to locate the outer boundaries of the word images.
The proposed technique is tested on 50 random images taken from the CMATERdb database.
arXiv Detail & Related papers (2020-09-17T03:14:27Z)
- IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [105.77562776008459]
Existing methods leverage the attention mechanism to explore such correspondence in a fine-grained manner.
Existing methods, however, may struggle to capture such sophisticated correspondences optimally.
We propose an Iterative Matching with Recurrent Attention Memory (IMRAM) method, in which correspondences are captured with multiple steps of alignments.
arXiv Detail & Related papers (2020-03-08T12:24:41Z)
- TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition.
It generates pixel-wise, multi-channel segmentation maps for character class, position and order.
It also adopts an RNN for context modeling and performs parallel prediction of character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)
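Several entries above, most directly the modified adaptive thresholding paper, treat binarization as the decisive pre-processing step for OCR. The sketch below is a generic illustration of adaptive thresholding with OpenCV; it is not a reproduction of that paper's specific method, and the input file name scan.png, the 31x31 neighbourhood, and the offset constant are assumptions to be tuned per document.

```python
# Generic illustration of adaptive thresholding as an OCR pre-processing step.
# This is NOT the "modified adaptive thresholding" algorithm summarized above;
# it only shows the standard OpenCV building blocks such a method refines.
import cv2

gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file

# Mild denoising before thresholding usually helps on captured (non-scanned) pages.
blurred = cv2.medianBlur(gray, 3)

# Local thresholding: each pixel is compared against the Gaussian-weighted mean
# of its 31x31 neighbourhood minus a constant (10), which handles uneven
# illumination better than a single global threshold.
binary = cv2.adaptiveThreshold(blurred, 255,
                               cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               cv2.THRESH_BINARY,
                               31, 10)

cv2.imwrite("binary.png", binary)
```

In practice, the block size is chosen relative to the stroke width and line height of the text, and cv2.THRESH_BINARY_INV can be used instead when later steps expect white text on a black background.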