Word Segmentation from Unconstrained Handwritten Bangla Document Images
using Distance Transform
- URL: http://arxiv.org/abs/2009.08037v1
- Date: Thu, 17 Sep 2020 03:14:27 GMT
- Title: Word Segmentation from Unconstrained Handwritten Bangla Document Images
using Distance Transform
- Authors: Pawan Kumar Singh, Shubham Sinha, Sagnik Pal Chowdhury, Ram Sarkar,
Mita Nasipuri
- Abstract summary: This paper addresses the automatic segmentation of text words directly from unconstrained Bangla handwritten document images.
The popular Distance Transform (DT) algorithm is applied for locating the outer boundaries of the word images.
The proposed technique is tested on 50 random images taken from the CMATERdb1.1.1 database.
- Score: 34.89370782262938
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Segmentation of handwritten document images into text lines and words is one
of the most significant and challenging tasks in the development of a complete
Optical Character Recognition (OCR) system. This paper addresses the automatic
segmentation of text words directly from unconstrained Bangla handwritten
document images. The popular Distance transform (DT) algorithm is applied for
locating the outer boundaries of the word images. This technique avoids
generating over-segmented words. A simple post-processing procedure is
applied to isolate any under-segmented word images. The proposed
technique is tested on 50 random images taken from the CMATERdb1.1.1
database. A satisfactory result is achieved, with a segmentation accuracy
of 91.88%, which confirms the robustness of the proposed methodology.
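The distance-transform idea above can be sketched in a few lines. This is a minimal illustration of the general technique (distance transform plus connected-component grouping), not the authors' exact pipeline; the halo threshold `merge_dist` is an assumed illustrative parameter.

```python
# Sketch: word segmentation via distance transform (generic technique,
# not the paper's exact method). Assumes a clean binary page image.
import numpy as np
from scipy import ndimage

def segment_words(binary_img, merge_dist=3):
    """binary_img: 2D array with 1 = ink, 0 = background."""
    # Distance from every background pixel to the nearest ink pixel.
    dist = ndimage.distance_transform_edt(1 - binary_img)
    # Pixels within merge_dist of ink form a "halo" that bridges the small
    # gaps between characters of one word but not the larger gaps between words.
    halo = dist <= merge_dist
    # Connected components of the halo give one region per candidate word.
    labels, n_words = ndimage.label(halo)
    boxes = ndimage.find_objects(labels)
    return labels, boxes

# Toy page: two "words" separated by a wide gap.
img = np.zeros((10, 30), dtype=int)
img[4:6, 2:8] = 1     # word 1
img[4:6, 20:27] = 1   # word 2
labels, boxes = segment_words(img)
print(len(boxes))  # 2
```

Because words are grouped by proximity rather than by cutting projection-profile valleys, this style of approach tends not to over-segment; touching words (under-segmentation) would still need the post-processing step the abstract mentions.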
Related papers
- Decoder Pre-Training with only Text for Scene Text Recognition [54.93037783663204]
Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets.
We introduce a novel method named Decoder Pre-training with only Text for STR (DPTR).
DPTR treats text embeddings produced by the CLIP text encoder as pseudo visual embeddings and uses them to pre-train the decoder.
arXiv Detail & Related papers (2024-08-11T06:36:42Z) - Optimization of Image Processing Algorithms for Character Recognition in
Cultural Typewritten Documents [0.8158530638728501]
This paper evaluates the impact of image processing methods and parameter tuning on Optical Character Recognition (OCR).
The approach uses a multi-objective problem formulation to minimize the Levenshtein edit distance and maximize the number of words correctly identified, solved with a non-dominated sorting genetic algorithm (NSGA-II).
Our findings suggest that employing image pre-processing algorithms in OCR might be more suitable for typologies where the text recognition task without pre-processing does not produce good results.
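The Levenshtein edit distance minimized by that NSGA-II search is the standard string metric; a textbook dynamic-programming implementation (not code from the paper) looks like this:

```python
# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions, and substitutions turning string a into string b.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

In an OCR evaluation setting, `a` would be the recognized text and `b` the ground truth, so lower values mean fewer character-level recognition errors.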
arXiv Detail & Related papers (2023-11-27T11:44:46Z) - Text Augmented Spatial-aware Zero-shot Referring Image Segmentation [60.84423786769453]
We introduce a Text Augmented Spatial-aware (TAS) zero-shot referring image segmentation framework.
TAS incorporates a mask proposal network for instance-level mask extraction, a text-augmented visual-text matching score for mining the image-text correlation, and a spatial rectifier for mask post-processing.
The proposed method clearly outperforms state-of-the-art zero-shot referring image segmentation methods.
arXiv Detail & Related papers (2023-10-27T10:52:50Z) - Zero-shot spatial layout conditioning for text-to-image diffusion models [52.24744018240424]
Large-scale text-to-image diffusion models have significantly improved the state of the art in generative image modelling.
We consider image generation from text associated with segments on the image canvas, which combines an intuitive natural language interface with precise spatial control over the generated content.
We propose ZestGuide, a zero-shot segmentation guidance approach that can be plugged into pre-trained text-to-image diffusion models.
arXiv Detail & Related papers (2023-06-23T19:24:48Z) - WordStylist: Styled Verbatim Handwritten Text Generation with Latent
Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled, text-content-conditioned image generation at the word level.
Our proposed method is able to generate realistic word image samples from different writer styles.
We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z) - Learning to Generate Text-grounded Mask for Open-world Semantic
Segmentation from Only Image-Text Pairs [10.484851004093919]
We tackle open-world semantic segmentation, which aims at learning to segment arbitrary visual concepts in images.
Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts.
We propose a novel Text-grounded Contrastive Learning framework that enables a model to directly learn region-text alignment.
arXiv Detail & Related papers (2022-12-01T18:59:03Z) - Self-supervised Character-to-Character Distillation for Text Recognition [54.12490492265583]
We propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate text representation learning.
CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution.
arXiv Detail & Related papers (2022-11-01T05:48:18Z) - BN-HTRd: A Benchmark Dataset for Document Level Offline Bangla
Handwritten Text Recognition (HTR) and Line Segmentation [0.0]
We introduce a new dataset for offline Handwritten Text Recognition (HTR) from images of Bangla scripts comprising words, lines, and document-level annotations.
The BN-HTRd dataset is based on the BBC Bangla News corpus, which serves as the source of the ground-truth texts.
Our dataset includes 788 images of handwritten pages produced by approximately 150 different writers.
arXiv Detail & Related papers (2022-05-29T22:56:26Z) - FlexIT: Towards Flexible Semantic Image Translation [59.09398209706869]
We propose FlexIT, a novel method which can take any input image and a user-defined text instruction for editing.
First, FlexIT combines the input image and text into a single target point in the CLIP multimodal embedding space.
We iteratively transform the input image toward the target point, ensuring coherence and quality with a variety of novel regularization terms.
arXiv Detail & Related papers (2022-03-09T13:34:38Z) - Confronting the Constraints for Optical Character Segmentation from
Printed Bangla Text Image [0.0]
An optical character recognition system converts printed images into editable text for better storage and usability.
To be completely functional, the system needs to go through some crucial methods such as pre-processing and segmentation.
Our proposed algorithm is able to segment characters both from ideal and non-ideal cases of scanned or captured images.
arXiv Detail & Related papers (2020-03-18T17:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.