Related papers: Attention based End to end network for Offline Writer Identification on Word level data

Attention based End to end network for Offline Writer Identification on Word level data

URL: http://arxiv.org/abs/2404.07602v1
Date: Thu, 11 Apr 2024 09:41:14 GMT
Title: Attention based End to end network for Offline Writer Identification on Word level data
Authors: Vineet Kumar, Suresh Sundaram,
Abstract summary: We propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN) The system is trained utilizing image segments, known as fragments, extracted from word images, employing a pyramid-based strategy. The efficacy of the proposed algorithm is evaluated on three benchmark databases.
Score: 3.5829161769306244
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Writer identification due to its widespread application in various fields has gained popularity over the years. In scenarios where optimum handwriting samples are available, whether they be in the form of a single line, a sentence, or an entire page, writer identification algorithms have demonstrated noteworthy levels of accuracy. However, in scenarios where only a limited number of handwritten samples are available, particularly in the form of word images, there is a significant scope for improvement. In this paper, we propose a writer identification system based on an attention-driven Convolutional Neural Network (CNN). The system is trained utilizing image segments, known as fragments, extracted from word images, employing a pyramid-based strategy. This methodology enables the system to capture a comprehensive representation of the data, encompassing both fine-grained details and coarse features across various levels of abstraction. These extracted fragments serve as the training data for the convolutional network, enabling it to learn a more robust representation compared to traditional convolution-based networks trained on word images. Additionally, the paper explores the integration of an attention mechanism to enhance the representational power of the learned features. The efficacy of the proposed algorithm is evaluated on three benchmark databases, demonstrating its proficiency in writer identification tasks, particularly in scenarios with limited access to handwriting data.

Related papers

Language Guided Domain Generalized Medical Image Segmentation [68.93124785575739]
Single source domain generalization holds promise for more reliable and consistent image segmentation across real-world clinical settings. We propose an approach that explicitly leverages textual information by incorporating a contrastive learning mechanism guided by the text encoder features. Our approach achieves favorable performance against existing methods in literature.
arXiv Detail & Related papers (2024-04-01T17:48:15Z)
Leveraging Open-Vocabulary Diffusion to Camouflaged Instance Segmentation [59.78520153338878]
Text-to-image diffusion techniques have shown exceptional capability of producing high-quality images from text descriptions. We propose a method built upon a state-of-the-art diffusion model, empowered by open-vocabulary to learn multi-scale textual-visual features for camouflaged object representations.
arXiv Detail & Related papers (2023-12-29T07:59:07Z)
WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models [8.334487584550185]
We present a latent diffusion-based method for styled text-to-text-content-image generation on word-level. Our proposed method is able to generate realistic word image samples from different writer styles. We show that the proposed model produces samples that are aesthetically pleasing, help boosting text recognition performance, and get similar writer retrieval score as real data.
arXiv Detail & Related papers (2023-03-29T10:19:26Z)
SpaText: Spatio-Textual Representation for Controllable Image Generation [61.89548017729586]
SpaText is a new method for text-to-image generation using open-vocabulary scene control. In addition to a global text prompt that describes the entire scene, the user provides a segmentation map. We show its effectiveness on two state-of-the-art diffusion models: pixel-based and latent-conditional-based.
arXiv Detail & Related papers (2022-11-25T18:59:10Z)
Siamese based Neural Network for Offline Writer Identification on word level data [7.747239584541488]
We propose a novel scheme to identify the author of a document based on the input word image. Our method is text independent and does not impose any constraint on the size of the input image under examination.
arXiv Detail & Related papers (2022-11-17T10:01:46Z)
Using virtual edges to extract keywords from texts modeled as complex networks [0.1611401281366893]
We modeled texts co-occurrence networks, where nodes are words and edges are established by contextual or semantical similarity. We found that, in fact, the use of virtual edges can improve the discriminability of co-occurrence networks.
arXiv Detail & Related papers (2022-05-04T16:43:03Z)
CRIS: CLIP-Driven Referring Image Segmentation [71.56466057776086]
We propose an end-to-end CLIP-Driven Referring Image framework (CRIS) CRIS resorts to vision-language decoding and contrastive learning for achieving the text-to-pixel alignment. Our proposed framework significantly outperforms the state-of-the-art performance without any post-processing.
arXiv Detail & Related papers (2021-11-30T07:29:08Z)
Object Detection Based Handwriting Localization [2.6641834518599308]
We present an object detection based approach to localize handwritten regions from documents. The proposed approach is also expected to facilitate other tasks such as handwriting recognition and signature verification.
arXiv Detail & Related papers (2021-06-28T21:25:20Z)
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval [8.317191999275536]
This paper focuses on leveraging multi-modal content in the form of visual and textual cues to tackle the task of fine-grained image classification and retrieval. We employ a Graph Convolutional Network to perform multi-modal reasoning and obtain relationship-enhanced features by learning a common semantic space between salient objects and text found in an image.
arXiv Detail & Related papers (2020-09-21T12:31:42Z)
Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation. Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning. During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)
TextScanner: Reading Characters in Order for Robust Scene Text Recognition [60.04267660533966]
TextScanner is an alternative approach for scene text recognition. It generates pixel-wise, multi-channel segmentation maps for character class, position and order. It also adopts RNN for context modeling and performs paralleled prediction for character position and class.
arXiv Detail & Related papers (2019-12-28T07:52:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.