Towards the Influence of Text Quantity on Writer Retrieval
- URL: http://arxiv.org/abs/2506.07566v1
- Date: Mon, 09 Jun 2025 09:05:15 GMT
- Title: Towards the Influence of Text Quantity on Writer Retrieval
- Authors: Marco Peer, Robert Sablatnig, Florian Kleber,
- Abstract summary: Writer retrieval identifies documents authored by the same individual within a dataset based on handwriting similarities.<n>We examine three state-of-the-art writer retrieval systems, including both handcrafted and deep learning-based approaches.
- Score: 1.024113475677323
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates the task of writer retrieval, which identifies documents authored by the same individual within a dataset based on handwriting similarities. While existing datasets and methodologies primarily focus on page level retrieval, we explore the impact of text quantity on writer retrieval performance by evaluating line- and word level retrieval. We examine three state-of-the-art writer retrieval systems, including both handcrafted and deep learning-based approaches, and analyze their performance using varying amounts of text. Our experiments on the CVL and IAM dataset demonstrate that while performance decreases by 20-30% when only one line of text is used as query and gallery, retrieval accuracy remains above 90% of full-page performance when at least four lines are included. We further show that text-dependent retrieval can maintain strong performance in low-text scenarios. Our findings also highlight the limitations of handcrafted features in low-text scenarios, with deep learning-based methods like NetVLAD outperforming traditional VLAD encoding.
Related papers
- UniRec-0.1B: Unified Text and Formula Recognition with 0.1B Parameters [55.34921520578968]
Vision-language models (VLMs) have achieved impressive unified recognition of text and formulas.<n>We propose UniRec-0.1B, a unified recognition model with only 0.1B parameters.<n>It is capable of performing text and formula recognition at multiple levels, including characters, words, lines, paragraphs, and documents.
arXiv Detail & Related papers (2025-12-24T10:35:21Z) - Beyond Cropped Regions: New Benchmark and Corresponding Baseline for Chinese Scene Text Retrieval in Diverse Layouts [2.9100667158464035]
Chinese scene text retrieval is a practical task that aims to search for images containing visual instances of a Chinese query text.<n>Current efforts tend to inherit the solution for English scene text retrieval, failing to achieve satisfactory performance.<n>We propose Chinese Scene Text Retrieval CLIP (CSTR-CLIP), a novel model that integrates global visual information with multi-granularity alignment training.
arXiv Detail & Related papers (2025-06-05T13:10:17Z) - MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents [26.39534684408116]
This work introduces a new benchmark, named MMDocIR, that encompasses two distinct tasks: page-level and layout-level retrieval.<n>The MMDocIR benchmark comprises a rich dataset featuring 1,685 questions annotated by experts and 173,843 questions with bootstrapped labels.
arXiv Detail & Related papers (2025-01-15T14:30:13Z) - Improving embedding with contrastive fine-tuning on small datasets with expert-augmented scores [12.86467344792873]
The proposed method uses soft labels derived from expert-augmented scores to fine-tune embedding models.
The paper evaluates the method using a Q&A dataset from an online shopping website and eight expert models.
arXiv Detail & Related papers (2024-08-19T01:59:25Z) - Towards Improving Document Understanding: An Exploration on
Text-Grounding via MLLMs [96.54224331778195]
We present a text-grounding document understanding model, termed TGDoc, which enhances MLLMs with the ability to discern the spatial positioning of text within images.
We formulate instruction tuning tasks including text detection, recognition, and spotting to facilitate the cohesive alignment between the visual encoder and large language model.
Our method achieves state-of-the-art performance across multiple text-rich benchmarks, validating the effectiveness of our method.
arXiv Detail & Related papers (2023-11-22T06:46:37Z) - TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision [61.186488081379]
We propose TextFormer, a query-based end-to-end text spotter with Transformer architecture.
TextFormer builds upon an image encoder and a text decoder to learn a joint semantic understanding for multi-task modeling.
It allows for mutual training and optimization of classification, segmentation, and recognition branches, resulting in deeper feature sharing.
arXiv Detail & Related papers (2023-06-06T03:37:41Z) - The Learnable Typewriter: A Generative Approach to Text Analysis [17.355857281085164]
We present a generative document-specific approach to character analysis and recognition in text lines.
Taking as input a set of text lines with similar font or handwriting, our approach can learn a large number of different characters.
arXiv Detail & Related papers (2023-02-03T11:17:59Z) - TRIE++: Towards End-to-End Information Extraction from Visually Rich
Documents [51.744527199305445]
This paper proposes a unified end-to-end information extraction framework from visually rich documents.
Text reading and information extraction can reinforce each other via a well-designed multi-modal context block.
The framework can be trained in an end-to-end trainable manner, achieving global optimization.
arXiv Detail & Related papers (2022-07-14T08:52:07Z) - Minimally-Supervised Structure-Rich Text Categorization via Learning on
Text-Rich Networks [61.23408995934415]
We propose a novel framework for minimally supervised categorization by learning from the text-rich network.
Specifically, we jointly train two modules with different inductive biases -- a text analysis module for text understanding and a network learning module for class-discriminative, scalable network learning.
Our experiments show that given only three seed documents per category, our framework can achieve an accuracy of about 92%.
arXiv Detail & Related papers (2021-02-23T04:14:34Z) - TAP: Text-Aware Pre-training for Text-VQA and Text-Caption [75.44716665758415]
We propose Text-Aware Pre-training (TAP) for Text-VQA and Text-Caption tasks.
TAP explicitly incorporates scene text (generated from OCR engines) in pre-training.
Our approach outperforms the state of the art by large margins on multiple tasks.
arXiv Detail & Related papers (2020-12-08T18:55:21Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content
Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.