Related papers: Feature Mixing for Writer Retrieval and Identification on Papyri Fragments

URL: http://arxiv.org/abs/2306.12939v1
Date: Thu, 22 Jun 2023 14:55:01 GMT
Title: Feature Mixing for Writer Retrieval and Identification on Papyri Fragments
Authors: Marco Peer and Robert Sablatnig
Abstract summary: This paper proposes a deep-learning-based approach to writer retrieval and identification for papyri. We present a novel neural network architecture that combines a residual backbone with a feature mixing stage to improve retrieval performance.
Score: 0.7614628596146599
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This paper proposes a deep-learning-based approach to writer retrieval and identification for papyri, with a focus on identifying fragments associated with a specific writer and those corresponding to the same image. We present a novel neural network architecture that combines a residual backbone with a feature mixing stage to improve retrieval performance, and the final descriptor is derived from a projection layer. The methodology is evaluated on two benchmarks: PapyRow, where we achieve a mAP of 26.6 % and 24.9 % on writer and page retrieval, and HisFragIR20, showing state-of-the-art performance (44.0 % and 29.3 % mAP). Furthermore, our network has an accuracy of 28.7 % for writer identification. Additionally, we conduct experiments on the influence of two binarization techniques on fragments and show that binarizing does not enhance performance. Our code and models are available to the community.
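The retrieval results above are reported as mean average precision (mAP). The following is a minimal sketch of how such a figure is commonly computed. It is not the authors' evaluation code. It assumes a leave-one-out protocol in which each fragment descriptor queries all other fragments, ranking is by cosine similarity, and a retrieved fragment counts as relevant when it shares the query's label (the writer for writer retrieval, the page for page retrieval). All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def average_precision(relevance):
    """AP of one ranked list; relevance[k] is True if rank k+1 is a correct hit."""
    hits, precision_sum = 0, 0.0
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit's rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(descriptors, labels):
    """Leave-one-out mAP (assumed protocol): every item queries all other items."""
    aps = []
    for i, query in enumerate(descriptors):
        others = [j for j in range(len(descriptors)) if j != i]
        others.sort(key=lambda j: cosine(query, descriptors[j]), reverse=True)
        relevance = [labels[j] == labels[i] for j in others]
        if any(relevance):  # queries without any other same-label item are skipped
            aps.append(average_precision(relevance))
    return sum(aps) / len(aps) if aps else 0.0


# Toy example with made-up 2-D descriptors for two hypothetical writers.
descriptors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["writer_a", "writer_a", "writer_b", "writer_b"]
print(mean_average_precision(descriptors, labels))  # 1.0
```

Whether queries without any other fragment of the same writer are skipped, as in this sketch, is a protocol detail that can differ between benchmarks.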

Related papers

OFFSET: Segmentation-based Focus Shift Revision for Composed Image Retrieval [59.377821673653436]
Composed Image Retrieval (CIR) is capable of expressing users' intricate retrieval requirements flexibly. CIR remains in its nascent stages due to two limitations: 1) inhomogeneity between dominant and noisy portions in visual data is ignored, leading to query feature degradation. This work presents a focus mapping-based feature extractor, which consists of two modules: dominant portion segmentation and dual focus mapping.
arXiv Detail & Related papers (2025-07-08T03:27:46Z)
Assessing the impact of Binarization for Writer Identification in Greek Papyrus [0.0]
A common preprocessing step in writer identification pipelines is image binarization, which prevents the model from learning background features. This is challenging in historical documents, in our case Greek papyri, as background is often non-uniform, fragmented, and discolored with visible fiber structures. We compare traditional binarization methods to state-of-the-art Deep Learning (DL) models, evaluating the impact of binarization quality on subsequent writer identification performance.
arXiv Detail & Related papers (2025-06-18T20:00:57Z)
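Otsu's method is a classic global-thresholding technique of the kind "traditional binarization methods" usually refers to. The abstract does not say which methods were compared, so treating Otsu as one of them is an assumption. The sketch below works on a flat list of 8-bit grayscale values, and the helper names are hypothetical. A single global threshold like this is the kind of approach that can struggle with the non-uniform, fibrous papyrus backgrounds described above.

```python
def otsu_threshold(pixels):
    """Otsu's method: pick t maximizing (a constant multiple of) the
    between-class variance of the classes {<= t} and {> t}."""
    hist = [0] * 256
    for p in pixels:  # pixels are 8-bit grayscale values in 0..255
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))
    weight_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_low += hist[t]
        sum_low += t * hist[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        between_var = weight_low * weight_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


def binarize(pixels, t):
    """Map dark pixels (assumed to be ink) to 0 and everything else to 255."""
    return [0 if p <= t else 255 for p in pixels]


# Toy "image": 50 dark ink pixels and 50 bright background pixels (made-up values).
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
print(t, binarize(pixels, t)[:3], binarize(pixels, t)[-3:])  # 10 [0, 0, 0] [255, 255, 255]
```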
Towards Storage-Efficient Visual Document Retrieval: An Empirical Study on Reducing Patch-Level Embeddings [70.26204343623215]
ColPali/ColQwen2 encode each page into multiple patch-level embeddings, which leads to excessive memory usage. This empirical study investigates methods to reduce the number of patch embeddings per page with minimal performance degradation.
arXiv Detail & Related papers (2025-06-05T13:06:01Z)
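ColPali-style retrievers score a query against a page by late interaction (MaxSim, as introduced by ColBERT): each query-token embedding is matched to its most similar patch embedding and these maxima are summed, so storage grows with the number of patch embeddings kept per page. The sketch below shows that scoring plus one naive reduction, averaging consecutive groups of patch embeddings. The merging rule and group size are illustrative assumptions, not necessarily a strategy evaluated in the study, and real systems operate on normalized, high-dimensional vectors.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens, page_patches):
    """Late interaction: sum over query tokens of the best dot product with any patch."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def merge_patches(page_patches, group_size):
    """Hypothetical reduction: replace each run of group_size patches by their mean."""
    merged = []
    for start in range(0, len(page_patches), group_size):
        group = page_patches[start:start + group_size]
        merged.append([sum(column) / len(group) for column in zip(*group)])
    return merged


# Toy page with four 2-D patch embeddings (made-up values).
patches = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
query = [[1.0, 0.0], [0.0, 1.0]]
reduced = merge_patches(patches, group_size=2)
print(len(patches), "->", len(reduced))                            # 4 -> 2
print(maxsim_score(query, patches), maxsim_score(query, reduced))  # 2.0 2.0
```

On real pages, averaging unrelated neighboring patches can blur away detail, which is why the choice of reduction strategy matters for retrieval quality.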
An Efficient MLP-based Point-guided Segmentation Network for Ore Images with Ambiguous Boundary [12.258442550351178]
This paper proposes a lightweight framework based on a Multi-Layer Perceptron (MLP), which focuses on solving the problem of edge blurring. Our approach achieves a processing speed of over 27 frames per second with a model size of only 73 MB. Our method delivers consistently high accuracy, with scores of 60.4 $AP_{50}^{box}$ and 48.9 $AP_{50}^{mask}$.
arXiv Detail & Related papers (2024-02-27T10:09:29Z)
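In the $AP_{50}$ notation, the subscript denotes an intersection-over-union (IoU) threshold of 0.5: a predicted box or mask counts as a true positive only if its IoU with a ground-truth instance is at least 0.5, and AP is then computed over the ranked detections (the COCO-style convention). The sketch below shows only the box-IoU matching test, with boxes assumed to be given as (x1, y1, x2, y2) corner coordinates; mask AP applies the same threshold to mask IoU. Function names are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, gt_box):
    """True-positive test used by AP_50: IoU must reach 0.5."""
    return iou(pred_box, gt_box) >= 0.5


print(iou((0, 0, 2, 2), (1, 0, 3, 2)))             # 0.3333333333333333 (overlap 2, union 6)
print(is_match_at_50((0, 0, 2, 2), (0, 0, 2, 1)))  # True (IoU exactly 0.5)
```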
Detecting and recognizing characters in Greek papyri with YOLOv8, DeiT and SimCLR [9.7902367664742]
This paper discusses our submission to the 'ICDAR 2023 Competition on Detection and Recognition of Greek Letters on Papyri'. We used an ensemble of YOLOv8 models to detect and classify individual characters and employed two different approaches for refining the character predictions. Our submission won the recognition challenge with a mean average precision (mAP) of 42.2% and was runner-up in the detection challenge with an mAP of 51.4%.
arXiv Detail & Related papers (2024-01-23T06:08:00Z)
Towards Writer Retrieval for Historical Datasets [0.6445605125467572]
This paper presents an unsupervised approach for writer retrieval based on clustering SIFT descriptors detected at keypoint locations. A residual network is followed by our proposed NetRVLAD, an encoding layer with reduced complexity. We show that our approach achieves comparable performance on a modern dataset as well.
arXiv Detail & Related papers (2023-05-09T11:44:44Z)
Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing [60.67014034968582]
This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. Deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. The proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works.
arXiv Detail & Related papers (2022-08-04T01:39:37Z)
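The binary-code variant above replaces real-valued descriptors with compact bit strings compared by Hamming distance. A minimal sketch of the retrieval mechanics follows. The paper's deep-hashing models learn their codes, whereas here real-valued features are simply sign-thresholded, which is an illustrative assumption, and all names are hypothetical. Codes are packed into Python integers so that Hamming distance becomes an XOR followed by a bit count.

```python
def pack_bits(features):
    """Sign-threshold a real-valued descriptor (assumed scheme) and pack the bits into an int."""
    code = 0
    for x in features:
        code = (code << 1) | (1 if x > 0 else 0)
    return code


def hamming(code_a, code_b):
    """Number of differing bits: XOR, then count the ones."""
    return bin(code_a ^ code_b).count("1")


def search(query_code, database_codes, k):
    """Indices of the k database codes nearest to the query in Hamming distance."""
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(query_code, database_codes[i]))
    return ranked[:k]


# Made-up 3-D descriptors for three database images.
database = [pack_bits(v) for v in ([0.7, 0.2, 0.9], [0.4, 0.8, -0.3], [-0.5, -0.1, 0.6])]
query = pack_bits([0.3, 0.5, -0.2])
print(search(query, database, k=2))  # [1, 0]
```

With float32 features, a d-dimensional descriptor occupies 32 * d bits, while its packed code occupies d bits.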
Spatio-temporal Relation Modeling for Few-shot Action Recognition [100.3999454780478]
We propose a few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. Our approach achieves an absolute gain of 3.5% in classification accuracy, as compared to the best existing method in the literature.
arXiv Detail & Related papers (2021-12-09T18:59:14Z)
Face Trees for Expression Recognition [13.099925083569333]
We propose an end-to-end architecture for facial expression recognition. The proposed architecture incorporates two main streams, one focusing on landmark positions to learn the structure of the face, the other focusing on patches around the landmarks to learn texture information. We conduct extensive experiments on two large-scale publicly available facial expression datasets, AffectNet and FER2013, to evaluate the efficacy of our approach.
arXiv Detail & Related papers (2021-12-05T06:35:12Z)
G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels. We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions. Our method consistently outperforms the existing detection KD techniques and works when components in the framework are used both separately and in conjunction.
arXiv Detail & Related papers (2021-08-17T07:44:27Z)
Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains. Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on both cross-device and cross-environment ASR.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)
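WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. A relative reduction is measured against the baseline WER rather than in absolute percentage points: going from 20% to 17% WER is 3 points absolute but a 15% relative reduction. A short sketch of both, with made-up sentences and numbers and hypothetical function names:

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,                            # deletion
                          curr[j - 1] + 1,                        # insertion
                          prev[j - 1] + (ref_word != hyp_word))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline WER that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
print(relative_reduction(0.20, 0.17))  # about 0.15, i.e. a 15% relative reduction
```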
A Replication Study of Dense Passage Retriever [32.192420072129636]
We study the dense passage retriever (DPR) technique proposed by Karpukhin et al. (2020) for end-to-end open-domain question answering. We present a replication study of this work, starting with model checkpoints provided by the authors. We are able to improve end-to-end question answering effectiveness using exactly the same models as in the original work.
arXiv Detail & Related papers (2021-04-12T18:10:39Z)
Corner Proposal Network for Anchor-free, Two-stage Object Detection [174.59360147041673]
The goal of object detection is to determine the class and location of objects in an image. This paper proposes a novel anchor-free, two-stage framework which first extracts a number of object proposals. We demonstrate that these two stages are effective solutions for improving recall and precision.
arXiv Detail & Related papers (2020-07-27T19:04:57Z)
Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning [86.45526827323954]
Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training. We propose an iterative algorithm to learn such pairwise relations. We show that the proposed algorithm performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2020-02-19T10:32:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.