Writer Retrieval and Writer Identification in Greek Papyri
- URL: http://arxiv.org/abs/2212.07664v1
- Date: Thu, 15 Dec 2022 08:42:25 GMT
- Title: Writer Retrieval and Writer Identification in Greek Papyri
- Authors: Vincent Christlein, Isabelle Marthot-Santaniello, Martin Mayr,
Anguelos Nicolaou, Mathias Seuret
- Abstract summary: Writer identification refers to the classification of known writers while writer retrieval seeks to find the writer by means of image similarity in a dataset of images.
While automatic writer identification/retrieval methods already provide promising results for many historical document types, papyri data is very challenging due to the fiber structures and severe artifacts.
We investigate several methods and show that a good binarization is key to an improved writer identification in papyri writings.
- Score: 4.44566870214758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The analysis of digitized historical manuscripts is typically addressed by
paleographic experts. Writer identification refers to the classification of
known writers while writer retrieval seeks to find the writer by means of image
similarity in a dataset of images. While automatic writer
identification/retrieval methods already provide promising results for many
historical document types, papyri data is very challenging due to the fiber
structures and severe artifacts. Thus, an important step for an improved writer
identification is the preprocessing and feature sampling process. We
investigate several methods and show that a good binarization is key to an
improved writer identification in papyri writings. We focus mainly on writer
retrieval using unsupervised feature methods based on traditional or
self-supervised-based methods. It is, however, also comparable to the state of
the art supervised deep learning-based method in the case of writer
classification/re-identification.
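The retrieval setting described in the abstract (finding a writer by image similarity) can be illustrated with a minimal leave-one-out sketch. This is not the authors' pipeline: the two-dimensional descriptors, the writer labels, and the helper names `retrieve`, `average_precision`, and `evaluate` are all hypothetical, and cosine similarity is only one plausible choice of similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_index, descriptors):
    """Rank every other image by descending similarity to the query image."""
    query = descriptors[query_index]
    scored = [
        (cosine_similarity(query, d), i)
        for i, d in enumerate(descriptors)
        if i != query_index  # leave-one-out: the query never retrieves itself
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored]


def average_precision(ranking, query_writer, writers):
    """Average precision of one ranked list for the query's writer."""
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranking, start=1):
        if writers[idx] == query_writer:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate(descriptors, writers):
    """Return (top-1 accuracy, mean average precision) over all queries."""
    top1_hits = 0
    ap_values = []
    for q in range(len(descriptors)):
        ranking = retrieve(q, descriptors)
        top1_hits += writers[ranking[0]] == writers[q]
        ap_values.append(average_precision(ranking, writers[q], writers))
    n = len(descriptors)
    return top1_hits / n, sum(ap_values) / n


# Hypothetical toy data: one descriptor per image, two images per writer.
descriptors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
writers = ["A", "A", "B", "B"]
top1, mean_ap = evaluate(descriptors, writers)
```

In this toy setup each image's nearest neighbour comes from the same writer, so both scores are perfect. With real papyrus images the descriptors would come from a feature extractor applied after preprocessing such as binarization, which the abstract identifies as the key step.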
Related papers
- Efficiently Leveraging Linguistic Priors for Scene Text Spotting [63.22351047545888]
This paper proposes a method that leverages linguistic knowledge from a large text corpus to replace the traditional one-hot encoding used in auto-regressive scene text spotting and recognition models.
We generate text distributions that align well with scene text datasets, removing the need for in-domain fine-tuning.
Experimental results show that our method not only improves recognition accuracy but also enables more accurate localization of words.
arXiv Detail & Related papers (2024-02-27T01:57:09Z) - Enhancing Representation Generalization in Authorship Identification [9.148691357200216]
Authorship identification ascertains the authorship of texts whose origins remain undisclosed.
Modern authorship identification methods have proven effective in distinguishing authorial styles.
The presented work addresses the challenge of enhancing the generalization of stylistic representations in authorship identification.
arXiv Detail & Related papers (2023-09-30T17:11:00Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Open Set Classification of Untranscribed Handwritten Documents [56.0167902098419]
Huge amounts of digital page images of important manuscripts are preserved in archives worldwide.
The class or "typology" of a document is perhaps the most important tag to be included in the metadata.
The technical problem is one of automatic classification of documents, each consisting of a set of untranscribed handwritten text images.
arXiv Detail & Related papers (2022-06-20T20:43:50Z) - Digital Editions as Distant Supervision for Layout Analysis of Printed
Books [76.29918490722902]
We describe methods for exploiting this semantic markup as distant supervision for training and evaluating layout analysis models.
In experiments with several model architectures on the half-million pages of the Deutsches Textarchiv (DTA), we find a high correlation of these region-level evaluation methods with pixel-level and word-level metrics.
We discuss the possibilities for improving accuracy with self-training and the ability of models trained on the DTA to generalize to other historical printed books.
arXiv Detail & Related papers (2021-12-23T16:51:53Z) - Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z) - Exploiting Multi-Scale Fusion, Spatial Attention and Patch Interaction
Techniques for Text-Independent Writer Identification [15.010153819096056]
In this paper, three deep learning techniques (a spatial attention mechanism, multi-scale feature fusion, and a patch-based CNN) are proposed to capture the differences between writers' handwriting.
The proposed methods outperform various state-of-the-art methodologies in word-level and page-level writer identification on three publicly available datasets.
arXiv Detail & Related papers (2021-11-20T14:41:36Z) - Handwriting Classification for the Analysis of Art-Historical Documents [6.918282834668529]
We focus on the analysis of handwriting in scanned documents from the art-historic archive of the WPI.
We propose a handwriting classification model that labels extracted text fragments based on their visual structure.
arXiv Detail & Related papers (2020-11-04T13:06:46Z) - Neural Deepfake Detection with Factual Structure of Text [78.30080218908849]
We propose a graph-based model for deepfake detection of text.
Our approach represents the factual structure of a given document as an entity graph.
Our model can distinguish the difference in the factual structure between machine-generated text and human-written text.
arXiv Detail & Related papers (2020-10-15T02:35:31Z) - Single-sample writers -- "Document Filter" and their impacts on writer
identification [7.459089186033613]
The "document filter" protocol is supposed to be used as a preprocessing technique.
The "document filter" protocol is supposed to capture the features from the writer itself.
The recognition rate obtained using the "document filter" protocol drops from 81.80% to 50.37%.
arXiv Detail & Related papers (2020-05-18T02:02:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.