Single-sample writers -- "Document Filter" and their impacts on writer
identification
- URL: http://arxiv.org/abs/2005.08424v1
- Date: Mon, 18 May 2020 02:02:31 GMT
- Title: Single-sample writers -- "Document Filter" and their impacts on writer
identification
- Authors: Fabio Pinhelli, Alceu S. Britto Jr, Luiz S. Oliveira, Yandre M. G.
Costa, Diego Bertolini
- Abstract summary: The "document filter" protocol is meant to be used as a preprocessing technique.
It keeps all fragments of a document in the same set so that the classifier captures features of the writer itself.
In the most extreme case, the recognition rate obtained using the "document filter" protocol drops from 81.80% to 50.37%.
- Score: 7.459089186033613
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Handwriting can be used as an important biometric modality that allows an individual to be
unequivocally identified. This is because the writing of two
different persons presents differences that can be explored either in terms of
graphometric properties or even by addressing the manuscript as a digital
image, taking into account the use of image processing techniques that can
properly capture different visual attributes of the image (e.g. texture). In
this work, we perform a detailed study in which we examine whether the use
of a database with only a single sample taken from some writers may skew the
results obtained in the experimental protocol. In this sense, we propose here
what we call "document filter". The "document filter" protocol is supposed to
be used as a preprocessing technique, in such a way that all the data taken from
fragments of the same document must be placed either into the training or into
the test set. The rationale behind it is that the classifier must capture the
features of the writer itself, and not features related to other
particularities which could affect the writing in a specific document (e.g.
emotional state of the writer, pen used, paper type, etc.). By analyzing
the literature, one can find several works dealing with the writer identification
problem. However, the performance of writer identification systems must
also be evaluated taking into account the presence of volunteer writers who
contributed only a single sample during the creation of the manuscript
databases. To address the open issue investigated here, a comprehensive set of
experiments was performed on the IAM, BFL and CVL databases. They have shown
that, in the most extreme case, the recognition rate obtained using the
"document filter" protocol drops from 81.80% to 50.37%.
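The document-level constraint described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `document_filter_split`, the `(writer_id, document_id, features)` tuple layout, the `test_fraction` parameter, and the choice to set single-sample writers aside are all assumptions made for the example.

```python
import random
from collections import defaultdict


def document_filter_split(fragments, test_fraction=0.5, seed=0):
    """Split fragments so that no document contributes to both sets.

    fragments: list of (writer_id, document_id, features) tuples
    (a hypothetical layout chosen for this sketch).
    Returns (train, test, single_sample_writers).
    """
    rng = random.Random(seed)

    # Collect the distinct documents written by each writer.
    docs_by_writer = defaultdict(set)
    for writer, doc, _ in fragments:
        docs_by_writer[writer].add(doc)

    test_docs = set()
    single_sample_writers = set()
    for writer, docs in docs_by_writer.items():
        docs = sorted(docs)
        if len(docs) < 2:
            # Assumption: a writer with one document cannot appear in both
            # sets under the filter, so the writer is set aside and reported.
            single_sample_writers.add(writer)
            continue
        rng.shuffle(docs)
        # Keep at least one document on each side of the split.
        n_test = max(1, min(len(docs) - 1, round(len(docs) * test_fraction)))
        test_docs.update((writer, d) for d in docs[:n_test])

    train = [f for f in fragments
             if f[0] not in single_sample_writers
             and (f[0], f[1]) not in test_docs]
    test = [f for f in fragments if (f[0], f[1]) in test_docs]
    return train, test, single_sample_writers


# Toy data: writer "w2" contributed a single document ("d3").
fragments = [
    ("w1", "d1", 0), ("w1", "d1", 1), ("w1", "d2", 2),
    ("w2", "d3", 3), ("w2", "d3", 4),
    ("w3", "d4", 5), ("w3", "d5", 6), ("w3", "d6", 7),
]
train, test, singles = document_filter_split(fragments)
train_docs = {(w, d) for w, d, _ in train}
test_docs = {(w, d) for w, d, _ in test}
print("shared documents:", train_docs & test_docs)
print("single-sample writers:", singles)
```

Splitting by document rather than by fragment is what prevents document-specific cues, such as the pen or paper used, from leaking between the training and test sets. How single-sample writers should be treated is a design choice; setting them aside is just one option used here for illustration.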
Related papers
- Contextual Document Embeddings [77.22328616983417]
We propose two complementary methods for contextualized document embeddings.
First, an alternative contrastive learning objective that explicitly incorporates the document neighbors into the intra-batch contextual loss.
Second, a new contextual architecture that explicitly encodes neighbor document information into the encoded representation.
arXiv Detail & Related papers (2024-10-03T14:33:34Z) - Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z) - A Novel Dataset for Non-Destructive Inspection of Handwritten Documents [0.0]
Forensic handwriting examination aims to examine handwritten documents in order to properly define or hypothesize the manuscript's author.
We propose a new and challenging dataset consisting of two subsets: the first consists of 21 documents written either by the classic "pen and paper" approach (and later digitized) or directly acquired on common devices such as tablets.
Preliminary results on the proposed datasets show that 90% classification accuracy can be achieved on the first subset.
arXiv Detail & Related papers (2024-01-09T09:25:58Z) - Innovative Methods for Non-Destructive Inspection of Handwritten
Documents [0.0]
We present a framework capable of extracting and analyzing intrinsic measures of manuscript documents using image processing and deep learning techniques.
By quantifying the Euclidean distance between the feature vectors of the documents to be compared, authorship can be discerned.
Experimental results demonstrate the ability of our method to objectively determine authorship in different writing media, outperforming the state of the art.
arXiv Detail & Related papers (2023-10-17T12:45:04Z) - Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd.
arXiv Detail & Related papers (2023-01-24T08:48:12Z) - Writer Retrieval and Writer Identification in Greek Papyri [4.44566870214758]
Writer identification refers to the classification of known writers while writer retrieval seeks to find the writer by means of image similarity in a dataset of images.
While automatic writer identification/retrieval methods already provide promising results for many historical document types, papyri data is very challenging due to the fiber structures and severe artifacts.
We investigate several methods and show that a good binarization is key to an improved writer identification in papyri writings.
arXiv Detail & Related papers (2022-12-15T08:42:25Z) - PART: Pre-trained Authorship Representation Transformer [64.78260098263489]
Authors writing documents imprint identifying information within their texts: vocabulary, registry, punctuation, misspellings, or even emoji usage.
Previous works use hand-crafted features or classification tasks to train their authorship models, leading to poor performance on out-of-domain authors.
We propose a contrastively trained model fit to learn authorship embeddings instead of semantics.
arXiv Detail & Related papers (2022-09-30T11:08:39Z) - Augraphy: A Data Augmentation Library for Document Images [59.457999432618614]
Augraphy is a Python library for constructing data augmentation pipelines.
It provides strategies to produce augmented versions of clean document images that appear to have been altered by standard office operations.
arXiv Detail & Related papers (2022-08-30T22:36:19Z) - Letter-level Online Writer Identification [86.13203975836556]
We focus on a novel problem, letter-level online writer-id, which requires only a few trajectories of written letters as identification cues.
A main challenge is that a person often writes a letter in different styles from time to time.
We refer to this problem as the variance of online writing styles (Var-O-Styles).
arXiv Detail & Related papers (2021-12-06T07:21:53Z) - Re-ranking for Writer Identification and Writer Retrieval [8.53463698903858]
We show that a re-ranking step based on k-reciprocal nearest neighbor relationships is advantageous for writer identification.
We use these reciprocal relationships in two ways: encode them into new vectors, as originally proposed, or integrate them in terms of query-expansion.
arXiv Detail & Related papers (2020-07-14T15:21:17Z)