A Step Towards Interpretable Authorship Verification
- URL: http://arxiv.org/abs/2006.12418v2
- Date: Tue, 7 Jul 2020 23:30:14 GMT
- Title: A Step Towards Interpretable Authorship Verification
- Authors: Oren Halvani, Lukas Graner, Roey Regev
- Abstract summary: Authorship verification (AV) is a research branch in the field of digital text forensics.
Many approaches make use of features that are related to or influenced by the topic of the documents.
We propose an alternative AV approach that considers only topic-agnostic features in its classification decision.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A central problem that has been researched for many years in the field of
digital text forensics is the question of whether two documents were written by
the same author. Authorship verification (AV) is a research branch in this
field that deals with this question. Over the years, research activities in the
context of AV have steadily increased, which has led to a variety of approaches
trying to solve this problem. Many of these approaches, however, make use of
features that are related to or influenced by the topic of the documents.
Therefore, it may accidentally happen that their verification results are based
not on the writing style (the actual focus of AV), but on the topic of the
documents. To address this problem, we propose an alternative AV approach that
considers only topic-agnostic features in its classification decision. In
addition, we present a post-hoc interpretation method that makes it possible to understand
which particular features have contributed to the prediction of the proposed AV
method. To evaluate the performance of our AV method, we compared it with ten
competing baselines (including the current state of the art) on four
challenging data sets. The results show that our approach outperforms all
baselines in two cases (with a maximum accuracy of 84%), while in the other two
cases it performs as well as the strongest baseline.
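The abstract does not say which topic-agnostic features or which interpretation method the authors use. As a rough illustration of the general idea only, the sketch below assumes function-word frequencies as the topic-agnostic features and a simple per-feature difference ranking as the post-hoc explanation. The word list, the function names, and the threshold are all hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the paper's actual feature set is not given in the abstract).
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "but", "however"]


def profile(text):
    """Relative frequency of each function word among all tokens of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return {w: counts[w] / total for w in FUNCTION_WORDS}


def feature_differences(doc_a, doc_b):
    """Per-feature absolute differences between two documents, largest first.

    The ranking shows which features separate the two texts the most,
    which is the kind of question a post-hoc interpretation answers.
    """
    pa, pb = profile(doc_a), profile(doc_b)
    diffs = {w: abs(pa[w] - pb[w]) for w in FUNCTION_WORDS}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)


def same_author(doc_a, doc_b, threshold=0.1):
    """Toy decision rule: accept if the summed difference is below a
    hypothetical threshold (in practice it would be tuned on training data)."""
    total = sum(d for _, d in feature_differences(doc_a, doc_b))
    return total < threshold
```

Because only function words enter the profile, two texts on different topics by the same author can still receive similar profiles, and the ranked differences indicate which features drove a given decision.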
Related papers
- Illusions of Relevance: Using Content Injection Attacks to Deceive Retrievers, Rerankers, and LLM Judges [52.96987928118327]
We find that embedding models for retrieval, rerankers, and large language model (LLM) relevance judges are vulnerable to content injection attacks.
We identify two primary threats: (1) inserting unrelated or harmful content within passages that still appear deceptively "relevant", and (2) inserting entire queries or key query terms into passages to boost their perceived relevance.
Our study systematically examines the factors that influence an attack's success, such as the placement of injected content and the balance between relevant and non-relevant material.
arXiv Detail & Related papers (2025-01-30T18:02:15Z)
- The Questio de aqua et terra: A Computational Authorship Verification Study [49.56191463229252]
This study investigates the authenticity of the Questio via computational authorship verification (AV).
We build a family of AV systems and assemble a corpus of 330 13th- and 14th-century Latin texts.
The application of the AV system to the Questio returns a highly confident prediction concerning its authenticity.
arXiv Detail & Related papers (2025-01-07T18:42:05Z)
- Document Set Expansion with Positive-Unlabeled Learning: A Density Estimation-based Approach [18.923476312831394]
Document set expansion aims to identify relevant documents from a large collection based on a small set of documents that are on a fine-grained topic.
Previous work shows that PU learning is a promising method for this task.
We propose a novel PU learning framework based on density estimation, called puDE, that can handle the above issues.
arXiv Detail & Related papers (2024-01-20T06:52:14Z)
- Analysing the Resourcefulness of the Paragraph for Precedence Retrieval [0.1761604268733064]
We analyzed the resourcefulness of paragraph-level information in capturing similarity among judgments for improving the performance of precedence retrieval.
We found that the paragraph-level methods could capture the similarity among the judgments with only a few paragraph interactions and exhibit more discriminating power over the baseline document-level method.
arXiv Detail & Related papers (2023-07-29T08:55:38Z)
- Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared for solving the 1st, we also provide two novel methods for solving the 2nd and 3rd.
arXiv Detail & Related papers (2023-01-24T08:48:12Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Joint Answering and Explanation for Visual Commonsense Reasoning [46.44588492897933]
Visual Commonsense Reasoning endeavors to pursue a more high-level visual comprehension.
It is composed of two indispensable processes: question answering over a given image and rationale inference for answer explanation.
We present a plug-and-play knowledge distillation enhanced framework to couple the question answering and rationale inference processes.
arXiv Detail & Related papers (2022-02-25T11:26:52Z)
- A Simple Information-Based Approach to Unsupervised Domain-Adaptive Aspect-Based Sentiment Analysis [58.124424775536326]
We propose a simple but effective technique based on mutual information to extract their term.
Experiment results show that our proposed method outperforms the state-of-the-art methods for cross-domain ABSA by 4.32% Micro-F1.
arXiv Detail & Related papers (2022-01-29T10:18:07Z)
- POSNoise: An Effective Countermeasure Against Topic Biases in Authorship Analysis [0.0]
Authorship verification is a fundamental research task in digital text forensics.
We propose a preprocessing technique called POSNoise, which effectively masks topic-related content in a given text.
Our evaluation shows that POSNoise leads to better results compared to a well-known topic masking approach in 34 out of 42 cases, with an increase in accuracy of up to 10%.
arXiv Detail & Related papers (2020-05-02T21:10:24Z)