Residualized Similarity for Faithfully Explainable Authorship Verification
- URL: http://arxiv.org/abs/2510.05362v1
- Date: Mon, 06 Oct 2025 20:41:17 GMT
- Title: Residualized Similarity for Faithfully Explainable Authorship Verification
- Authors: Peter Zeng, Pegah Alipoormolabashi, Jihu Mun, Gourab Dey, Nikita Soni, Niranjan Balasubramanian, Owen Rambow, H. Schwartz
- Abstract summary: Residualized Similarity (RS) is a novel method that supplements systems using interpretable features with a neural network to improve their performance. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can show how and to what degree the final prediction is faithful and interpretable.
- Score: 18.7598434282115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Responsible use of Authorship Verification (AV) systems requires not only high accuracy but also interpretable solutions. More importantly, for systems to be used to make decisions with real-world consequences, the model's predictions must be explainable using interpretable features that can be traced back to the original texts. Neural methods achieve high accuracy, but their representations lack direct interpretability. Furthermore, LLM predictions cannot be explained faithfully: when an explanation is given for a prediction, it does not represent the reasoning process behind the model's prediction. In this paper, we introduce Residualized Similarity (RS), a novel method that supplements systems using interpretable features with a neural network to improve their performance while maintaining interpretability. Authorship verification is fundamentally a similarity task, where the goal is to measure how alike two documents are. The key idea is to use the neural network to predict a similarity residual, i.e., the error in the similarity predicted by the interpretable system. Our evaluation across four datasets shows that not only can we match the performance of state-of-the-art authorship verification models, but we can also show how and to what degree the final prediction is faithful and interpretable.
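
The residual idea described in the abstract can be sketched concretely. The snippet below is a minimal illustration under assumptions, not the authors' implementation: the feature set, network architecture, and all names (stylometric_features, ResidualNet, residualized_similarity) are hypothetical placeholders chosen for clarity.

```python
# Minimal sketch of the Residualized Similarity idea: an interpretable
# similarity score plus a neural network that predicts its residual.
# All names and design choices here are hypothetical, for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def stylometric_features(text: str) -> torch.Tensor:
    """Toy interpretable features: average word length and type-token ratio."""
    words = text.split()
    avg_len = sum(len(w) for w in words) / max(len(words), 1)
    ttr = len(set(words)) / max(len(words), 1)
    return torch.tensor([avg_len, ttr], dtype=torch.float32)


def interpretable_similarity(doc_a: str, doc_b: str) -> torch.Tensor:
    """Similarity computed from interpretable features only (cosine similarity)."""
    fa, fb = stylometric_features(doc_a), stylometric_features(doc_b)
    return F.cosine_similarity(fa, fb, dim=0)


class ResidualNet(nn.Module):
    """Small network that predicts the residual, i.e. the error in the
    similarity produced by the interpretable system."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, pair_features: torch.Tensor) -> torch.Tensor:
        return self.mlp(pair_features).squeeze(-1)


def residualized_similarity(doc_a: str, doc_b: str, net: ResidualNet) -> torch.Tensor:
    """Final score = interpretable similarity + predicted residual.
    The first term stays traceable to the feature values; the second
    quantifies how much the neural correction contributed."""
    sim = interpretable_similarity(doc_a, doc_b)
    pair = torch.cat([stylometric_features(doc_a), stylometric_features(doc_b)])
    return sim + net(pair)
```

In such a setup, the residual network would be fit to the gap between the interpretable similarity and the gold same-author label; at inference, the interpretable score and the learned correction can be reported separately, which is what makes the degree of faithfulness of the final prediction inspectable.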
Related papers
- Enhancing Pre-trained Representation Classifiability can Boost its Interpretability [112.296393156262]
We quantify the representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. We propose the Inherent Interpretability Score (IIS) that evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability.
arXiv Detail & Related papers (2025-10-28T06:21:06Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, explanation consistency, to adaptively reweight the training samples during model learning.
Our framework then promotes model learning by paying closer attention to those training samples with large differences in their explanations.
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - VeriFlow: Modeling Distributions for Neural Network Verification [3.510536859655114]
Formal verification has emerged as a promising method to ensure the safety and reliability of neural networks. We propose the VeriFlow architecture as a flow-based density model tailored to allow any verification approach to restrict its search to some data distribution of interest.
arXiv Detail & Related papers (2024-06-20T12:41:39Z) - Uncertainty-Aware Explanations Through Probabilistic Self-Explainable Neural Networks [17.238290206236027]
Prob-PSENNs replace point estimates for prototypes with probability distributions over their values. Prob-PSENNs provide more meaningful and robust explanations than their non-probabilistic counterparts.
arXiv Detail & Related papers (2024-03-20T16:47:28Z) - Intrinsic User-Centric Interpretability through Global Mixture of Experts [31.738009841932374]
InterpretCC is a family of intrinsically interpretable neural networks that optimize for ease of human understanding and explanation faithfulness. We show that InterpretCC explanations have higher actionability and usefulness than other intrinsically interpretable approaches.
arXiv Detail & Related papers (2024-02-05T11:55:50Z) - Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval [139.21955930418815]
Cross-modal Retrieval methods build similarity relations between vision and language modalities by jointly learning a common representation space.
However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts.
We propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity.
arXiv Detail & Related papers (2023-09-29T09:41:19Z) - Claim-Dissector: An Interpretable Fact-Checking System with Joint Re-ranking and Veracity Prediction [4.082750656756811]
We present Claim-Dissector: a novel latent variable model for fact-checking and analysis.
We disentangle the per-evidence relevance probability and its contribution to the final veracity probability in an interpretable way.
Despite its interpretable nature, our system achieves results competitive with the state of the art on the FEVER dataset.
arXiv Detail & Related papers (2022-07-28T14:30:06Z) - Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI [60.142926537264714]
We introduce the methodology of Faithfulness-through-Counterfactuals.
It generates a counterfactual hypothesis based on the logical predicates expressed in the explanation.
It then evaluates whether the model's prediction on the counterfactual is consistent with that expressed logic.
arXiv Detail & Related papers (2022-05-25T03:40:59Z) - Deconfounding to Explanation Evaluation in Graph Neural Networks [136.73451468551656]
We argue that a distribution shift exists between the full graph and the subgraph, causing the out-of-distribution problem.
We propose Deconfounded Subgraph Evaluation (DSE) which assesses the causal effect of an explanatory subgraph on the model prediction.
arXiv Detail & Related papers (2022-01-21T18:05:00Z) - Toward Scalable and Unified Example-based Explanation and Outlier Detection [128.23117182137418]
We argue for a broader adoption of prototype-based student networks capable of providing an example-based explanation for their prediction.
We show that our prototype-based networks, which go beyond similarity kernels, deliver meaningful explanations and promising outlier detection results without compromising classification accuracy.
arXiv Detail & Related papers (2020-11-11T05:58:17Z) - DoLFIn: Distributions over Latent Features for Interpretability [8.807587076209568]
We propose a novel strategy for achieving interpretability in neural network models.
Our approach builds on the success of using probability as the central quantity.
We show that DoLFIn not only provides interpretable solutions, but even slightly outperforms classical CNN and BiLSTM text classifiers.
arXiv Detail & Related papers (2020-11-10T18:32:53Z)