On the Definition and Detection of Cherry-Picking in Counterfactual Explanations
- URL: http://arxiv.org/abs/2601.04977v1
- Date: Thu, 08 Jan 2026 14:29:24 GMT
- Title: On the Definition and Detection of Cherry-Picking in Counterfactual Explanations
- Authors: James Hinns, Sofie Goethals, Stephan Van der Veeken, Theodoros Evgeniou, David Martens
- Abstract summary: We formally define cherry-picking for counterfactual explanations. We show that detection is extremely limited in practice. We argue that safeguards should prioritise reproducibility, standardisation, and procedural constraints over post-hoc detection.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Counterfactual explanations are widely used to communicate how inputs must change for a model to alter its prediction. For a single instance, many valid counterfactuals can exist, which leaves open the possibility for an explanation provider to cherry-pick explanations that better suit a narrative of their choice, highlighting favourable behaviour and withholding examples that reveal problematic behaviour. We formally define cherry-picking for counterfactual explanations in terms of an admissible explanation space, specified by the generation procedure, and a utility function. We then study to what extent an external auditor can detect such manipulation. Considering three levels of access to the explanation process: full procedural access, partial procedural access, and explanation-only access, we show that detection is extremely limited in practice. Even with full procedural access, cherry-picked explanations can remain difficult to distinguish from non-cherry-picked explanations, because the multiplicity of valid counterfactuals and flexibility in the explanation specification provide sufficient degrees of freedom to mask deliberate selection. Empirically, we demonstrate that this variability often exceeds the effect of cherry-picking on standard counterfactual quality metrics such as proximity, plausibility, and sparsity, making cherry-picked explanations statistically indistinguishable from baseline explanations. We argue that safeguards should therefore prioritise reproducibility, standardisation, and procedural constraints over post-hoc detection, and we provide recommendations for algorithm developers, explanation providers, and auditors.
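To make the abstract's setup concrete, here is a minimal sketch of cherry-picking over an admissible explanation space. The toy model, the sampling-based generation procedure, and the `provider_utility` function are illustrative assumptions, not definitions from the paper; the point is only that one instance admits many valid counterfactuals, and a provider can select among them by a private utility while the chosen explanation still looks ordinary on standard metrics.

```python
# Minimal sketch (not the paper's exact formalism) of cherry-picking:
# enumerate many valid counterfactuals for one instance, then report the one
# that maximises a private provider utility. Toy model and utility are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Toy linear classifier: predict 1 iff w . x > 0.
    w = np.array([1.0, -2.0, 0.5])
    return (x @ w > 0).astype(int)

def admissible_space(x, n_candidates=500, scale=1.0):
    """Sample candidate perturbations and keep those that flip the prediction.
    Stands in for the generation procedure's space of valid counterfactuals."""
    y0 = model(x[None])[0]
    candidates = x + rng.normal(0.0, scale, size=(n_candidates, x.size))
    return candidates[model(candidates) != y0]

def proximity(x, cf):
    return np.linalg.norm(cf - x)          # L2 distance, lower is better

def sparsity(x, cf, tol=0.25):
    return np.sum(np.abs(cf - x) > tol)    # number of features changed

def provider_utility(x, cf, sensitive=1):
    # Hypothetical narrative: downplay the model's reliance on feature 1 by
    # preferring counterfactuals that barely touch it.
    return -abs(cf[sensitive] - x[sensitive])

x = np.array([0.2, 0.4, -0.1])
space = admissible_space(x)

baseline = min(space, key=lambda cf: proximity(x, cf))       # standard pick
cherry = max(space, key=lambda cf: provider_utility(x, cf))  # cherry-pick

for name, cf in [("baseline", baseline), ("cherry-picked", cherry)]:
    print(f"{name}: proximity={proximity(x, cf):.2f}, "
          f"sparsity={sparsity(x, cf)}, delta_sensitive={abs(cf[1] - x[1]):.2f}")
```

Both picks are valid counterfactuals from the same admissible space; the cherry-picked one's proximity and sparsity fall inside the spread of that space, which mirrors the paper's empirical point that multiplicity can mask deliberate selection from an auditor who only sees the reported explanation.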
Related papers
- Explanation Multiplicity in SHAP: Characterization and Assessment [28.413883186555438]
Post-hoc explanations are widely used to justify, contest, and review automated decisions in high-stakes domains such as lending, employment, and healthcare. In practice, however, SHAP explanations can differ substantially across repeated runs, even when the individual, prediction task, and trained model are held fixed. We conceptualize and name this phenomenon explanation multiplicity: the existence of multiple, internally valid but substantively different explanations for the same decision. (A toy sketch of this run-to-run variability appears after the list below.)
arXiv Detail & Related papers (2026-01-19T02:01:18Z)
- COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations [89.37527535663433]
We present a large-scale dataset of 104k posts with user-provided notes and helpfulness labels. We propose a framework that automatically generates and improves reason definitions via automatic prompt optimization. Our experiments show that the optimized definitions can improve both helpfulness and reason prediction.
arXiv Detail & Related papers (2025-10-28T05:28:47Z)
- Are We Merely Justifying Results ex Post Facto? Quantifying Explanatory Inversion in Post-Hoc Model Explanations [87.68633031231924]
Post-hoc explanation methods provide interpretation by attributing predictions to input features. Do these explanations unintentionally reverse the natural relationship between inputs and outputs? We propose Inversion Quantification (IQ), a framework that quantifies the degree to which explanations rely on outputs and deviate from faithful input-output relationships.
arXiv Detail & Related papers (2025-04-11T19:00:12Z)
- Auditing Local Explanations is Hard [14.172657936593582]
We investigate an auditing framework in which a third-party auditor or a collective of users attempts to sanity-check explanations.
We prove upper and lower bounds on the amount of queries that are needed for an auditor to succeed within this framework.
Our results suggest that for complex high-dimensional settings, merely providing a pointwise prediction and explanation could be insufficient.
arXiv Detail & Related papers (2024-07-18T08:34:05Z)
- Local Feature Selection without Label or Feature Leakage for Interpretable Machine Learning Predictions [15.811319240038603]
We characterize the problem of misleading selections by formalizing the concepts of label and feature leakage.
We propose SUWR, the first local feature selection method proven to be free of leakage.
Our experimental results indicate that SUWR is less prone to overfitting and combines state-of-the-art predictive performance with high feature-selection sparsity.
arXiv Detail & Related papers (2024-07-16T14:36:30Z)
- Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations [118.0818807474809]
Abductive reasoning aims to find plausible explanations for an event.
Existing approaches for abductive reasoning in natural language processing often rely on manually generated annotations for supervision.
This work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.
arXiv Detail & Related papers (2023-05-24T01:35:10Z)
- HOP, UNION, GENERATE: Explainable Multi-hop Reasoning without Rationale Supervision [118.0818807474809]
This work proposes a principled, probabilistic approach for training explainable multi-hop QA systems without rationale supervision.
Our approach performs multi-hop reasoning by explicitly modeling rationales as sets, enabling the model to capture interactions between documents and sentences within a document.
arXiv Detail & Related papers (2023-05-23T16:53:49Z)
- Explanation Selection Using Unlabeled Data for Chain-of-Thought Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z)
- Features of Explainability: How users understand counterfactual and causal explanations for categorical and continuous features in XAI [10.151828072611428]
Counterfactual explanations are increasingly used to address interpretability, recourse, and bias in AI decisions.
We tested the effects of counterfactual and causal explanations on the objective accuracy of users' predictions.
We also found that users understand explanations referring to categorical features more readily than those referring to continuous features.
arXiv Detail & Related papers (2022-04-21T15:01:09Z)
- Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
In the field of process outcome prediction, we define explainability through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines named X-MOP which allows selecting the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z)
- Generating Fluent Fact Checking Explanations with Unsupervised Post-Editing [22.5444107755288]
We present an iterative edit-based algorithm that uses only phrase-level edits to perform unsupervised post-editing of ruling comments.
We show that our model generates explanations that are fluent, readable, non-redundant, and cover important information for the fact check.
arXiv Detail & Related papers (2021-12-13T15:31:07Z)
- A framework for step-wise explaining how to solve constraint satisfaction problems [21.96171133035504]
We study the problem of explaining the inference steps that one can take during propagation, in a way that is easy to interpret for a person.
Thereby, we aim to give the constraint solver explainable agency, which can help in building trust in the solver.
arXiv Detail & Related papers (2020-06-11T11:35:41Z)
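On the first related paper above (explanation multiplicity in SHAP), the run-to-run variability it describes can be reproduced with any stochastic attribution estimator. A minimal sketch, assuming a hand-rolled permutation-sampling Shapley estimator and a made-up model rather than the SHAP library itself:

```python
# Toy illustration of explanation multiplicity: the same instance, model,
# and estimator settings yield different attributions across runs because
# the estimator is stochastic. Model and estimator are assumptions for
# illustration, not taken from the paper.
import numpy as np

def model(x):
    # Fixed toy model with an interaction between features 0 and 1.
    return x[0] * x[1] + 2.0 * x[2]

def mc_shapley(x, baseline, n_perm=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions over
    random feature orderings (permutation sampling)."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_perm):
        order = rng.permutation(d)
        z = baseline.copy()
        prev = model(z)
        for j in order:
            z[j] = x[j]               # reveal feature j
            cur = model(z)
            phi[j] += cur - prev      # marginal contribution of j
            prev = cur
    return phi / n_perm

x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros(3)

# Identical instance, model, and settings; only the seed differs.
for seed in range(3):
    print(seed, np.round(mc_shapley(x, baseline, seed=seed), 3))
```

Each run prints a slightly different attribution vector for the interacting features 0 and 1, while the additive feature 2 stays fixed: multiple internally valid but different explanations for the same decision.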