Related papers: Counterfactual Explanations for Model Ensembles Using Entropic Risk Measures

URL: http://arxiv.org/abs/2503.07934v1
Date: Tue, 11 Mar 2025 00:25:28 GMT
Title: Counterfactual Explanations for Model Ensembles Using Entropic Risk Measures
Authors: Erfaun Noorani, Pasan Dissanayake, Faisal Hamman, Sanghamitra Dutta
Abstract summary: Counterfactual explanations indicate the smallest change in input that can translate to a different outcome for a machine learning model. We propose a novel strategy to find the counterfactual for an ensemble of models using the perspective of entropic risk measure. We study the trade-off between the cost (effort) for the counterfactual and its validity for an ensemble by varying degrees of risk aversion.
Score: 7.959080260803575
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Counterfactual explanations indicate the smallest change in input that can translate to a different outcome for a machine learning model. Counterfactuals have generated immense interest in high-stakes applications such as finance, education, hiring, etc. In several use-cases, the decision-making process often relies on an ensemble of models rather than just one. Despite significant research on counterfactuals for one model, the problem of generating a single counterfactual explanation for an ensemble of models has received limited interest. Each individual model might lead to a different counterfactual, whereas trying to find a counterfactual accepted by all models might significantly increase cost (effort). We propose a novel strategy to find the counterfactual for an ensemble of models using the perspective of entropic risk measure. Entropic risk is a convex risk measure that satisfies several desirable properties. We incorporate our proposed risk measure into a novel constrained optimization to generate counterfactuals for ensembles that stay valid for several models. The main significance of our measure is that it provides a knob that allows for the generation of counterfactuals that stay valid under an adjustable fraction of the models. We also show that a limiting case of our entropic-risk-based strategy yields a counterfactual valid for all models in the ensemble (worst-case min-max approach). We study the trade-off between the cost (effort) for the counterfactual and its validity for an ensemble by varying degrees of risk aversion, as determined by our risk parameter knob. We validate our performance on real-world datasets.
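The "knob" described in the abstract can be illustrated with the textbook definition of entropic risk, rho_theta(X) = (1/theta) * log E[exp(theta * X)]. The sketch below is not the authors' implementation: the function name `entropic_risk`, the uniform weighting over models, and the per-model loss values are all assumptions made for illustration. It only shows the general behaviour the abstract refers to, namely that a small risk parameter behaves like an average over the ensemble while a large one approaches the worst-case model.

```python
import math


def entropic_risk(losses, theta):
    """Entropic risk (1/theta) * log(mean(exp(theta * loss))) over an ensemble.

    `losses` holds one (hypothetical) loss value per model for a single
    candidate counterfactual; models are weighted uniformly (an assumption).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    # Log-sum-exp trick: subtract the largest exponent for numerical stability.
    shift = max(theta * loss for loss in losses)
    mean_exp = sum(math.exp(theta * loss - shift) for loss in losses) / len(losses)
    return (shift + math.log(mean_exp)) / theta


# Hypothetical per-model losses for one candidate counterfactual.
losses = [0.1, 0.2, 0.9]

for theta in (1e-6, 1.0, 10.0, 1e3):
    print(f"theta={theta:g}: entropic risk = {entropic_risk(losses, theta):.4f}")
```

With these made-up losses, the value moves from roughly the ensemble mean toward the largest loss as `theta` grows, which mirrors the limiting worst-case (min-max) behaviour mentioned in the abstract. How the risk enters the constrained optimization is specified in the paper itself, not here.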

Related papers

Reward Model Interpretability via Optimal and Pessimal Tokens [4.951383975460995]
Reward modeling has emerged as a crucial component in aligning large language models with human values. We present a novel approach to reward model interpretability through exhaustive analysis of their responses across their entire vocabulary space. We find that these models can encode concerning biases toward certain identity groups, which may emerge as unintended consequences of harmlessness training.
arXiv Detail & Related papers (2025-06-08T23:56:58Z)
Optimal Classification under Performative Distribution Shift [13.508249764979075]
We propose a novel view in which performative effects are modelled as push-forward measures. We prove the convexity of the performative risk under a new set of assumptions. We also establish a connection with adversarially robust classification by reformulating the minimization of the performative risk as a min-max variational problem.
arXiv Detail & Related papers (2024-11-04T12:20:13Z)
Kullback-Leibler Barycentre of Stochastic Processes [0.0]
We consider the problem where an agent aims to combine the views and insights of different experts' models. We show existence and uniqueness of the barycentre model and prove an explicit representation of the Radon-Nikodym derivative. Two deep learning algorithms are proposed to find the optimal drift of the combined model.
arXiv Detail & Related papers (2024-07-05T20:45:27Z)
Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory [9.771997770574947]
We analyze how model reconstruction using counterfactuals can be improved. Our main contribution is to derive novel theoretical relationships between the error in model reconstruction and the number of counterfactual queries.
arXiv Detail & Related papers (2024-05-08T18:52:47Z)
On the Embedding Collapse when Scaling up Recommendation Models [53.66285358088788]
We identify the embedding collapse phenomenon as the inhibition of scalability, wherein the embedding matrix tends to occupy a low-dimensional subspace. We propose a simple yet effective multi-embedding design incorporating embedding-set-specific interaction modules to learn embedding sets with large diversity.
arXiv Detail & Related papers (2023-10-06T17:50:38Z)
Endogenous Macrodynamics in Algorithmic Recourse [52.87956177581998]
Existing work on Counterfactual Explanations (CE) and Algorithmic Recourse (AR) has largely focused on single individuals in a static environment. We show that many of the existing methodologies can be collectively described by a generalized framework. We then argue that the existing framework does not account for a hidden external cost of recourse, that only reveals itself when studying the endogenous dynamics of recourse at the group level.
arXiv Detail & Related papers (2023-08-16T07:36:58Z)
Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees [11.841312820944774]
We propose a measure -- that we call $\textit{Stability}$ -- to quantify the robustness of counterfactuals to potential model changes for differentiable models. Our main contribution is to show that counterfactuals with sufficiently high value of $\textit{Stability}$ will remain valid after potential model changes with high probability.
arXiv Detail & Related papers (2023-05-19T20:48:05Z)
Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning [21.931580762349096]
We introduce an algorithm that computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model. We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem.
arXiv Detail & Related papers (2022-06-04T23:36:38Z)
Mitigating multiple descents: A model-agnostic framework for risk monotonization [84.6382406922369]
We develop a general framework for risk monotonization based on cross-validation. We propose two data-driven methodologies, namely zero- and one-step, that are akin to bagging and boosting.
arXiv Detail & Related papers (2022-05-25T17:41:40Z)
CC-Cert: A Probabilistic Approach to Certify General Robustness of Neural Networks [58.29502185344086]
In safety-critical machine learning applications, it is crucial to defend models against adversarial attacks. It is important to provide provable guarantees for deep learning models against semantically meaningful input transformations. We propose a new universal probabilistic certification approach based on Chernoff-Cramer bounds.
arXiv Detail & Related papers (2021-09-22T12:46:04Z)
Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction. We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss. Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
BODAME: Bilevel Optimization for Defense Against Model Extraction [10.877450596327407]
We consider an adversarial setting to prevent model extraction under the assumption that the attacker will make a best guess at the service provider's model. We formulate a surrogate model using the predictions of the true model. We give a tractable transformation and an algorithm for more complicated models that are learned by using gradient descent-based algorithms.
arXiv Detail & Related papers (2021-03-11T17:08:31Z)
Characterizing Fairness Over the Set of Good Models Under Selective Labels [69.64662540443162]
We develop a framework for characterizing predictive fairness properties over the set of models that deliver similar overall performance. We provide tractable algorithms to compute the range of attainable group-level predictive disparities. We extend our framework to address the empirically relevant challenge of selectively labelled data.
arXiv Detail & Related papers (2021-01-02T02:11:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.