ExpProof : Operationalizing Explanations for Confidential Models with ZKPs
- URL: http://arxiv.org/abs/2502.03773v1
- Date: Thu, 06 Feb 2025 04:24:29 GMT
- Title: ExpProof : Operationalizing Explanations for Confidential Models with ZKPs
- Authors: Chhavi Yadav, Evan Monroe Laufer, Dan Boneh, Kamalika Chaudhuri
- Abstract summary: We take a step towards operationalizing explanations in adversarial scenarios with Zero-Knowledge Proofs (ZKPs).
Specifically, we explore ZKP-amenable versions of the popular explainability algorithm LIME and evaluate their performance on Neural Networks and Random Forests.
- Score: 33.47144717983562
- Abstract: In principle, explanations are intended as a way to increase trust in machine learning models and are often obligated by regulations. However, many circumstances where explanations are demanded are adversarial in nature, meaning the involved parties have misaligned interests and are incentivized to manipulate explanations for their own purposes. As a result, explainability methods fail to be operational in such settings despite the demand (Bordt et al., 2022). In this paper, we take a step towards operationalizing explanations in adversarial scenarios with Zero-Knowledge Proofs (ZKPs), a cryptographic primitive. Specifically, we explore ZKP-amenable versions of the popular explainability algorithm LIME and evaluate their performance on Neural Networks and Random Forests.
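For orientation, here is a minimal sketch of the standard LIME computation (perturbation sampling, black-box queries, and a locally weighted linear fit) that a ZKP-amenable variant has to carry out inside a proof system. This is an illustrative scikit-learn version, not the paper's construction; `black_box`, `x`, and all parameter values are assumptions.

```python
# Minimal LIME-style local surrogate (illustrative only, NOT the paper's
# ZKP-amenable variant). `black_box` stands in for the confidential model's
# prediction function and returns one score per perturbed sample.
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(black_box, x, num_samples=500, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance x (Gaussian perturbations, for simplicity).
    Z = x + rng.normal(scale=0.5, size=(num_samples, x.shape[0]))
    # 2. Query the black-box model on the perturbed points.
    y = black_box(Z)                          # shape: (num_samples,)
    # 3. Weight samples by proximity to x with an exponential kernel.
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / (kernel_width ** 2))
    # 4. Fit a weighted linear surrogate; its coefficients are the explanation.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=w)
    return surrogate.coef_
```

Every step above (sampling, model queries, kernel weights, and the weighted least-squares solve) would have to be expressed in a form a ZKP backend can prove, which is what makes the choice of a ZKP-amenable LIME variant non-trivial.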
Related papers
- Counterfactual Explanations as Plans [6.445239204595516]
We provide a formal account of counterfactual explanations in terms of action sequences.
We then show that this naturally leads to an account of model reconciliation, which may take the form of the user correcting the agent's model or suggesting actions to add to the agent's plan.
arXiv Detail & Related papers (2025-02-13T11:45:54Z) - Uncertainty Quantification for Gradient-based Explanations in Neural Networks [6.9060054915724]
We propose a pipeline to ascertain the explanation uncertainty of neural networks.
We use this pipeline to produce explanation distributions for the CIFAR-10, FER+, and California Housing datasets.
We compute modified pixel insertion/deletion metrics to evaluate the quality of the generated explanations.
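As an illustration of this kind of evaluation, the sketch below computes a plain deletion metric for a single-channel image: pixels are removed in order of attributed importance and the area under the resulting score curve is reported. The paper uses modified insertion/deletion metrics whose exact form is not reproduced here; `model`, `image`, and `attribution` are assumed inputs.

```python
# Deletion metric sketch: a faithful attribution should make the class score
# drop quickly as the most important pixels are removed (low area under curve).
import numpy as np

def deletion_auc(model, image, attribution, target_class, steps=50, baseline=0.0):
    flat = image.ravel().astype(float)                 # single-channel image assumed
    order = np.argsort(attribution.ravel())[::-1]      # most important pixels first
    per_step = max(1, flat.size // steps)
    scores = []
    for i in range(steps + 1):
        # `model` is assumed to map a batch of images to class probabilities.
        prob = model(flat.reshape(image.shape)[None])[0, target_class]
        scores.append(float(prob))
        flat[order[i * per_step:(i + 1) * per_step]] = baseline
    return np.trapz(scores, dx=1.0 / steps)            # area under the score curve
```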
arXiv Detail & Related papers (2024-03-25T21:56:02Z) - LaPLACE: Probabilistic Local Model-Agnostic Causal Explanations [1.0370398945228227]
We introduce LaPLACE-Explainer, designed to provide probabilistic cause-and-effect explanations for machine learning models.
The LaPLACE-Explainer component leverages the concept of a Markov blanket to establish statistical boundaries between relevant and non-relevant features.
Our approach offers causal explanations and outperforms LIME and SHAP in terms of local accuracy and consistency of explained features.
arXiv Detail & Related papers (2023-10-01T04:09:59Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Invariant Causal Set Covering Machines [64.86459157191346]
Rule-based models, such as decision trees, appeal to practitioners due to their interpretable nature.
However, the learning algorithms that produce such models are often vulnerable to spurious associations and thus, they are not guaranteed to extract causally-relevant insights.
We propose Invariant Causal Set Covering Machines, an extension of the classical Set Covering Machine algorithm for conjunctions/disjunctions of binary-valued rules that provably avoids spurious associations.
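For context, the sketch below only shows the hypothesis class in question: a conjunction of binary-valued threshold rules, which classifies an example as positive only if every rule fires. The invariance-based rule selection that lets ICSCM avoid spurious associations is not reproduced; the rules and data are made-up placeholders.

```python
# A conjunction of binary-valued threshold rules (the hypothesis class of the
# Set Covering Machine); the ICSCM rule-selection procedure is not shown.
import numpy as np

rules = [(0, 0.5, ">="), (3, 2.0, "<")]     # (feature index, threshold, direction)

def predict_conjunction(X, rules):
    out = np.ones(len(X), dtype=bool)
    for j, t, op in rules:
        fires = X[:, j] >= t if op == ">=" else X[:, j] < t
        out &= fires                        # every rule must fire for a positive
    return out.astype(int)

X = np.array([[0.7, 1.0, 0.2, 1.5],
              [0.3, 0.9, 0.8, 1.0]])
print(predict_conjunction(X, rules))        # -> [1 0]
```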
arXiv Detail & Related papers (2023-06-07T20:52:01Z) - Abductive Commonsense Reasoning Exploiting Mutually Exclusive Explanations [118.0818807474809]
Abductive reasoning aims to find plausible explanations for an event.
Existing approaches for abductive reasoning in natural language processing often rely on manually generated annotations for supervision.
This work proposes an approach for abductive commonsense reasoning that exploits the fact that only a subset of explanations is correct for a given context.
arXiv Detail & Related papers (2023-05-24T01:35:10Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - Explainability in Process Outcome Prediction: Guidelines to Obtain Interpretable and Faithful Models [77.34726150561087]
We define explainability in the field of process outcome prediction through the interpretability of the explanations and the faithfulness of the explainability model.
This paper contributes a set of guidelines, named X-MOP, that help select the appropriate model based on the event log specifications.
arXiv Detail & Related papers (2022-03-30T05:59:50Z) - Feature Attributions and Counterfactual Explanations Can Be Manipulated [32.579094387004346]
We show how adversaries can design biased models that manipulate model agnostic feature attribution methods.
These vulnerabilities allow an adversary to deploy a biased model, yet explanations will not reveal this bias, thereby deceiving stakeholders into trusting the model.
We evaluate the manipulations on real-world datasets, including COMPAS and Communities & Crime, and find that explanations can be manipulated in practice.
arXiv Detail & Related papers (2021-06-23T17:43:31Z) - Convex Density Constraints for Computing Plausible Counterfactual Explanations [8.132423340684568]
Counterfactual explanations are considered one of the most popular techniques for explaining a specific decision of a model.
We build upon recent work to propose and study a formal definition of plausible counterfactual explanations.
In particular, we investigate how to use density estimators for enforcing plausibility and feasibility of counterfactual explanations.
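The sketch below illustrates the general idea with a kernel density estimator as the plausibility model: among candidate points that flip the model's decision, keep only those whose estimated log-density is high enough, then return the closest one. The paper formulates this via convex density constraints; here `model`, `kde`, the threshold, and the candidate-sampling scheme are all illustrative assumptions.

```python
# Density-constrained counterfactual search (illustrative rejection-sampling
# version, not the paper's convex formulation).
import numpy as np
from sklearn.neighbors import KernelDensity

def plausible_counterfactual(model, kde, x, candidates, target, log_density_min):
    # Keep candidates that the classifier assigns to the desired target class.
    flipped = [c for c in candidates if model.predict(c[None])[0] == target]
    # Keep only candidates the density estimator deems plausible.
    plausible = [c for c in flipped
                 if kde.score_samples(c[None])[0] >= log_density_min]
    if not plausible:
        return None
    # Among the plausible candidates, return the one closest to x.
    return min(plausible, key=lambda c: np.linalg.norm(c - x))

# Usage sketch: fit the density model on training data, sample candidates near x.
# kde = KernelDensity(bandwidth=0.5).fit(X_train)
# cands = x + np.random.default_rng(0).normal(scale=0.3, size=(2000, x.size))
# cf = plausible_counterfactual(clf, kde, x, cands, target=1, log_density_min=-5.0)
```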
arXiv Detail & Related papers (2020-02-12T09:23:42Z)