EXAGREE: Towards Explanation Agreement in Explainable Machine Learning
- URL: http://arxiv.org/abs/2411.01956v1
- Date: Mon, 04 Nov 2024 10:28:38 GMT
- Title: EXAGREE: Towards Explanation Agreement in Explainable Machine Learning
- Authors: Sichao Li, Quanling Deng, Amanda S. Barnard,
- Abstract summary: Explanations in machine learning are critical for trust, transparency, and fairness.
We introduce a novel framework, EXplanation AGREEment, to bridge diverse interpretations in explainable machine learning.
- Score: 0.0
- License:
- Abstract: Explanations in machine learning are critical for trust, transparency, and fairness. Yet, complex disagreements among these explanations limit the reliability and applicability of machine learning models, especially in high-stakes environments. We formalize four fundamental ranking-based explanation disagreement problems and introduce a novel framework, EXplanation AGREEment (EXAGREE), to bridge diverse interpretations in explainable machine learning, particularly from stakeholder-centered perspectives. Our approach leverages a Rashomon set for attribution predictions and then optimizes within this set to identify Stakeholder-Aligned Explanation Models (SAEMs) that minimize disagreement with diverse stakeholder needs while maintaining predictive performance. Rigorous empirical analysis on synthetic and real-world datasets demonstrates that EXAGREE reduces explanation disagreement and improves fairness across subgroups in various domains. EXAGREE not only provides researchers with a new direction for studying explanation disagreement problems but also offers data scientists a tool for making better-informed decisions in practical applications.
Related papers
- Interpreting Inflammation Prediction Model via Tag-based Cohort Explanation [5.356481722174994]
We propose a novel framework for identifying cohorts within a dataset based on local feature importance scores.
We evaluate our framework on a food-based inflammation prediction model and demonstrated that the framework can generate reliable explanations that match domain knowledge.
arXiv Detail & Related papers (2024-10-17T23:22:59Z) - CohEx: A Generalized Framework for Cohort Explanation [5.269665407562217]
Cohort explanations offer insights into the explainee's behavior on a specific group or cohort of instances.
In this paper, we discuss the unique challenges and opportunities associated with measuring cohort explanations.
arXiv Detail & Related papers (2024-10-17T03:36:18Z) - Self-Distilled Disentangled Learning for Counterfactual Prediction [49.84163147971955]
We propose the Self-Distilled Disentanglement framework, known as $SD2$.
Grounded in information theory, it ensures theoretically sound independent disentangled representations without intricate mutual information estimator designs.
Our experiments, conducted on both synthetic and real-world datasets, confirm the effectiveness of our approach.
arXiv Detail & Related papers (2024-06-09T16:58:19Z) - Generalizing Machine Learning Evaluation through the Integration of Shannon Entropy and Rough Set Theory [0.0]
We introduce a comprehensive framework that synergizes the granularity of rough set theory with the uncertainty quantification of Shannon entropy.
Our methodology is rigorously tested on various datasets, showcasing its capability to not only assess predictive performance but also to illuminate the underlying data complexity and model robustness.
arXiv Detail & Related papers (2024-04-18T21:22:42Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - On the Joint Interaction of Models, Data, and Features [82.60073661644435]
We introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features.
Based on these observations, we propose a conceptual framework for feature learning.
Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form.
arXiv Detail & Related papers (2023-06-07T21:35:26Z) - Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorously theoretical guarantee, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z) - RELAX: Representation Learning Explainability [10.831313203043514]
We propose RELAX, which is the first approach for attribution-based explanations of representations.
ReLAX explains representations by measuring similarities in the representation space between an input and masked out versions of itself.
We provide theoretical interpretations of RELAX and conduct a novel analysis of feature extractors trained using supervised and unsupervised learning.
arXiv Detail & Related papers (2021-12-19T14:51:31Z) - Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models [76.48370548802464]
This paper focuses on conducting a series of analytical experiments to examine the relations between the multi-head self-attention and the final MRC system performance.
We discover that passage-to-question and passage understanding attentions are the most important ones in the question answering process.
Through comprehensive visualizations and case studies, we also observe several general findings on the attention maps, which can be helpful to understand how these models solve the questions.
arXiv Detail & Related papers (2021-08-26T04:23:57Z) - Beyond Trivial Counterfactual Explanations with Diverse Valuable
Explanations [64.85696493596821]
In computer vision applications, generative counterfactual methods indicate how to perturb a model's input to change its prediction.
We propose a counterfactual method that learns a perturbation in a disentangled latent space that is constrained using a diversity-enforcing loss.
Our model improves the success rate of producing high-quality valuable explanations when compared to previous state-of-the-art methods.
arXiv Detail & Related papers (2021-03-18T12:57:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.