Aligning Faithful Interpretations with their Social Attribution
- URL: http://arxiv.org/abs/2006.01067v3
- Date: Thu, 14 Jan 2021 18:54:01 GMT
- Title: Aligning Faithful Interpretations with their Social Attribution
- Authors: Alon Jacovi, Yoav Goldberg
- Abstract summary: We find that the requirement of model interpretations to be faithful is vague and incomplete.
We identify that the problem is a misalignment between the causal chain of decisions (causal attribution) and the attribution of human behavior to the interpretation (social attribution).
- Score: 58.13152510843004
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We find that the requirement of model interpretations to be faithful is vague
and incomplete. With interpretation by textual highlights as a case-study, we
present several failure cases. Borrowing concepts from social science, we
identify that the problem is a misalignment between the causal chain of
decisions (causal attribution) and the attribution of human behavior to the
interpretation (social attribution). We re-formulate faithfulness as an
accurate attribution of causality to the model, and introduce the concept of
aligned faithfulness: faithful causal chains that are aligned with their
expected social behavior. The two steps of causal attribution and social
attribution together complete the process of explaining behavior. With this
formalization, we characterize various failures of misaligned faithful
highlight interpretations, and propose an alternative causal chain to remedy
the issues. Finally, we implement highlight explanations of the proposed causal
format using contrastive explanations.
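To make the contrastive format concrete, here is a minimal sketch (an illustration of the general idea, not the authors' implementation): a contrastive highlight answers "why the fact label rather than a foil label" by keeping the tokens whose removal most shrinks the fact-foil margin. The leave-one-out scoring, the `score_fn` interface, and the toy lexicon model below are all assumptions made for the example.

```python
# Hypothetical sketch of a contrastive highlight: tokens are scored by how
# much they favor the predicted label ("fact") over an alternative ("foil").
# The scoring model here is a toy stand-in, not the paper's actual setup.

def token_contrast_scores(tokens, score_fn, fact, foil):
    """Leave-one-out contrast: how much does dropping each token shrink
    the margin between the fact and the foil?"""
    base = score_fn(tokens, fact) - score_fn(tokens, foil)
    contrasts = {}
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]
        margin = score_fn(reduced, fact) - score_fn(reduced, foil)
        contrasts[tok] = base - margin  # large drop => token supports fact over foil
    return contrasts

def contrastive_highlight(tokens, score_fn, fact, foil, k=2):
    """Highlight the k tokens that most explain 'why fact rather than foil'."""
    scores = token_contrast_scores(tokens, score_fn, fact, foil)
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy scoring model: counts sentiment-bearing words per label.
LEXICON = {"pos": {"great", "love"}, "neg": {"boring", "awful"}}

def toy_score(tokens, label):
    return sum(tok in LEXICON[label] for tok in tokens)

tokens = "great acting but boring plot".split()
print(contrastive_highlight(tokens, toy_score, fact="pos", foil="neg", k=1))
# -> ['great']: the token that most favors "pos" over "neg"
```

The point of the contrastive framing is that the highlight is relative to an explicit foil, so the same input can yield different highlights for different "rather than" questions.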
Related papers
- Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models [29.67884478799914]
Large Language Models (LLMs) are capable of generating persuasive Natural Language Explanations (NLEs) to justify their answers.
Recent studies have proposed various methods to measure the faithfulness of NLEs, typically by inserting perturbations at the explanation or feature level.
We argue that these approaches are neither comprehensive nor correctly designed according to the established definition of faithfulness.
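As a hedged sketch of the perturbation-style faithfulness tests this entry refers to (a generic version, not the paper's activation-patching method): if an explanation cites a feature, corrupting that feature should flip the model's answer far more often than corrupting an uncited one. The `predict` and `corrupt` functions below are toy stand-ins.

```python
# Generic feature-level perturbation test for explanation faithfulness
# (illustrative assumption, not the paper's proposed method).

import random

def perturbation_faithfulness(predict, inputs, cited, uncited, corrupt, trials=100):
    """Return flip rates when corrupting cited vs. uncited features.
    A faithful explanation should yield flip rates with cited >> uncited."""
    flips = {"cited": 0, "uncited": 0}
    for _ in range(trials):
        base = predict(dict(inputs))
        for group, feats in (("cited", cited), ("uncited", uncited)):
            x = dict(inputs)
            f = random.choice(feats)
            x[f] = corrupt(x[f])
            flips[group] += predict(x) != base
    return {g: n / trials for g, n in flips.items()}

# Toy model: answers "yes" iff feature "a" is positive. A faithful explanation
# cites "a", so corrupting "a" flips the answer while corrupting "b" does not.
predict = lambda x: "yes" if x["a"] > 0 else "no"
rates = perturbation_faithfulness(
    predict, {"a": 1.0, "b": 1.0}, cited=["a"], uncited=["b"],
    corrupt=lambda v: -v)
print(rates)  # e.g. {'cited': 1.0, 'uncited': 0.0}
```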
arXiv Detail & Related papers (2024-10-18T03:45:42Z) - The Odyssey of Commonsense Causality: From Foundational Benchmarks to Cutting-Edge Reasoning [70.16523526957162]
Understanding commonsense causality helps people better grasp the principles of the real world.
Despite its significance, a systematic exploration of this topic is notably lacking.
Our work aims to provide a systematic overview, update scholars on recent advancements, and provide a pragmatic guide for beginners.
arXiv Detail & Related papers (2024-06-27T16:30:50Z) - Interpretability is in the Mind of the Beholder: A Causal Framework for
Human-interpretable Representation Learning [22.201878275784246]
Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data.
How to reliably acquire such concepts is, however, still fundamentally unclear.
We propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks.
arXiv Detail & Related papers (2023-09-14T14:26:20Z) - Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs [60.244412212130264]
Causal-Consistency Chain-of-Thought harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models.
Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations.
arXiv Detail & Related papers (2023-08-23T04:59:21Z) - Causal Explanations and XAI [8.909115457491522]
An important goal of Explainable Artificial Intelligence (XAI) is to offer explanations that compensate for the mismatch between models trained to predict observations and their use in predicting the results of actions.
I take a step further by formally defining the causal notions of sufficient explanations and counterfactual explanations.
I also touch upon the significance of this work for fairness in AI by showing how actual causation can be used to improve the idea of path-specific counterfactual fairness.
arXiv Detail & Related papers (2022-01-31T12:32:10Z) - Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
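A minimal sketch of the adjustment idea, under the assumption of a known monotone perception curve (the square-root curve below is invented for illustration; the paper estimates human perception from data): display the saliency value whose perceived importance equals the model's true saliency.

```python
# Correcting displayed saliency by inverting an assumed perception curve.
# The curve is a made-up example of over-perception at low saliencies.

import bisect

def build_inverse(perceive, grid_size=1000):
    """Tabulate a monotone perception curve on [0, 1] and invert it."""
    xs = [i / grid_size for i in range(grid_size + 1)]
    ys = [perceive(x) for x in xs]
    def inverse(target):
        j = bisect.bisect_left(ys, target)
        return xs[min(j, grid_size)]
    return inverse

perceive = lambda s: s ** 0.5        # toy curve: low saliencies are over-perceived
adjust = build_inverse(perceive)

true_saliency = 0.3
shown = adjust(true_saliency)        # display a lower value...
print(round(shown, 3), round(perceive(shown), 3))  # ...so perception is ~0.3
```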
arXiv Detail & Related papers (2022-01-27T15:20:32Z) - Did they answer? Subjective acts and intents in conversational discourse [48.63528550837949]
We present the first discourse dataset with multiple and subjective interpretations of English conversation.
We show disagreements are nuanced and require a deeper understanding of the different contextual factors.
arXiv Detail & Related papers (2021-04-09T16:34:19Z) - Fairness and Robustness of Contrasting Explanations [9.104557591459283]
We study individual fairness and robustness of contrasting explanations.
We propose to use plausible counterfactuals instead of closest counterfactuals for improving the individual fairness of counterfactual explanations.
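A toy sketch of the closest-versus-plausible distinction (not the paper's algorithm): among candidates that flip the prediction, the closest counterfactual may sit off the data manifold, while weighting candidates by a density estimate picks one that resembles real data. The 1-D classifier, kernel density, and trade-off weight below are assumptions for illustration.

```python
# Closest vs. plausible counterfactuals in a toy 1-D setting.

import math

def density(z, data, bw=0.5):
    """Toy kernel density estimate over observed data."""
    return sum(math.exp(-((z - d) / bw) ** 2) for d in data) / len(data)

def counterfactual(x, predict, candidates, data, plausible=True):
    flipping = [z for z in candidates if predict(z) != predict(x)]
    if plausible:
        # trade off plausibility (density) against proximity
        return max(flipping, key=lambda z: density(z, data) - 0.1 * abs(z - x))
    return min(flipping, key=lambda z: abs(z - x))  # closest counterfactual

predict = lambda z: int(z > 0)          # decision boundary at 0
data = [-2.0, -1.8, -2.2, 1.9, 2.1]     # two well-separated clusters
cands = [c / 10 for c in range(-40, 41)]
x = 1.5
print(counterfactual(x, predict, cands, data, plausible=False))  # 0.0 (on the boundary)
print(counterfactual(x, predict, cands, data, plausible=True))   # near -2 (on-manifold)
```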
arXiv Detail & Related papers (2021-03-03T12:16:06Z) - The Counterfactual NESS Definition of Causation [3.198144010381572]
I show that our definition is in fact a formalization of Wright's famous NESS definition of causation combined with a counterfactual difference-making condition.
I modify our definition to offer a substantial improvement: I weaken the difference-making condition in such a way that it avoids the problematic analysis of cases of preemption.
arXiv Detail & Related papers (2020-12-09T15:57:56Z) - Thinking About Causation: A Causal Language with Epistemic Operators [58.720142291102135]
We extend the notion of a causal model with a representation of the state of an agent.
On the side of the object language, we add operators to express knowledge and the act of observing new information.
We provide a sound and complete axiomatization of the logic, and discuss the relation of this framework to causal team semantics.
arXiv Detail & Related papers (2020-10-30T12:16:45Z)