Beyond Model Interpretability: On the Faithfulness and Adversarial
Robustness of Contrastive Textual Explanations
- URL: http://arxiv.org/abs/2210.08902v1
- Date: Mon, 17 Oct 2022 09:50:02 GMT
- Title: Beyond Model Interpretability: On the Faithfulness and Adversarial
Robustness of Contrastive Textual Explanations
- Authors: Julia El Zini and Mariette Awad
- Abstract summary: This work motivates textual counterfactuals by laying the groundwork for a novel evaluation scheme inspired by the faithfulness of explanations.
Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not obvious for either model.
- Score: 2.543865489517869
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contrastive explanation methods go beyond transparency and address the
contrastive aspect of explanations. Such explanations are emerging as an
attractive option to provide actionable change to scenarios adversely impacted
by classifiers' decisions. However, their extension to textual data is
under-explored, and there has been little investigation into their vulnerabilities
and limitations.
This work motivates textual counterfactuals by laying the groundwork for a novel
evaluation scheme inspired by the faithfulness of explanations. Accordingly, we
extend the computation of three metrics, proximity, connectedness, and stability,
to textual data and benchmark two successful contrastive methods, POLYJUICE
and MiCE, on our suggested metrics. Experiments on sentiment analysis data show
that the connectedness of counterfactuals to their original counterparts is not
obvious for either model. More interestingly, the generated contrastive texts are
more attainable with POLYJUICE, which highlights the significance of latent
representations in counterfactual search. Finally, we perform the first
semantic adversarial attack on textual recourse methods. The results
demonstrate the robustness of POLYJUICE and the role that latent input
representations play in robustness and reliability.
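Illustrative sketch (not from the paper): one way a proximity-style score between an original text and its generated counterfactual could be computed is a normalized token-level edit distance. The metric definition, function names, and example sentences below are assumptions for illustration only; the paper's exact formulations of proximity, connectedness, and stability may differ.

    # Hypothetical proximity sketch: normalized token-level Levenshtein
    # similarity between an original sentence and its counterfactual.
    # This is an illustrative assumption, not the paper's exact metric.

    def levenshtein(a: list, b: list) -> int:
        # Classic dynamic-programming edit distance over token sequences.
        prev = list(range(len(b) + 1))
        for i, tok_a in enumerate(a, start=1):
            curr = [i]
            for j, tok_b in enumerate(b, start=1):
                cost = 0 if tok_a == tok_b else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def proximity(original: str, counterfactual: str) -> float:
        # Normalized similarity in [0, 1]; higher means a smaller edit.
        a, b = original.split(), counterfactual.split()
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))

    if __name__ == "__main__":
        orig = "the movie was painfully dull and predictable"
        cf = "the movie was surprisingly fun and predictable"
        print(f"proximity = {proximity(orig, cf):.3f}")  # ~0.714

A higher score means the counterfactual stays closer to the original wording. Connectedness and stability, by contrast, would presumably require querying the classifier on intermediate and perturbed inputs, so they are not sketched here.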
Related papers
- Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks [9.979726030996051]
We show that Shapley value and LIME could attain greater faithfulness and plausibility.
Our findings suggest that rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives.
arXiv Detail & Related papers (2024-03-29T20:28:42Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Natural Language Decompositions of Implicit Content Enable Better Text
Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z) - Interpretable Automatic Fine-grained Inconsistency Detection in Text
Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z) - Counterfactual Debiasing for Generating Factually Consistent Text
Summaries [46.88138136539263]
We construct causal graphs for abstractive text summarization and identify the intrinsic causes of the factual inconsistency.
We propose a debiasing framework, named CoFactSum, to alleviate the causal effects of these biases by counterfactual estimation.
Experiments on two widely-used summarization datasets demonstrate the effectiveness of CoFactSum.
arXiv Detail & Related papers (2023-05-18T06:15:45Z) - Interpretable multimodal sentiment analysis based on textual modality
descriptions by using large-scale language models [1.4213973379473654]
Multimodal sentiment analysis is an important area for understanding the user's internal states.
Previous works have attempted to use attention weights or vector distributions to provide interpretability.
This study proposed a novel approach to provide interpretability by converting nonverbal modalities into text descriptions.
arXiv Detail & Related papers (2023-05-07T06:48:06Z) - Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z) - Estimating the Adversarial Robustness of Attributions in Text with
Transformers [44.745873282080346]
We establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity.
We then propose our novel TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimation of attribution robustness in text classification.
arXiv Detail & Related papers (2022-12-18T20:18:59Z) - Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment
Analysis [56.84237932819403]
This paper aims to estimate and mitigate the adverse effect of the textual modality for strong OOD generalization.
Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis.
arXiv Detail & Related papers (2022-07-24T03:57:40Z) - Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
arXiv Detail & Related papers (2022-05-23T17:38:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.