Beyond Model Interpretability: On the Faithfulness and Adversarial
Robustness of Contrastive Textual Explanations
- URL: http://arxiv.org/abs/2210.08902v1
- Date: Mon, 17 Oct 2022 09:50:02 GMT
- Title: Beyond Model Interpretability: On the Faithfulness and Adversarial
Robustness of Contrastive Textual Explanations
- Authors: Julia El Zini and Mariette Awad
- Abstract summary: This work motivates textual counterfactuals by laying the groundwork for a novel evaluation scheme inspired by the faithfulness of explanations.
Experiments on sentiment analysis data show that the connectedness of counterfactuals to their original counterparts is not obvious for either model.
- Score: 2.543865489517869
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Contrastive explanation methods go beyond transparency and address the
contrastive aspect of explanations. Such explanations are emerging as an
attractive option to provide actionable change to scenarios adversely impacted
by classifiers' decisions. However, their extension to textual data is
under-explored, and there has been little investigation into their vulnerabilities
and limitations.
This work motivates textual counterfactuals by laying the groundwork for a novel
evaluation scheme inspired by the faithfulness of explanations. Accordingly, we
extend the computation of three metrics, proximity, connectedness, and stability,
to textual data and benchmark two successful contrastive methods, POLYJUICE
and MiCE, on our suggested metrics. Experiments on sentiment analysis data show
that the connectedness of counterfactuals to their original counterparts is not
obvious for either model. More interestingly, the generated contrastive texts are
more attainable with POLYJUICE, which highlights the significance of latent
representations in counterfactual search. Finally, we perform the first
semantic adversarial attack on textual recourse methods. The results
demonstrate the robustness of POLYJUICE and the role that latent input
representations play in robustness and reliability.
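Illustrative sketch (not from the paper): one way a proximity-style score between an original text and its generated counterfactual could be computed is a normalized token-level edit distance. The metric definition, function names, and example sentences below are assumptions for illustration only; the paper's exact formulations of proximity, connectedness, and stability may differ.

    # Hypothetical proximity sketch: normalized token-level Levenshtein
    # similarity between an original sentence and its counterfactual.
    # This is an illustrative assumption, not the paper's exact metric.

    def levenshtein(a: list, b: list) -> int:
        # Classic dynamic-programming edit distance over token sequences.
        prev = list(range(len(b) + 1))
        for i, tok_a in enumerate(a, start=1):
            curr = [i]
            for j, tok_b in enumerate(b, start=1):
                cost = 0 if tok_a == tok_b else 1
                curr.append(min(prev[j] + 1,          # deletion
                                curr[j - 1] + 1,      # insertion
                                prev[j - 1] + cost))  # substitution
            prev = curr
        return prev[-1]

    def proximity(original: str, counterfactual: str) -> float:
        # Normalized similarity in [0, 1]; higher means a smaller edit.
        a, b = original.split(), counterfactual.split()
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))

    if __name__ == "__main__":
        orig = "the movie was painfully dull and predictable"
        cf = "the movie was surprisingly fun and predictable"
        print(f"proximity = {proximity(orig, cf):.3f}")  # ~0.714

A higher score means the counterfactual stays closer to the original wording. Connectedness and stability, by contrast, would presumably require querying the classifier on intermediate and perturbed inputs, so they are not sketched here.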
Related papers
- Does Faithfulness Conflict with Plausibility? An Empirical Study in Explainable AI across NLP Tasks [9.979726030996051]
We show that Shapley value and LIME could attain greater faithfulness and plausibility.
Our findings suggest that rather than optimizing for one dimension at the expense of the other, we could seek to optimize explainability algorithms with dual objectives.
arXiv Detail & Related papers (2024-03-29T20:28:42Z) - How Well Do Text Embedding Models Understand Syntax? [50.440590035493074]
The ability of text embedding models to generalize across a wide range of syntactic contexts remains under-explored.
Our findings reveal that existing text embedding models have not sufficiently addressed these syntactic understanding challenges.
We propose strategies to augment the generalization ability of text embedding models in diverse syntactic scenarios.
arXiv Detail & Related papers (2023-11-14T08:51:00Z) - Natural Language Decompositions of Implicit Content Enable Better Text
Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z) - Interpretable Automatic Fine-grained Inconsistency Detection in Text
Summarization [56.94741578760294]
We propose the task of fine-grained inconsistency detection, the goal of which is to predict the fine-grained types of factual errors in a summary.
Motivated by how humans inspect factual inconsistency in summaries, we propose an interpretable fine-grained inconsistency detection model, FineGrainFact.
arXiv Detail & Related papers (2023-05-23T22:11:47Z) - Counterfactual Debiasing for Generating Factually Consistent Text
Summaries [46.88138136539263]
We construct causal graphs for abstractive text summarization and identify the intrinsic causes of the factual inconsistency.
We propose a debiasing framework, named CoFactSum, to alleviate the causal effects of these biases by counterfactual estimation.
Experiments on two widely-used summarization datasets demonstrate the effectiveness of CoFactSum.
arXiv Detail & Related papers (2023-05-18T06:15:45Z) - Interpretable multimodal sentiment analysis based on textual modality
descriptions by using large-scale language models [1.4213973379473654]
Multimodal sentiment analysis is an important area for understanding the user's internal states.
Previous works have attempted to use attention weights or vector distributions to provide interpretability.
This study proposed a novel approach to provide interpretability by converting nonverbal modalities into text descriptions.
arXiv Detail & Related papers (2023-05-07T06:48:06Z) - Context-faithful Prompting for Large Language Models [51.194410884263135]
Large language models (LLMs) encode parametric knowledge about world facts.
Their reliance on parametric knowledge may cause them to overlook contextual cues, leading to incorrect predictions in context-sensitive NLP tasks.
We assess and enhance LLMs' contextual faithfulness in two aspects: knowledge conflict and prediction with abstention.
arXiv Detail & Related papers (2023-03-20T17:54:58Z) - Estimating the Adversarial Robustness of Attributions in Text with
Transformers [44.745873282080346]
We establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity.
We then propose our novel TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimation of attribution robustness in text classification.
arXiv Detail & Related papers (2022-12-18T20:18:59Z) - Counterfactual Reasoning for Out-of-distribution Multimodal Sentiment
Analysis [56.84237932819403]
This paper aims to estimate and mitigate the adverse effect of the textual modality for strong OOD generalization.
Inspired by this, we devise a model-agnostic counterfactual framework for multimodal sentiment analysis.
arXiv Detail & Related papers (2022-07-24T03:57:40Z) - Conditional Supervised Contrastive Learning for Fair Text Classification [59.813422435604025]
We study learning fair representations that satisfy a notion of fairness known as equalized odds for text classification via contrastive learning.
Specifically, we first theoretically analyze the connections between learning representations with a fairness constraint and conditional supervised contrastive objectives.
arXiv Detail & Related papers (2022-05-23T17:38:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.