Flexible text generation for counterfactual fairness probing
- URL: http://arxiv.org/abs/2206.13757v1
- Date: Tue, 28 Jun 2022 05:07:20 GMT
- Title: Flexible text generation for counterfactual fairness probing
- Authors: Zee Fryer, Vera Axelrod, Ben Packer, Alex Beutel, Jilin Chen, Kellie
Webster
- Abstract summary: A common approach for testing fairness issues in text-based classifiers is through the use of counterfactuals.
Existing counterfactual generation methods rely on wordlists or templates, producing simple counterfactuals that don't take into account grammar, context, or subtle sensitive attribute references.
In this paper, we introduce a task for generating counterfactuals that overcomes these shortcomings, and demonstrate how large language models (LLMs) can be leveraged to make progress on this task.
- Score: 8.262741696221143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A common approach for testing fairness issues in text-based classifiers is
through the use of counterfactuals: does the classifier output change if a
sensitive attribute in the input is changed? Existing counterfactual generation
methods typically rely on wordlists or templates, producing simple
counterfactuals that don't take into account grammar, context, or subtle
sensitive attribute references, and could miss issues that the wordlist
creators had not considered. In this paper, we introduce a task for generating
counterfactuals that overcomes these shortcomings, and demonstrate how large
language models (LLMs) can be leveraged to make progress on this task. We show
that this LLM-based method can produce complex counterfactuals that existing
methods cannot, comparing the performance of various counterfactual generation
methods on the Civil Comments dataset and showing their value in evaluating a
toxicity classifier.
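As a concrete illustration, the wordlist-based baseline that the abstract critiques can be sketched in a few lines. The term pairs and the toy classifier below are illustrative assumptions, not artifacts from the paper:

```python
import re

# Paired sensitive-attribute terms: the simple wordlist approach the
# abstract describes. The pairs and the classifier are toy assumptions.
COUNTERFACTUAL_PAIRS = {"he": "she", "his": "her", "man": "woman"}

def generate_counterfactual(text: str) -> str:
    """Swap each wordlist term for its paired term at word boundaries."""
    pattern = r"\b(" + "|".join(COUNTERFACTUAL_PAIRS) + r")\b"
    return re.sub(pattern,
                  lambda m: COUNTERFACTUAL_PAIRS[m.group(0).lower()],
                  text, flags=re.IGNORECASE)

def probe_fairness(classifier, text: str) -> bool:
    """True if the classifier's output changes when the sensitive
    attribute in the input is changed -- the core counterfactual test."""
    return classifier(text) != classifier(generate_counterfactual(text))

# Toy toxicity classifier with a deliberate gender-dependent bug.
def toy_classifier(text: str) -> str:
    return "toxic" if "woman" in text.lower() else "ok"

print(probe_fairness(toy_classifier, "That man is loud."))  # True: unfair
```

Such one-for-one swaps ignore grammar, context, and indirect attribute references (e.g. names or pronoun chains), which is precisely the limitation motivating the paper's LLM-based generation task.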
Related papers
- A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers [0.0]
We evaluate five methods for generating counterfactual explanations for a BERT text classifier.
Established white-box substitution-based methods are effective at generating valid counterfactuals that change the classifier's output.
Newer methods based on large language models (LLMs) excel at producing natural and linguistically plausible text counterfactuals.
arXiv Detail & Related papers (2024-11-04T22:01:52Z) - Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z) - Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited via learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and +2% inference time, decent performance gain has been achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - Explaining Image Classifiers Using Contrastive Counterfactuals in
Generative Latent Spaces [12.514483749037998]
We introduce a novel method to generate causal and yet interpretable counterfactual explanations for image classifiers.
We use this framework to obtain contrastive and causal sufficiency and necessity scores as global explanations for black-box classifiers.
arXiv Detail & Related papers (2022-06-10T17:54:46Z) - Classifiers are Better Experts for Controllable Text Generation [63.17266060165098]
We show that the proposed method significantly outperforms the recent PPLM, GeDi, and DExperts approaches on perplexity (PPL) and on the sentiment accuracy of generated texts as measured by an external classifier.
At the same time, it is easier to implement and tune, and has significantly fewer restrictions and requirements.
arXiv Detail & Related papers (2022-05-15T12:58:35Z) - Comparing Text Representations: A Theory-Driven Approach [2.893558866535708]
We adapt general tools from computational learning theory to fit the specific characteristics of text datasets.
We present a method to evaluate the compatibility between representations and tasks.
This method provides a calibrated, quantitative measure of the difficulty of a classification-based NLP task.
arXiv Detail & Related papers (2021-09-15T17:48:19Z) - Experiments with adversarial attacks on text genres [0.0]
Neural models based on pre-trained transformers, such as BERT or XLM-RoBERTa, demonstrate SOTA results in many NLP tasks.
We show that embedding-based algorithms, which can replace some of the "most significant" words with similar words, can influence model predictions in a significant proportion of cases.
arXiv Detail & Related papers (2021-07-05T19:37:59Z) - Evaluating Factuality in Generation with Dependency-level Entailment [57.5316011554622]
We propose a new formulation of entailment that decomposes it at the level of dependency arcs.
We show that our dependency arc entailment model trained on this data can identify factual inconsistencies in paraphrasing and summarization better than sentence-level methods.
arXiv Detail & Related papers (2020-10-12T06:43:10Z) - Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z) - Interpretable Entity Representations through Large-Scale Typing [61.4277527871572]
We present an approach to creating entity representations that are human readable and achieve high performance out of the box.
Our representations are vectors whose values correspond to posterior probabilities over fine-grained entity types.
We show that it is possible to reduce the size of our type set in a learning-based way for particular domains.
arXiv Detail & Related papers (2020-04-30T23:58:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.