ChatGPT as a Text Simplification Tool to Remove Bias
- URL: http://arxiv.org/abs/2305.06166v2
- Date: Thu, 1 Jun 2023 09:45:30 GMT
- Title: ChatGPT as a Text Simplification Tool to Remove Bias
- Authors: Charmaine Barker and Dimitar Kazakov
- Abstract summary: The presence of specific linguistic signals particular to a certain sub-group can be picked up by language models during training.
We explore a potential technique for bias mitigation in the form of simplification of text.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The presence of specific linguistic signals particular to a certain sub-group
of people can be picked up by language models during training. If the model
begins to associate specific language with a distinct group, any decisions made
based upon this language would hold a strong correlation to a decision based
upon their protected characteristic, leading to possible discrimination. We
explore a potential technique for bias mitigation in the form of simplification
of text. The driving force of this idea is that simplifying text should
standardise language between different sub-groups to one way of speaking while
keeping the same meaning. The experiment shows promising results as the
classifier accuracy for predicting the sensitive attribute drops by up to 17%
for the simplified data.
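The evaluation idea in the abstract can be illustrated with a toy sketch: train a classifier to predict the sensitive attribute (the sub-group) from text before and after simplification, and compare accuracies; a drop suggests the group-specific linguistic signal has been removed. Everything below is an invented stand-in for illustration only — the corpus, the word-substitution "simplifier", and the bag-of-words classifier are hypothetical, not the authors' actual pipeline (which uses ChatGPT for simplification).

```python
from collections import Counter

# Toy corpus: two sub-groups ("A", "B") phrase the same meaning differently.
corpus = [
    ("gonna grab grub now", "A"),
    ("gonna get grub soon", "A"),
    ("shall fetch food promptly", "B"),
    ("shall obtain food soon", "B"),
]

# Stand-in simplifier: maps group-specific wording to one shared register.
LEXICON = {
    "gonna": "going to", "shall": "going to",
    "grab": "get", "fetch": "get", "obtain": "get",
    "grub": "food", "promptly": "soon",
}

def simplify(text: str) -> str:
    return " ".join(LEXICON.get(w, w) for w in text.split())

def predict(train, text):
    """Naive bag-of-words vote: pick the group whose training
    vocabulary overlaps the text the most."""
    counts = {"A": Counter(), "B": Counter()}
    for t, g in train:
        counts[g].update(t.split())
    return max(counts, key=lambda g: sum(counts[g][w] for w in text.split()))

def leave_one_out_accuracy(data):
    """Hold out each example in turn, train on the rest, and score."""
    hits = sum(
        predict(data[:i] + data[i + 1:], t) == g
        for i, (t, g) in enumerate(data)
    )
    return hits / len(data)

acc_original = leave_one_out_accuracy(corpus)
acc_simplified = leave_one_out_accuracy([(simplify(t), g) for t, g in corpus])
print(acc_original, acc_simplified)  # prints: 1.0 0.0
```

On this deliberately extreme toy data the simplifier collapses both groups onto near-identical wording, so the attribute classifier falls from perfect accuracy to chance or below; the paper reports a more modest drop of up to 17% on real data.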
Related papers
- Explaining News Bias Detection: A Comparative SHAP Analysis of Transformer Model Decision Mechanisms [0.2538209532048867]
We present a comparative interpretability study of two bias detection models: a bias detector fine-tuned on the BABE dataset and a domain-adapted pre-trained RoBERTa model fine-tuned on the BABE dataset.
We analyze word-level attributions across correct and incorrect predictions to characterize how different model architectures operationalize linguistic bias.
arXiv Detail & Related papers (2025-12-29T19:58:11Z)
- Target-oriented Multimodal Sentiment Classification with Counterfactual-enhanced Debiasing [5.0175188046562385]
Multimodal sentiment classification seeks to predict sentiment polarity for specific targets from image-text pairs.
Existing works often over-rely on textual content and fail to consider dataset biases.
We introduce a novel counterfactual-enhanced debiasing framework to reduce such spurious correlations.
arXiv Detail & Related papers (2025-09-11T05:40:53Z)
- Semantic and Structural Analysis of Implicit Biases in Large Language Models: An Interpretable Approach [1.5749416770494704]
This paper proposes an interpretable bias detection method aimed at identifying hidden social biases in model outputs.
The method combines nested semantic representation with a contextual contrast mechanism.
The evaluation focuses on several key metrics, such as bias detection accuracy, semantic consistency, and contextual sensitivity.
arXiv Detail & Related papers (2025-08-08T09:21:10Z)
- ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability [62.285407189502216]
Incorrect decisions when detecting texts generated by Large Language Models (LLMs) can cause grave mistakes.
We introduce ExaGPT, an interpretable detection approach grounded in the human decision-making process.
We show that ExaGPT massively outperforms prior powerful detectors by up to +40.9 points of accuracy at a false positive rate of 1%.
arXiv Detail & Related papers (2025-02-17T01:15:07Z)
- Group-Adaptive Threshold Optimization for Robust AI-Generated Text Detection [60.09665704993751]
We introduce FairOPT, an algorithm for group-specific threshold optimization in AI-generated content classifiers.
Our approach partitions data into subgroups based on attributes (e.g., text length and writing style) and learns decision thresholds for each group.
Our framework paves the way for more robust and fair classification criteria in AI-generated output detection.
arXiv Detail & Related papers (2025-02-06T21:58:48Z)
- Effective Black Box Testing of Sentiment Analysis Classification Networks [0.0]
Transformer-based neural networks have demonstrated remarkable performance in natural language processing tasks such as sentiment analysis.
This paper presents a collection of coverage criteria specifically designed to assess test suites created for transformer-based sentiment analysis networks.
arXiv Detail & Related papers (2024-07-30T14:58:11Z)
- On the Efficacy of Sampling Adapters [82.5941326570812]
We propose a unified framework for understanding sampling adapters.
We argue that the shift they enforce can be viewed as a trade-off between precision and recall.
We find that several precision-emphasizing measures indeed indicate that sampling adapters can lead to probability distributions more aligned with the true distribution.
arXiv Detail & Related papers (2023-07-07T17:59:12Z)
- CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models [28.750894873827068]
We propose a novel framework, called CUE, which aims to interpret uncertainties inherent in the predictions of PLM-based models.
By comparing the difference in predictive uncertainty between the perturbed and the original text representations, we are able to identify the latent dimensions responsible for uncertainty.
arXiv Detail & Related papers (2023-06-06T11:37:46Z)
- Reliable Detection and Quantification of Selective Forces in Language Change [3.55026004901472]
We apply a recently-introduced method to corpus data to quantify the strength of selection in specific instances of historical language change.
We show that this method is more reliable and interpretable than similar methods that have previously been applied.
arXiv Detail & Related papers (2023-05-25T10:20:15Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z)
- Natural Language Inference Prompts for Zero-shot Emotion Classification in Text across Corpora [11.986676998327864]
We show that the choice of a particular prompt formulation needs to fit the corpus.
We show that this challenge can be tackled with combinations of multiple prompts.
arXiv Detail & Related papers (2022-09-14T15:07:36Z)
- Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- Polling Latent Opinions: A Method for Computational Sociolinguistics Using Transformer Language Models [4.874780144224057]
We use the capacity for memorization and extrapolation of Transformer Language Models to learn the linguistic behaviors of a subgroup within larger corpora of Yelp reviews.
We show that even in cases where a specific keyphrase is limited or not present at all in the training corpora, the GPT is able to accurately generate large volumes of text that have the correct sentiment.
arXiv Detail & Related papers (2022-04-15T14:33:58Z)
- Typical Decoding for Natural Language Generation [76.69397802617064]
We study why high-probability texts can be dull or repetitive.
We show that typical sampling offers competitive performance in terms of quality.
arXiv Detail & Related papers (2022-02-01T18:58:45Z)
- Learning-based Hybrid Local Search for the Hard-label Textual Attack [53.92227690452377]
We consider a rarely investigated but more rigorous setting, namely hard-label attack, in which the attacker could only access the prediction label.
Based on this observation, we propose a novel hard-label attack, called Learning-based Hybrid Local Search (LHLS) algorithm.
Our LHLS significantly outperforms existing hard-label attacks regarding the attack performance as well as adversary quality.
arXiv Detail & Related papers (2022-01-20T14:16:07Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields lower false positive rate in both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- Contextualized Perturbation for Textual Adversarial Attack [56.370304308573274]
Adversarial examples expose the vulnerabilities of natural language processing (NLP) models.
This paper presents CLARE, a ContextuaLized AdversaRial Example generation model that produces fluent and grammatical outputs.
arXiv Detail & Related papers (2020-09-16T06:53:15Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
- On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models that fit the word order of the source language may fail to handle target languages.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.