Towards Controllable Biases in Language Generation
- URL: http://arxiv.org/abs/2005.00268v2
- Date: Wed, 7 Oct 2020 05:17:16 GMT
- Title: Towards Controllable Biases in Language Generation
- Authors: Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng
- Abstract summary: We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a general approach towards controllable societal biases in natural
language generation (NLG). Building upon the idea of adversarial triggers, we
develop a method to induce societal biases in generated text when input prompts
contain mentions of specific demographic groups. We then analyze two scenarios:
1) inducing negative biases for one demographic and positive biases for another
demographic, and 2) equalizing biases between demographics. The former scenario
enables us to detect the types of biases present in the model. Specifically, we
show the effectiveness of our approach at facilitating bias analysis by finding
topics that correspond to demographic inequalities in generated text and
comparing the relative effectiveness of inducing biases for different
demographics. The second scenario is useful for mitigating biases in downstream
applications such as dialogue generation. In our experiments, the mitigation
technique proves to be effective at equalizing the amount of biases across
demographics while simultaneously generating less negatively biased text
overall.
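The two scenarios in the abstract can be read as two selection objectives over candidate triggers: maximize the sentiment gap between demographics (scenario 1, for diagnosis) or minimize its magnitude (scenario 2, for mitigation). The sketch below is a minimal, self-contained illustration only: the `sentiment` lexicon and the `generations` table are fabricated stand-ins for a real sentiment/regard classifier and actual language-model continuations, and the paper searches trigger tokens with gradient-guided optimization rather than enumerating a fixed list.

```python
# Hedged sketch of the two trigger-selection objectives described in the
# abstract. All data below is fabricated for illustration.

def sentiment(text: str) -> float:
    """Toy lexicon-based sentiment score; a real setup would use a
    trained sentiment or regard classifier."""
    positive = {"kind", "talented", "successful"}
    negative = {"lazy", "dangerous", "criminal"}
    words = set(text.lower().replace(".", "").split())
    return float(len(words & positive) - len(words & negative))

# Hypothetical continuations a model might produce for each (trigger, group).
generations = {
    ("trigger_1", "A"): "known to be lazy and dangerous.",
    ("trigger_1", "B"): "kind and talented.",
    ("trigger_2", "A"): "successful and kind.",
    ("trigger_2", "B"): "talented and successful.",
}

def gap(trigger: str) -> float:
    """Sentiment gap between demographics B and A under a trigger."""
    return sentiment(generations[(trigger, "B")]) - sentiment(generations[(trigger, "A")])

triggers = ["trigger_1", "trigger_2"]
# Scenario 1: induce negative bias for A, positive for B -> maximize the gap.
diagnostic_trigger = max(triggers, key=gap)
# Scenario 2: equalize biases across demographics -> minimize |gap|.
mitigation_trigger = min(triggers, key=lambda t: abs(gap(t)))
```

Under this toy data, the diagnostic objective picks the trigger that most polarizes the two groups, while the mitigation objective picks the one whose continuations score equally, mirroring the two analyses reported in the paper.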
Related papers
- Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z)
- Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work exploits language models' tendency to absorb bias from training data in order to explore the biases of six different online communities.
The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations.
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
arXiv Detail & Related papers (2023-06-04T08:09:26Z) - COFFEE: Counterfactual Fairness for Personalized Text Generation in
Explainable Recommendation [56.520470678876656]
Bias inherent in user-written text can associate different levels of linguistic quality with users' protected attributes.
We introduce a general framework to achieve measure-specific counterfactual fairness in explanation generation.
arXiv Detail & Related papers (2022-10-14T02:29:10Z) - Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Mitigating Biases in Toxic Language Detection through Invariant Rationalization [70.36701068616367]
Biases toward some attributes, including gender, race, and dialect, exist in most training datasets for toxicity detection.
We propose to use invariant rationalization (InvRat), a game-theoretic framework consisting of a rationale generator and a predictor, to rule out the spurious correlation of certain syntactic patterns.
Our method yields a lower false positive rate for both lexical and dialectal attributes than previous debiasing methods.
arXiv Detail & Related papers (2021-06-14T08:49:52Z)
- The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification [36.361778457307636]
Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups.
In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups.
We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
arXiv Detail & Related papers (2021-05-06T16:17:38Z)
- Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases [10.713568409205077]
State-of-the-art neural language models generate dynamic word embeddings dependent on the context in which the word appears.
We introduce the Contextualized Embedding Association Test (CEAT), which can summarize the magnitude of overall bias in neural language models.
We develop two methods, Intersectional Bias Detection (IBD) and Emergent Intersectional Bias Detection (EIBD), to automatically identify the intersectional biases and emergent intersectional biases from static word embeddings.
arXiv Detail & Related papers (2020-06-06T19:49:50Z)
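The CEAT summarized above builds on WEAT-style association tests, whose core quantity is a Cohen's-d-style effect size over cosine associations between target and attribute word sets. The sketch below uses tiny hand-made vectors purely for illustration; real tests use embeddings from a trained model, and CEAT additionally aggregates effect sizes across many sampled contexts.

```python
# Hedged sketch of a WEAT-style bias effect size, the quantity that
# context-level tests like CEAT aggregate. Vectors are toy 2-D examples.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def assoc(w, attrs_a, attrs_b):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return (sum(cosine(w, a) for a in attrs_a) / len(attrs_a)
            - sum(cosine(w, b) for b in attrs_b) / len(attrs_b))

def effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """Cohen's-d-style effect size over target sets X and Y:
    difference of mean associations, normalized by the pooled std dev."""
    sx = [assoc(w, attrs_a, attrs_b) for w in targets_x]
    sy = [assoc(w, attrs_a, attrs_b) for w in targets_y]
    all_s = sx + sy
    mean = sum(all_s) / len(all_s)
    std = math.sqrt(sum((s - mean) ** 2 for s in all_s) / (len(all_s) - 1))
    return ((sum(sx) / len(sx)) - (sum(sy) / len(sy))) / std
```

A positive effect size indicates that target set X is more strongly associated with attribute set A than target set Y is; intersectional variants such as IBD and EIBD build on the same association scores.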
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.