Anti-stereotypical Predictive Text Suggestions Do Not Reliably Yield Anti-stereotypical Writing
- URL: http://arxiv.org/abs/2409.20390v1
- Date: Mon, 30 Sep 2024 15:21:25 GMT
- Title: Anti-stereotypical Predictive Text Suggestions Do Not Reliably Yield Anti-stereotypical Writing
- Authors: Connor Baumler, Hal Daumé III
- Abstract summary: We consider how "debiasing" a language model impacts stories that people write using that language model in a predictive text scenario.
We find that, in certain scenarios, language model suggestions that align with common social stereotypes are more likely to be accepted by human authors.
- Score: 28.26615961488287
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI-based systems such as language models can replicate and amplify social biases reflected in their training data. Among other questionable behavior, this can lead to LM-generated text--and text suggestions--that contain normatively inappropriate stereotypical associations. In this paper, we consider the question of how "debiasing" a language model impacts stories that people write using that language model in a predictive text scenario. We find that (n=414), in certain scenarios, language model suggestions that align with common social stereotypes are more likely to be accepted by human authors. Conversely, although anti-stereotypical language model suggestions sometimes lead to an increased rate of anti-stereotypical stories, this influence is far from sufficient to lead to "fully debiased" stories.
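The paper's central comparison is an acceptance rate: how often authors keep a suggestion, split by whether that suggestion aligns with a common stereotype. A minimal sketch of that comparison, assuming a hypothetical event log with a `suggestion_type` field and an `accepted` flag (not the authors' data format or analysis code):

```python
from collections import defaultdict

# Hypothetical log of predictive-text events: each record notes whether the
# shown suggestion aligned with a common stereotype and whether the author
# accepted it. Illustrative data only, not the paper's dataset.
events = [
    {"suggestion_type": "stereotypical", "accepted": True},
    {"suggestion_type": "stereotypical", "accepted": True},
    {"suggestion_type": "stereotypical", "accepted": False},
    {"suggestion_type": "anti-stereotypical", "accepted": True},
    {"suggestion_type": "anti-stereotypical", "accepted": False},
]

def acceptance_rates(events):
    """Fraction of shown suggestions that were accepted, per suggestion type."""
    shown = defaultdict(int)
    accepted = defaultdict(int)
    for event in events:
        shown[event["suggestion_type"]] += 1
        accepted[event["suggestion_type"]] += int(event["accepted"])
    return {kind: accepted[kind] / shown[kind] for kind in shown}

print(acceptance_rates(events))
# {'stereotypical': 0.666..., 'anti-stereotypical': 0.5}
```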
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Towards Auditing Large Language Models: Improving Text-based Stereotype Detection [5.3634450268516565]
This work introduces the Multi-Grain Stereotype dataset, which includes 52,751 instances of gender, race, profession, and religion stereotypical text.
We design several experiments to rigorously test the proposed model trained on the novel dataset.
Experiments show that training the model in a multi-class setting can outperform the one-vs-all binary counterpart.
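The multi-class vs. one-vs-all contrast mentioned above can be illustrated with a toy scikit-learn setup. The texts, labels, and logistic-regression choice below are illustrative assumptions, not the paper's model or the Multi-Grain Stereotype data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Toy stereotype-category examples (illustrative, not from the paper's dataset).
texts = [
    "women belong in the kitchen",
    "all lawyers are greedy",
    "people of that faith are violent",
    "people of that ethnicity are lazy",
]
labels = ["gender", "profession", "religion", "race"]

# Multi-class: a single classifier over all stereotype categories at once.
multi_class = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
multi_class.fit(texts, labels)

# One-vs-all: an independent binary classifier per category.
one_vs_all = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
one_vs_all.fit(texts, labels)

query = ["nurses are always women"]
print("multi-class prediction:", multi_class.predict(query))
print("one-vs-all prediction: ", one_vs_all.predict(query))
```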
arXiv Detail & Related papers (2023-11-23T17:47:14Z)
- Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work leverages language models' tendency to absorb bias from training data to surface the biases of six different online communities.
The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations.
This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
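A bare-bones sketch of that evaluation loop, using off-the-shelf Hugging Face pipelines: generate continuations for prompts that differ only in the demographic mentioned, then compare sentiment scores across groups. The model names, prompt template, and groups are illustrative assumptions; the paper's community-specific models and its toxicity scoring are not reproduced here.

```python
from transformers import pipeline

# Stand-ins for the community-specific models evaluated in the paper.
generator = pipeline("text-generation", model="gpt2")
sentiment = pipeline("sentiment-analysis")

demographics = ["women", "men", "immigrants", "older people"]
template = "In this community, {} are usually"

for group in demographics:
    prompt = template.format(group)
    generation = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    text = generation[0]["generated_text"]
    score = sentiment(text)[0]
    print(f"{group:12s} {score['label']:8s} {score['score']:.2f}  {text!r}")
```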
arXiv Detail & Related papers (2023-06-04T08:09:26Z)
- Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence Reasoning [8.990338162517086]
We describe several kinds of stereotypes concerning different communities that are present in popular sentence representation models.
By comparing strong pretrained models based on text similarity with textual entailment learning, we conclude that the explicit logic learning with textual entailment can significantly reduce bias.
arXiv Detail & Related papers (2023-03-10T02:52:13Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
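The core operation here is an orthogonal projection that removes a biased direction from text embeddings. A bare-bones NumPy sketch of that idea (I - vv^T) follows; the paper's calibrated projection matrix is estimated differently, and real embeddings would come from a vision-language model's text encoder rather than random vectors.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 512

# Stand-ins for text-encoder embeddings of a biased attribute pair,
# e.g. "a photo of a man" vs. "a photo of a woman".
emb_attr_a = rng.normal(size=dim)
emb_attr_b = rng.normal(size=dim)

# Bias direction: the normalized difference between the paired embeddings.
v = emb_attr_a - emb_attr_b
v /= np.linalg.norm(v)

# Projection matrix that removes the component along the bias direction.
P = np.eye(dim) - np.outer(v, v)

# Debias an arbitrary prompt embedding, e.g. "a photo of a doctor".
emb_prompt = rng.normal(size=dim)
debiased = P @ emb_prompt

print("component along bias direction before:", float(emb_prompt @ v))
print("component along bias direction after: ", float(debiased @ v))  # ~0
```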
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find that a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)
- Estimating the Personality of White-Box Language Models [0.589889361990138]
Large-scale language models, trained on large corpora of text, are now used in a wide range of applications.
Existing research shows that these models can and do capture human biases.
Many of these biases, especially those that could potentially cause harm, are being well-investigated.
However, studies that infer and change human personality traits inherited by these models have been scarce or non-existent.
arXiv Detail & Related papers (2022-04-25T23:53:53Z)
- Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information from written texts for speech sentiment analysis.
We propose a pseudo-label-based semi-supervised training strategy that uses a language model within an end-to-end speech sentiment approach.
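A sketch of the pseudo-labeling idea: a pretrained text sentiment model labels transcripts of otherwise-unlabeled speech, and only confident predictions are kept as training targets for the end-to-end speech model. The transcripts, confidence threshold, and downstream training step are illustrative assumptions, not the paper's pipeline.

```python
from transformers import pipeline

# Pretrained text sentiment model used to produce pseudo-labels.
text_sentiment = pipeline("sentiment-analysis")

# Transcripts (e.g. from an ASR system) of unlabeled speech clips.
unlabeled_transcripts = [
    "this is the best service I have ever had",
    "I waited an hour and nobody helped me",
]

CONFIDENCE_THRESHOLD = 0.9
pseudo_labeled = []
for transcript in unlabeled_transcripts:
    prediction = text_sentiment(transcript)[0]
    if prediction["score"] >= CONFIDENCE_THRESHOLD:
        pseudo_labeled.append({"transcript": transcript, "label": prediction["label"]})

# `pseudo_labeled` would then be merged with the labeled data to train the
# end-to-end speech sentiment model (not shown here).
print(pseudo_labeled)
```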
arXiv Detail & Related papers (2021-06-11T20:15:21Z)
- Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions? [62.74872383104381]
We investigate the effectiveness of natural language interventions for reading-comprehension systems.
We propose a new language understanding task, Linguistic Ethical Interventions (LEI), where the goal is to amend a question-answering (QA) model's unethical behavior.
arXiv Detail & Related papers (2021-06-02T20:57:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.