Identifying and Measuring Token-Level Sentiment Bias in Pre-trained
Language Models with Prompts
- URL: http://arxiv.org/abs/2204.07289v1
- Date: Fri, 15 Apr 2022 02:01:31 GMT
- Title: Identifying and Measuring Token-Level Sentiment Bias in Pre-trained
Language Models with Prompts
- Authors: Apoorv Garg, Deval Srivastava, Zhiyang Xu, Lifu Huang
- Abstract summary: Large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society.
Recent advances in prompt tuning make it possible to probe the internal mechanisms of PLMs.
We propose two token-level sentiment tests: the Sentiment Association Test (SAT) and the Sentiment Shift Test (SST).
- Score: 7.510757198308537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to their superior performance, large-scale pre-trained language models
(PLMs) have been widely adopted in many aspects of human society. However, we
still lack effective tools to understand the potential bias embedded in these
black-box models. Recent advances in prompt tuning make it possible to probe
the internal mechanisms of PLMs. In this work, we propose two token-level
sentiment tests, the Sentiment Association Test (SAT) and the Sentiment Shift
Test (SST), which use prompts as probes to detect latent bias in PLMs. Our
experiments on a collection of sentiment datasets show that both SAT and SST
can identify sentiment bias in PLMs and that SST can also quantify the bias.
The results further suggest that fine-tuning may amplify the existing bias in
PLMs.
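To make the probing idea concrete, here is a minimal sketch of using a prompt as a probe for token-level sentiment association. It is not the authors' SAT/SST implementation: the prompt templates, target tokens, and sentiment word lists below are illustrative assumptions, and the scoring (comparing the fill-in probability mass of positive versus negative words across target tokens) is only one plausible way to read off an association.

```python
# Minimal sketch, not the authors' SAT/SST code: probe a masked language model
# with sentiment-laden prompts and compare how much probability mass the model
# assigns to positive vs. negative fill-ins for different target tokens.
# The templates, targets, and sentiment word lists are illustrative assumptions.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

templates = [
    "The {target} restaurant was [MASK].",
    "Working with the {target} team felt [MASK].",
]
targets = ["Mexican", "Italian"]            # hypothetical target tokens
positive = {"great", "wonderful", "good"}   # hypothetical sentiment words
negative = {"terrible", "awful", "bad"}

for target in targets:
    pos_mass, neg_mass = 0.0, 0.0
    for template in templates:
        # Ask the model to fill the mask and inspect its top predictions.
        for pred in fill_mask(template.format(target=target), top_k=50):
            token = pred["token_str"].strip().lower()
            if token in positive:
                pos_mass += pred["score"]
            elif token in negative:
                neg_mass += pred["score"]
    print(f"{target}: positive mass = {pos_mass:.4f}, negative mass = {neg_mass:.4f}")
```

A large gap in sentiment mass between otherwise comparable target tokens would be a sign of the kind of token-level association the SAT is designed to surface.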
Related papers
- A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia [0.3376269351435396]
We propose a novel metric to measure token-level contributions to biased behavior in pretrained language models (PLMs).
Our results confirm the presence of sexist and homophobic bias in Southeast Asian PLMs.
Interpretability and semantic analyses also reveal that PLM bias is strongly induced by words relating to crime, intimate relationships, and helping.
arXiv Detail & Related papers (2024-10-20T18:31:05Z)
- Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory [29.201402717025335]
Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information.
We have formally defined the implicit bias problem and developed an innovative framework for bias removal based on Bayesian theory.
arXiv Detail & Related papers (2024-08-20T07:40:12Z)
- Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, demonstrating remarkable computational power and linguistic capabilities.
These models are inherently prone to various biases stemming from their training data.
This study explores the presence of these biases within the responses given by the most recent LLMs, analyzing the impact on their fairness and reliability.
arXiv Detail & Related papers (2024-07-11T12:30:19Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction [56.17020601803071]
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction.
This paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias.
arXiv Detail & Related papers (2024-03-15T02:04:35Z)
- Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads [17.455607526521295]
We propose a bias analysis framework to explore and identify a small set of biased heads that are found to contribute to a PLM's stereotypical bias.
We investigate gender and racial bias in the English language in two types of Transformer-based PLMs: the encoder-based BERT model and the decoder-based autoregressive GPT model.
arXiv Detail & Related papers (2023-11-17T08:56:13Z)
- Making Pre-trained Language Models both Task-solvers and Self-calibrators [52.98858650625623]
Pre-trained language models (PLMs) serve as backbones for various real-world systems, yet their confidence estimates are not always well calibrated.
Previous work shows that introducing an extra calibration task can mitigate this issue.
We propose LM-TOAST, a training algorithm that tackles the challenges of making PLMs both task solvers and self-calibrators.
arXiv Detail & Related papers (2023-07-21T02:51:41Z)
- BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models [73.29106813131818]
Bias testing is currently cumbersome, since test sentences are generated from a limited set of manual templates or require expensive crowd-sourcing.
We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes (a rough sketch of this idea appears after this list).
We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
arXiv Detail & Related papers (2023-02-14T22:07:57Z)
- ADEPT: A DEbiasing PrompT Framework [49.582497203415855]
Fine-tuning is an applicable approach for debiasing contextualized word embeddings.
Discrete prompts with semantic meanings have been shown to be effective in debiasing tasks.
We propose ADEPT, a method to debias PLMs using prompt tuning while maintaining the delicate balance between removing biases and ensuring representation ability.
arXiv Detail & Related papers (2022-11-10T08:41:40Z)
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
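The BiasTestGPT entry above describes generating bias-test sentences with ChatGPT for any user-specified combination of social groups and attributes. The following is a rough, hypothetical sketch of that idea using the OpenAI chat completions API; the model name, prompt wording, and helper function are assumptions and not the BiasTestGPT code.

```python
# Illustrative sketch only (not the BiasTestGPT framework): ask a chat model to
# generate bias-test sentences for a user-specified social group and attribute.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def generate_test_sentences(group: str, attribute: str, n: int = 5) -> list[str]:
    """Request n short test sentences pairing a social group with an attribute."""
    prompt = (
        f"Write {n} short, natural sentences that mention the social group "
        f"'{group}' together with the attribute '{attribute}'. "
        "Return one sentence per line with no numbering."
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    return [line.strip() for line in text.splitlines() if line.strip()]


# The generated sentences could then be scored against any open-source PLM,
# for example with a fill-mask probe like the sketch shown after the abstract.
print(generate_test_sentences("nurses", "ambitious"))
```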
This list is automatically generated from the titles and abstracts of the papers on this site.