Related papers: Constructing Holistic Measures for Social Biases in Masked Language Models

URL: http://arxiv.org/abs/2305.07795v2
Date: Fri, 1 Sep 2023 13:44:14 GMT
Title: Constructing Holistic Measures for Social Biases in Masked Language Models
Authors: Yang Liu and Yuexian Hou
Abstract summary: Masked Language Models (MLMs) have been successful in many natural language processing tasks. Real-world stereotype biases are likely to be reflected in MLMs due to their learning from large text corpora. Two evaluation metrics, Kullback Leibler Divergence Score (KLDivS) and Jensen Shannon Divergence Score (JSDivS), are proposed to evaluate social biases in MLMs.
Score: 17.45153670825904
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Masked Language Models (MLMs) have been successful in many natural language processing tasks. However, real-world stereotype biases are likely to be reflected in MLMs due to their learning from large text corpora. Most of the evaluation metrics proposed in the past adopt different masking strategies, designed with the log-likelihood of MLMs. They lack holistic considerations such as variance for stereotype bias and anti-stereotype bias samples. In this paper, the log-likelihoods of stereotype bias and anti-stereotype bias samples output by MLMs are considered Gaussian distributions. Two evaluation metrics, Kullback Leibler Divergence Score (KLDivS) and Jensen Shannon Divergence Score (JSDivS), are proposed to evaluate social biases in MLMs. The experimental results on the public datasets StereoSet and CrowS-Pairs demonstrate that KLDivS and JSDivS are more stable and interpretable compared to the metrics proposed in the past.
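Illustrative sketch: the abstract treats the log-likelihoods of stereotype and anti-stereotype samples as Gaussian distributions and compares them with KL and JS divergences. The Python below shows only the standard textbook divergences between two univariate Gaussians fitted to two score lists. It is not the paper's exact KLDivS or JSDivS definition, which this page does not give. The function names and the sample log-likelihood values are hypothetical, and the JS divergence is approximated by numerical integration because it has no closed form for Gaussians.

```python
import math
from statistics import mean, pstdev


def fit_gaussian(xs):
    """Return (mean, std) of a sample; population std is an assumption here."""
    return mean(xs), pstdev(xs)


def kl_gaussian(mu_p, sd_p, mu_q, sd_q):
    """Closed-form KL(P || Q) for univariate Gaussians P and Q, in nats."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sd_q ** 2)
            - 0.5)


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def js_gaussian(mu_p, sd_p, mu_q, sd_q, steps=20000):
    """JS divergence between two Gaussians via midpoint-rule integration."""
    lo = min(mu_p - 10 * sd_p, mu_q - 10 * sd_q)
    hi = max(mu_p + 10 * sd_p, mu_q + 10 * sd_q)
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = normal_pdf(x, mu_p, sd_p)
        q = normal_pdf(x, mu_q, sd_q)
        m = 0.5 * (p + q)  # mixture density
        if p > 0:
            total += 0.5 * p * math.log(p / m) * dx
        if q > 0:
            total += 0.5 * q * math.log(q / m) * dx
    return total


# Hypothetical per-sentence log-likelihoods, for illustration only.
stereo_ll = [-2.1, -1.8, -2.5, -2.0]
anti_ll = [-2.6, -2.9, -2.4, -3.1]

mu_s, sd_s = fit_gaussian(stereo_ll)
mu_a, sd_a = fit_gaussian(anti_ll)
print("KL(stereo || anti):", kl_gaussian(mu_s, sd_s, mu_a, sd_a))
print("JS(stereo, anti):", js_gaussian(mu_s, sd_s, mu_a, sd_a))
```

Unlike a comparison of mean log-likelihoods alone, both divergences depend on the variances of the two score distributions, which is the kind of "holistic consideration" the abstract mentions. The JS divergence is symmetric and bounded above by ln 2 in nats, while the KL divergence is asymmetric and unbounded.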

Related papers

Investigating and Mitigating Stereotype-aware Unfairness in LLM-based Recommendations [18.862841015556995]
Large Language Models (LLMs) have demonstrated unprecedented language understanding and reasoning capabilities. Recent studies have revealed that LLMs are likely to inherit stereotypes that are embedded ubiquitously in word embeddings. This study reveals a new variant of fairness between stereotype groups containing both users and items, to quantify discrimination against stereotypes in LLM-RS.
arXiv Detail & Related papers (2025-04-05T15:09:39Z)
No LLM is Free From Bias: A Comprehensive Study of Bias Evaluation in Large Language models [0.9620910657090186]
Large Language Models (LLMs) have increased the performance of different natural language understanding as well as generation tasks. Although LLMs have breached the state-of-the-art performance in various tasks, they often reflect different forms of bias present in the training data. We provide a unified evaluation of benchmarks using a set of representative LLMs that cover different forms of biases starting from physical characteristics to socio-economic categories.
arXiv Detail & Related papers (2025-03-15T03:58:14Z)
A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia [0.3376269351435396]
We propose a novel metric to measure token-level contributions to biased behavior in pretrained language models (PLMs). Our results confirm the presence of sexist and homophobic bias in Southeast Asian PLMs. Interpretability and semantic analyses also reveal that PLM bias is strongly induced by words relating to crime, intimate relationships, and helping.
arXiv Detail & Related papers (2024-10-20T18:31:05Z)
Social Debiasing for Fair Multi-modal LLMs [55.8071045346024]
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities. However, these models often inherit severe social biases from their training datasets, leading to unfair predictions based on attributes like race and gender. This paper addresses the issue of social biases in MLLMs by i) introducing a comprehensive Counterfactual dataset with Multiple Social Concepts (CMSC) and ii) proposing an Anti-Stereotype Debiasing strategy (ASD).
arXiv Detail & Related papers (2024-08-13T02:08:32Z)
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models [58.57987316300529]
Large Language Models (LLMs) are increasingly deployed to handle various natural language processing (NLP) tasks. To evaluate the biases exhibited by LLMs, researchers have recently proposed a variety of datasets. We propose CEB, a Compositional Evaluation Benchmark that covers different types of bias across different social groups and tasks.
arXiv Detail & Related papers (2024-07-02T16:31:37Z)
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of large language models' implicit bias towards certain demographics. Inspired by psychometric principles, we propose three attack approaches, i.e., Disguise, Deception, and Teaching. Our methods can elicit LLMs' inner bias more effectively than competitive baselines.
arXiv Detail & Related papers (2024-06-20T06:42:08Z)
Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality [0.0]
Social political scientists often aim to discover and measure distinct biases from text data representations (embeddings). In this paper, we evaluate the social biases encoded by transformers trained with a masked language modeling objective. We find that proposed measures produce more accurate estimations of relative preference for biased sentences between transformers than others based on our methods.
arXiv Detail & Related papers (2024-02-21T17:33:13Z)
Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact. We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
Robust Evaluation Measures for Evaluating Social Biases in Masked Language Models [6.697298321551588]
We construct evaluation measures for the distributions of stereotypical and anti-stereotypical scores. Our proposed measures are significantly more robust and interpretable than those proposed previously.
arXiv Detail & Related papers (2024-01-21T21:21:51Z)
GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community. The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability. We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation [89.41378346080603]
This work presents the first systematic study on the social bias in PLM-based metrics. We demonstrate that popular PLM-based metrics exhibit significantly higher social bias than traditional metrics on 6 sensitive attributes. In addition, we develop debiasing adapters that are injected into PLM layers, mitigating bias in PLM-based metrics while retaining high performance for evaluating text generation.
arXiv Detail & Related papers (2022-10-14T08:24:11Z)
Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases. We propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
Unmasking the Mask -- Evaluating Social Biases in Masked Language Models [28.378270372391498]
Masked Language Models (MLMs) have superior performances in numerous downstream NLP tasks when used as text encoders. We propose All Unmasked Likelihood (AUL), a bias evaluation measure that predicts all tokens in a test case. We also propose AUL with attention weights (AULA) to evaluate tokens based on their importance in a sentence.
arXiv Detail & Related papers (2021-04-15T14:40:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.