HERB: Measuring Hierarchical Regional Bias in Pre-trained Language
Models
- URL: http://arxiv.org/abs/2211.02882v1
- Date: Sat, 5 Nov 2022 11:30:57 GMT
- Title: HERB: Measuring Hierarchical Regional Bias in Pre-trained Language
Models
- Authors: Yizhi Li, Ge Zhang, Bohao Yang, Chenghua Lin, Shi Wang, Anton Ragni,
Jie Fu
- Abstract summary: Regional bias in language models (LMs) is a long-standing global discrimination problem.
This paper bridges the gap by analysing the regional bias learned by the pre-trained language models.
We propose a HiErarchical Regional Bias evaluation method (HERB) to quantify the bias in pre-trained LMs.
- Score: 33.0987914452712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness has become a trending topic in natural language processing (NLP),
which addresses biases targeting certain social groups such as genders and
religions. However, regional bias in language models (LMs), a long-standing
global discrimination problem, remains unexplored. This paper bridges the
gap by analysing the regional bias learned by the pre-trained language models
that are broadly used in NLP tasks. In addition to verifying the existence of
regional bias in LMs, we find that the biases against regional groups can be
strongly influenced by the geographical clustering of the groups. We
accordingly propose a HiErarchical Regional Bias evaluation method (HERB)
utilising the information from the sub-region clusters to quantify the bias in
pre-trained LMs. Experiments show that our hierarchical metric can effectively
evaluate the regional bias with respect to comprehensive topics and measure the
potential regional bias that can be propagated to downstream tasks. Our code
is available at https://github.com/Bernard-Yang/HERB.
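The exact HERB formulation is not reproduced in this summary, so the following is only a minimal Python sketch of the hierarchical idea described in the abstract: per-region bias scores are first aggregated inside each geographical sub-region cluster and then combined across clusters. The per-region scores, the cluster assignments, and the mean-plus-spread aggregation are illustrative assumptions, not the paper's metric.

```python
# Illustrative sketch of hierarchical regional bias aggregation
# (NOT the exact HERB formula): aggregate within each sub-region
# cluster first, then combine across clusters.
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical per-region bias scores from some LM probing step.
region_bias = {
    "RegionA": 0.42, "RegionB": 0.47, "RegionC": 0.05,
    "RegionD": 0.08, "RegionE": 0.51,
}
# Hypothetical geographical clustering of regions into sub-region groups.
clusters = {
    "Cluster1": ["RegionA", "RegionB", "RegionE"],
    "Cluster2": ["RegionC", "RegionD"],
}

def hierarchical_bias(region_bias, clusters):
    """Aggregate bias within each cluster, then across clusters."""
    cluster_scores = {}
    for name, regions in clusters.items():
        scores = [region_bias[r] for r in regions]
        # Within-cluster term: average bias plus its spread.
        cluster_scores[name] = mean(scores) + pstdev(scores)
    # Across-cluster term: average of the cluster-level scores.
    return mean(cluster_scores.values()), cluster_scores

overall, per_cluster = hierarchical_bias(region_bias, clusters)
print(per_cluster, overall)
```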
Related papers
- Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory [29.201402717025335]
Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information.
We have formally defined the implicit bias problem and developed an innovative framework for bias removal based on Bayesian theory.
arXiv Detail & Related papers (2024-08-20T07:40:12Z)
- REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning [18.064064773660174]
We introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning.
By training a simple model on top of the word probability distribution of an LM, our bias reinforcement learning method enables model debiasing without human annotations.
Experiments conducted on a wide range of models, including several LMs, show that our method significantly reduces stereotypical biases while preserving LM performance.
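As a hedged sketch of the setup summarised above (a small trainable head over a frozen LM's top-k next-token probabilities, updated with a REINFORCE-style reward rather than fine-tuning), here is a toy PyTorch version; the head architecture, the choice of k, and the reward are assumptions for illustration, not the REFINE-LM implementation.

```python
# Toy sketch: re-weight the frozen LM's top-k token probabilities with a
# tiny head trained by policy gradient; the base LM is never updated.
import torch

k = 10
head = torch.nn.Linear(k, k)          # tiny model over top-k probabilities
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

def debias_step(topk_probs, reward_fn):
    """One policy-gradient update on the re-weighting head."""
    logits = head(topk_probs)                     # re-score the top-k tokens
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()                        # pick one candidate token
    reward = reward_fn(action)                    # e.g. parity across groups
    loss = -dist.log_prob(action) * reward        # REINFORCE objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return action.item()

# Toy usage: pretend these are the frozen LM's top-10 probabilities.
fake_topk = torch.rand(k)
debias_step(fake_topk, reward_fn=lambda a: 1.0)   # placeholder reward
```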
arXiv Detail & Related papers (2024-08-18T14:08:31Z)
- Editable Fairness: Fine-Grained Bias Mitigation in Language Models [52.66450426729818]
We propose a novel debiasing approach, Fairness Stamp (FAST), which enables fine-grained calibration of individual social biases.
FAST surpasses state-of-the-art baselines with superior debiasing performance.
This highlights the potential of fine-grained debiasing strategies to achieve fairness in large language models.
arXiv Detail & Related papers (2024-08-07T17:14:58Z)
- Towards Region-aware Bias Evaluation Metrics [26.91545185271231]
We identify topical differences in gender bias across different regions and propose a region-aware bottom-up approach for bias assessment.
Our proposed approach uses gender-aligned topics for a given region and identifies gender bias dimensions in the form of topic pairs.
Several of our proposed bias topic pairs align with human perceptions of gender bias in these regions better than the existing ones do.
arXiv Detail & Related papers (2024-06-23T16:26:27Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
Existing evaluation methods have many constraints, and their results offer limited interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
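A minimal sketch of the LLM-as-evaluator idea described in this entry: completions from the model under test are wrapped in a judging prompt and scored by a stronger LLM. The prompt wording and the call_judge_llm stub are assumptions for illustration, not the GPTBIAS framework itself.

```python
# Sketch: ask a judge LLM whether a model completion is biased.
# `call_judge_llm` is a hypothetical stand-in for whatever LLM API is used.
JUDGE_TEMPLATE = (
    "You are auditing a language model for social bias.\n"
    "Prompt given to the model: {prompt}\n"
    "Model completion: {completion}\n"
    "Does the completion express bias against a social group? "
    "Answer 'yes' or 'no' and name the group and bias type."
)

def evaluate_completion(prompt, completion, call_judge_llm):
    """Ask a judge LLM whether one completion is biased."""
    query = JUDGE_TEMPLATE.format(prompt=prompt, completion=completion)
    verdict = call_judge_llm(query)          # returns the judge's free text
    return {"biased": verdict.lower().startswith("yes"), "detail": verdict}

# Toy usage with a stubbed judge.
print(evaluate_completion("People from X are", " always lazy.",
                          call_judge_llm=lambda q: "yes - regional bias"))
```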
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- ROBBIE: Robust Bias Evaluation of Large Generative Language Models [27.864027322486375]
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv Detail & Related papers (2023-11-29T23:03:04Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
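To illustrate the single-threshold observation in this entry, here is a small sketch that derives one similarity threshold per demographic subgroup at a fixed false-match rate; the scores, subgroup names, and target rate are synthetic assumptions, not the Balanced Faces in the Wild protocol.

```python
# Sketch: choose a per-subgroup similarity threshold at a fixed
# false-match rate on impostor pairs, instead of one global threshold.
import numpy as np

rng = np.random.default_rng(0)
impostor_scores = {                    # similarity scores of impostor pairs
    "group_a": rng.normal(0.30, 0.05, 1000),
    "group_b": rng.normal(0.40, 0.05, 1000),   # systematically higher scores
}

def per_group_thresholds(impostor_scores, target_fmr=0.001):
    """Per subgroup, the score above which only target_fmr of impostors fall."""
    return {g: float(np.quantile(s, 1.0 - target_fmr))
            for g, s in impostor_scores.items()}

print(per_group_thresholds(impostor_scores))
```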
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- LOGAN: Local Group Bias Detection by Clustering [86.38331353310114]
We argue that evaluating bias at the corpus level is not enough for understanding how biases are embedded in a model.
We propose LOGAN, a new bias detection technique based on clustering.
Experiments on toxicity classification and object classification tasks show that LOGAN identifies bias in a local region.
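A minimal sketch of clustering-based local bias detection in the spirit of this entry: cluster examples in feature space, then compare a group-conditioned prediction rate inside each cluster so that bias hidden at the corpus level can surface locally. The synthetic data and the rate-gap statistic are assumptions, not the paper's exact procedure.

```python
# Sketch: cluster examples, then measure a per-cluster gap in
# positive-prediction rate between two demographic groups.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
features = rng.normal(size=(600, 8))           # e.g. text embeddings
group = rng.integers(0, 2, size=600)           # demographic attribute (0/1)
prediction = rng.integers(0, 2, size=600)      # model decisions (e.g. "toxic")

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)

def local_bias(labels, group, prediction):
    """Per-cluster gap in positive-prediction rate between the two groups."""
    gaps = {}
    for c in np.unique(labels):
        m = labels == c
        rate = [prediction[m & (group == g)].mean() for g in (0, 1)]
        gaps[int(c)] = float(rate[0] - rate[1])
    return gaps

print(local_bias(labels, group, prediction))    # large |gap| = local bias
```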
arXiv Detail & Related papers (2020-10-06T16:42:51Z)
- Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)