KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large
Language Model Application
- URL: http://arxiv.org/abs/2305.17701v2
- Date: Tue, 30 May 2023 01:42:07 GMT
- Title: KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large
Language Model Application
- Authors: Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, Gunhee Kim and
Jung-Woo Ha
- Abstract summary: Large language models (LLMs) learn not only natural text generation abilities but also social biases against different demographic groups from real-world data.
Existing research and resources are not readily applicable in South Korea due to the differences in language and culture.
We present KO SB I, a new social bias dataset of 34k pairs of contexts and sentences in Korean covering 72 demographic groups in 15 categories.
- Score: 45.3863281375947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) learn not only natural text generation abilities
but also social biases against different demographic groups from real-world
data. This poses a critical risk when deploying LLM-based applications.
Existing research and resources are not readily applicable in South Korea due
to the differences in language and culture, both of which significantly affect
the biases and targeted demographic groups. This limitation requires localized
social bias datasets to ensure the safe and effective deployment of LLMs. To
this end, we present KO SB I, a new social bias dataset of 34k pairs of
contexts and sentences in Korean covering 72 demographic groups in 15
categories. We find that through filtering-based moderation, social biases in
generated content can be reduced by 16.47%p on average for HyperCLOVA (30B and
82B), and GPT-3.
Related papers
- BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization [0.0]
Large Language Models (LLMs) have become pivotal in advancing natural language processing, yet their potential to perpetuate biases poses significant concerns.
This paper introduces a new framework employing Direct Preference Optimization (DPO) to mitigate gender, racial, and religious biases in English text.
By developing a loss function that favors less biased over biased completions, our approach cultivates a preference for respectful and non-discriminatory language.
arXiv Detail & Related papers (2024-07-18T22:32:20Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose sc Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - Analyzing Social Biases in Japanese Large Language Models [24.351580958043595]
We construct the Japanese Bias Benchmark dataset for Question Answering (JBBQ) based on the English bias benchmark BBQ.
We analyze social biases in Japanese Large Language Models (LLMs)
prompts with warnings about social biases and Chain-of-Thought prompting reduce the effect of biases in model outputs.
arXiv Detail & Related papers (2024-06-04T07:31:06Z) - Detecting Bias in Large Language Models: Fine-tuned KcBERT [0.0]
We define such harm as societal bias and assess ethnic, gender, and racial biases in a model fine-tuned with Korean comments.
Our contribution lies in demonstrating that societal bias exists in Korean language models due to language-dependent characteristics.
arXiv Detail & Related papers (2024-03-16T02:27:19Z) - GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language
Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z) - ROBBIE: Robust Bias Evaluation of Large Generative Language Models [27.864027322486375]
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv Detail & Related papers (2023-11-29T23:03:04Z) - Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied but still consequential, dimensions, such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology.
arXiv Detail & Related papers (2023-09-16T07:07:04Z) - CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI
Collaboration for Large Language Models [52.25049362267279]
We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models.
The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control.
Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
arXiv Detail & Related papers (2023-06-28T14:14:44Z) - Holistic Evaluation of Language Models [183.94891340168175]
Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood.
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
arXiv Detail & Related papers (2022-11-16T18:51:34Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.