Related papers: CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models

URL: http://arxiv.org/abs/2306.16244v1
Date: Wed, 28 Jun 2023 14:14:44 GMT
Title: CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Authors: Yufei Huang and Deyi Xiong
Abstract summary: We present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories.
Score: 52.25049362267279
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Holistically measuring societal biases of large language models is crucial for detecting and reducing ethical risks in highly capable AI models. In this work, we present a Chinese Bias Benchmark dataset that consists of over 100K questions jointly constructed by human experts and generative language models, covering stereotypes and societal biases in 14 social dimensions related to Chinese culture and values. The curation process contains 4 essential steps: bias identification via extensive literature review, ambiguous context generation, AI-assisted disambiguous context generation, snd manual review \& recomposition. The testing instances in the dataset are automatically derived from 3K+ high-quality templates manually authored with stringent quality control. The dataset exhibits wide coverage and high diversity. Extensive experiments demonstrate the effectiveness of the dataset in detecting model bias, with all 10 publicly available Chinese large language models exhibiting strong bias in certain categories. Additionally, we observe from our experiments that fine-tuned models could, to a certain extent, heed instructions and avoid generating outputs that are morally harmful in some types, in the way of "moral self-correction". Our dataset and results are publicly available at \href{https://github.com/YFHuangxxxx/CBBQ}{https://github.com/YFHuangxxxx/CBBQ}, offering debiasing research opportunities to a widened community.

Related papers

GUS-Net: Social Bias Classification in Text with Generalizations, Unfairness, and Stereotypes [2.2162879952427343]
This paper introduces GUS-Net, an innovative approach to bias detection. GUS-Net focuses on three key types of biases: (G)eneralizations, (U)nfairness, and (S)tereotypes. Our methodology enhances traditional bias detection methods by incorporating the contextual encodings of pre-trained models.
arXiv Detail & Related papers (2024-10-10T21:51:22Z)
VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs) We construct a dataset encompassing nine distinct categories of social biases, including age, disability status, gender, nationality, physical appearance, race, religion, profession, social economic status and two intersectional bias categories (race x gender, and race x social economic status) We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing some new insights into the biases revealing from these models.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
GECOBench: A Gender-Controlled Text Dataset and Benchmark for Quantifying Biases in Explanations [1.0000511213628438]
We create a gender-controlled text dataset, GECO, in which otherwise identical sentences appear in male and female forms. This gives rise to ground-truth 'world explanations' for gender classification tasks. We also provide GECOBench, a rigorous quantitative evaluation framework benchmarking popular XAI methods.
arXiv Detail & Related papers (2024-06-17T13:44:37Z)
Towards Auditing Large Language Models: Improving Text-based Stereotype Detection [5.3634450268516565]
This work introduces i) the Multi-Grain Stereotype dataset, which includes 52,751 instances of gender, race, profession and religion stereotypic text. We design several experiments to rigorously test the proposed model trained on the novel dataset. Experiments show that training the model in a multi-class setting can outperform the one-vs-all binary counterpart.
arXiv Detail & Related papers (2023-11-23T17:47:14Z)
Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase. We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output. Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
Exposing Bias in Online Communities through Large-Scale Language Models [3.04585143845864]
This work uses the flaw of bias in language models to explore the biases of six different online communities. The bias of the resulting models is evaluated by prompting the models with different demographics and comparing the sentiment and toxicity values of these generations. This work not only affirms how easily bias is absorbed from training data but also presents a scalable method to identify and compare the bias of different datasets or communities.
arXiv Detail & Related papers (2023-06-04T08:09:26Z)
CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models [30.400023506841503]
We introduce a new Chinese dataset, CHBias, for bias evaluation and mitigation of Chinese conversational language models. We evaluate two popular pretrained Chinese conversational models, CDial-GPT and EVA2.0, using CHBias.
arXiv Detail & Related papers (2023-05-18T18:58:30Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results. We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset [12.000335510088648]
We present a new, more inclusive bias measurement dataset, HolisticBias, which includes nearly 600 descriptor terms across 13 different demographic axes. HolisticBias was assembled in a participatory process including experts and community members with lived experience of these terms. We demonstrate that HolisticBias is effective at measuring previously undetectable biases in token likelihoods from language models.
arXiv Detail & Related papers (2022-05-18T20:37:25Z)
Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets. interventions and additional rounds of labeling can be performed to ameliorate the semantic bias of the hypothesis distribution of a dataset.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases [55.45617404586874]
We propose a few-shot instruction-based method for prompting pre-trained language models (LMs) We show that large LMs can detect different types of fine-grained biases with similar and sometimes superior accuracy to fine-tuned models.
arXiv Detail & Related papers (2021-12-15T04:19:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.