SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense
Reasoning Models
- URL: http://arxiv.org/abs/2210.07269v1
- Date: Thu, 13 Oct 2022 18:04:48 GMT
- Title: SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense
Reasoning Models
- Authors: Haozhe An, Zongxia Li, Jieyu Zhao, Rachel Rudinger
- Abstract summary: We propose SODAPOP (SOcial bias Discovery from Answers about PeOPle), a pipeline for uncovering social biases in social commonsense question-answering.
By using a social commonsense model to score the generated distractors, we are able to uncover the model's stereotypic associations between demographic groups and an open set of words.
We also test SODAPOP on debiased models and show the limitations of multiple state-of-the-art debiasing algorithms.
- Score: 22.13138599547492
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A common limitation of diagnostic tests for detecting social biases in NLP
models is that they may only detect stereotypic associations that are
pre-specified by the designer of the test. Since enumerating all possible
problematic associations is infeasible, it is likely these tests fail to detect
biases that are present in a model but not pre-specified by the designer. To
address this limitation, we propose SODAPOP (SOcial bias Discovery from Answers
about PeOPle) in social commonsense question-answering. Our pipeline generates
modified instances from the Social IQa dataset (Sap et al., 2019) by (1)
substituting names associated with different demographic groups, and (2)
generating many distractor answers from a masked language model. By using a
social commonsense model to score the generated distractors, we are able to
uncover the model's stereotypic associations between demographic groups and an
open set of words. We also test SODAPOP on debiased models and show the
limitations of multiple state-of-the-art debiasing algorithms.
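To make the pipeline concrete, below is a minimal Python sketch built on off-the-shelf Hugging Face models rather than the authors' released code; the context template, the name lists, the answer template, and the zero-shot scorer that stands in for a fine-tuned Social IQa model are all illustrative assumptions.

```python
# Minimal sketch of a SODAPOP-style probe (illustrative only, not the authors' released code).
# Steps: (1) substitute a demographically associated name into a Social IQa-style context,
# (2) generate an open set of candidate answer words with a masked language model,
# (3) score the candidates in context and compare the rankings across name groups.
from transformers import pipeline

# Hypothetical context/question and name lists; real runs use Social IQa instances and
# curated name lists associated with different demographic groups.
CONTEXT = "{name} went over to comfort a friend who had just lost a family member."
QUESTION = "How would others describe {name}?"
NAMES = {"group_A": "Emily", "group_B": "DeShawn"}

# Step 2: a masked LM proposes one-word candidate answers (the open set of distractors).
fill_mask = pipeline("fill-mask", model="roberta-base")

def generate_candidates(name, top_k=20):
    prompt = f"{CONTEXT.format(name=name)} Others would describe {name} as a <mask> person."
    return [out["token_str"].strip() for out in fill_mask(prompt, top_k=top_k)]

# Step 3: score each candidate. A zero-shot NLI classifier stands in here for a
# Social IQa-finetuned multiple-choice model, purely to keep the sketch self-contained.
scorer = pipeline("zero-shot-classification", model="roberta-large-mnli")

rankings = {}
for group, name in NAMES.items():
    premise = f"{CONTEXT.format(name=name)} {QUESTION.format(name=name)}"
    result = scorer(premise, candidate_labels=generate_candidates(name))
    rankings[group] = list(zip(result["labels"], result["scores"]))

# Candidate words ranked much higher for one group than another suggest a stereotypic
# association; aggregating over many instances and names makes the signal meaningful.
for group, ranked in rankings.items():
    print(group, [(word, round(score, 3)) for word, score in ranked[:5]])
```

In the actual SODAPOP pipeline, the generated distractors are inserted into the multiple-choice instance and scored directly by the social commonsense QA model; the stand-in scorer above only mirrors that comparison of per-group rankings.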
Related papers
- BiasDora: Exploring Hidden Biased Associations in Vision-Language Models [23.329280888159744]
We investigate hidden, implicit associations across 9 bias dimensions.
We show how biased associations vary in their negativity, toxicity, and extremity.
Our work identifies subtle and extreme biases that are typically not recognized by existing methodologies.
arXiv Detail & Related papers (2024-07-02T08:55:40Z)
- VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model [72.13121434085116]
VLBiasBench is a benchmark aimed at evaluating biases in Large Vision-Language Models (LVLMs).
We construct a dataset covering nine distinct categories of social bias, including age, disability status, gender, nationality, physical appearance, race, religion, profession, and socioeconomic status, plus two intersectional bias categories (race x gender and race x socioeconomic status).
We conduct extensive evaluations on 15 open-source models as well as one advanced closed-source model, providing new insights into the biases these models reveal.
arXiv Detail & Related papers (2024-06-20T10:56:59Z)
- The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language Models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose Social Bias Neurons to accurately pinpoint the units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models [8.211129045180636]
We introduce a benchmark meant to capture the amplification of social bias, via stigmas, in generative language models.
Our benchmark, SocialStigmaQA, contains roughly 10K prompts, with a variety of prompt styles, carefully constructed to test for both social bias and model robustness.
We find that the proportion of socially biased output ranges from 45% to 59% across a variety of decoding strategies and prompting styles.
arXiv Detail & Related papers (2023-12-12T18:27:44Z)
- Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z)
- BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models [73.29106813131818]
Bias testing is currently cumbersome since test sentences are generated from a limited set of manual templates or require expensive crowd-sourcing.
We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes.
We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
arXiv Detail & Related papers (2023-02-14T22:07:57Z)
- "I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset [12.000335510088648]
We present a new, more inclusive bias measurement dataset, HolisticBias, which includes nearly 600 descriptor terms across 13 different demographic axes.
HolisticBias was assembled in a participatory process including experts and community members with lived experience of these terms.
We demonstrate that HolisticBias is effective at measuring previously undetectable biases in token likelihoods from language models.
arXiv Detail & Related papers (2022-05-18T20:37:25Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
However, undesired patterns in the collected data can render such tests unreliable.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data (a generic reweighting sketch follows this entry).
arXiv Detail & Related papers (2020-11-03T16:50:13Z)
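As a rough illustration of the reweighting idea above (a generic importance-weighting sketch under assumed variable names, not necessarily the exact weighting scheme derived in that paper), one can reweight each group's test samples so that a confounding attribute follows a common target distribution before comparing scores:

```python
# Generic sketch: reweight test samples so a confounding attribute has the same
# distribution in every group (illustrative only; the cited paper derives its own
# theoretically grounded weights). Data and bucket names are assumptions.
from collections import Counter

# Each test sample: (group, confound_bucket, model_correct)
samples = [
    ("female", "frequent_name", True), ("female", "rare_name", False),
    ("male", "frequent_name", True), ("male", "frequent_name", False),
    ("male", "rare_name", True),
]

def weighted_accuracy(samples, group):
    group_samples = [s for s in samples if s[0] == group]
    # Target: the confound distribution over the whole test set.
    target = Counter(s[1] for s in samples)
    total = sum(target.values())
    # Empirical confound distribution within this group.
    empirical = Counter(s[1] for s in group_samples)
    n = len(group_samples)
    # Importance weight per sample: p_target(bucket) / p_group(bucket).
    def weight(bucket):
        return (target[bucket] / total) / (empirical[bucket] / n)
    num = sum(weight(b) * correct for _, b, correct in group_samples)
    den = sum(weight(b) for _, b, _ in group_samples)
    return num / den

for g in ("female", "male"):
    print(g, round(weighted_accuracy(samples, g), 3))
```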