BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
- URL: http://arxiv.org/abs/2302.07371v3
- Date: Wed, 6 Dec 2023 06:26:52 GMT
- Title: BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
- Authors: Rafal Kocielnik, Shrimai Prabhumoye, Vivian Zhang, Roy Jiang, R.
Michael Alvarez, Anima Anandkumar
- Abstract summary: Bias testing is currently cumbersome since the test sentences are either generated from a limited set of manual templates or obtained through expensive crowd-sourcing.
We propose using ChatGPT for the controllable generation of test sentences, given any arbitrary user-specified combination of social groups and attributes.
We present an open-source comprehensive bias testing framework (BiasTestGPT), hosted on HuggingFace, that can be plugged into any open-source PLM for bias testing.
- Score: 73.29106813131818
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pretrained Language Models (PLMs) harbor inherent social biases that can
result in harmful real-world implications. Such social biases are measured
through the probability values that PLMs output for different social groups and
attributes appearing in a set of test sentences. However, bias testing is
currently cumbersome since the test sentences are either generated from a
limited set of manual templates or obtained through expensive crowd-sourcing. We instead
propose using ChatGPT for the controllable generation of test sentences, given
any arbitrary user-specified combination of social groups and attributes
appearing in the test sentences. When compared to template-based methods, our
approach using ChatGPT for test sentence generation is superior in detecting
social bias, especially in challenging settings such as intersectional biases.
We present an open-source comprehensive bias testing framework (BiasTestGPT),
hosted on HuggingFace, that can be plugged into any open-source PLM for bias
testing. User testing with domain experts from various fields has shown their
interest in being able to test modern AI for social biases. Our tool has
significantly improved their awareness of such biases in PLMs, proving to be
learnable and user-friendly. We thus enable seamless open-ended social bias
testing of PLMs by domain experts through an automatic large-scale generation
of diverse test sentences for any combination of social categories and
attributes.
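As a rough illustration of the pipeline described above, the hypothetical sketch below shows (1) the kind of prompt one might send to ChatGPT to generate test sentences for a user-specified combination of social groups and an attribute, and (2) how an open-source masked PLM's output probabilities for different group terms in such a sentence can be compared. The prompt wording, model name, sentence, and group/attribute terms are illustrative assumptions, not the paper's exact prompts or scoring.

```python
# Hypothetical sketch of the two steps the abstract describes:
# (1) ask a chat LLM for test sentences pairing social groups with an
#     attribute, and (2) compare the probabilities an open-source masked
#     PLM assigns to different group terms in such a sentence.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM


def generation_prompt(groups: list[str], attribute: str, n: int = 5) -> str:
    """Prompt one might send to ChatGPT for controllable sentence generation."""
    return (
        f"Write {n} natural sentences, each containing the placeholder [MASK] "
        f"where a term for one of the social groups {groups} could appear, "
        f"and each expressing the attribute '{attribute}'."
    )


MODEL_NAME = "bert-base-uncased"  # stands in for any open-source masked PLM
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForMaskedLM.from_pretrained(MODEL_NAME)
model.eval()


def group_log_prob(test_sentence: str, group_term: str) -> float:
    """Log-probability the PLM assigns to `group_term` at the [MASK] slot."""
    ids = tokenizer(group_term, add_special_tokens=False)["input_ids"]
    if len(ids) != 1:
        raise ValueError("this sketch assumes single-token group terms")
    inputs = tokenizer(test_sentence, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = model(**inputs).logits
    log_probs = torch.log_softmax(logits[0, mask_pos], dim=-1)
    return log_probs[ids[0]].item()


# A test sentence standing in for one that ChatGPT would generate from the
# prompt above; a skew in these scores suggests a stereotypical association.
sentence = "The [MASK] was praised for being caring with the children."
for term in ("woman", "man"):
    print(term, round(group_log_prob(sentence, term), 3))
```

Repeating this comparison over many generated sentences and aggregating the per-sentence differences would yield an overall bias estimate in the spirit of what the abstract describes.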
Related papers
- Unveiling Assumptions: Exploring the Decisions of AI Chatbots and Human Testers [2.5327705116230477]
Decision-making relies on a variety of information, including code, requirements specifications, and other software artifacts.
To fill in the gaps left by unclear information, we often rely on assumptions, intuition, or previous experiences to make decisions.
arXiv Detail & Related papers (2024-06-17T08:55:56Z)
- Protecting Copyrighted Material with Unique Identifiers in Large Language Model Training [55.321010757641524]
A major public concern regarding the training of large language models (LLMs) is whether they abuse copyrighted online text.
Previous membership inference methods may be misled by similar examples in vast amounts of training data.
We propose an alternative insert-and-detection methodology, advocating that web users and content platforms employ unique identifiers.
arXiv Detail & Related papers (2024-03-23T06:36:32Z)
- SocialStigmaQA: A Benchmark to Uncover Stigma Amplification in Generative Language Models [8.211129045180636]
We introduce a benchmark meant to capture the amplification of social bias, via stigmas, in generative language models.
Our benchmark, SocialStigmaQA, contains roughly 10K prompts, with a variety of prompt styles, carefully constructed to test for both social bias and model robustness.
We find that the proportion of socially biased output ranges from 45% to 59% across a variety of decoding strategies and prompting styles.
arXiv Detail & Related papers (2023-12-12T18:27:44Z)
- No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation [11.009117714870527]
Unit testing is essential in detecting bugs in functionally-discrete program units.
Recent work has shown the large potential of large language models (LLMs) in unit test generation.
However, it remains unclear how effective ChatGPT is at this task.
arXiv Detail & Related papers (2023-05-07T07:17:08Z)
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models [55.60306377044225]
"SelfCheckGPT" is a simple sampling-based approach to fact-check the responses of black-box models.
We investigate this approach by using GPT-3 to generate passages about individuals from the WikiBio dataset.
arXiv Detail & Related papers (2023-03-15T19:31:21Z)
- Can ChatGPT Assess Human Personalities? A General Evaluation Framework [70.90142717649785]
Large Language Models (LLMs) have produced impressive results in various areas, but their potential human-like psychology is still largely unexplored.
This paper presents a generic evaluation framework for LLMs to assess human personalities based on Myers-Briggs Type Indicator (MBTI) tests.
arXiv Detail & Related papers (2023-03-01T06:16:14Z)
- The Tail Wagging the Dog: Dataset Construction Biases of Social Bias Benchmarks [75.58692290694452]
We compare social biases with non-social biases stemming from choices made during dataset construction that might not even be discernible to the human eye.
We observe that these shallow modifications have a surprising effect on the resulting degree of bias across various models.
arXiv Detail & Related papers (2022-10-18T17:58:39Z)
- COFFEE: Counterfactual Fairness for Personalized Text Generation in Explainable Recommendation [56.520470678876656]
Bias inherent in user-written text can associate different levels of linguistic quality with users' protected attributes.
We introduce a general framework to achieve measure-specific counterfactual fairness in explanation generation.
arXiv Detail & Related papers (2022-10-14T02:29:10Z)
- SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models [22.13138599547492]
We propose SODAPOP (SOcial bias Discovery from Answers about PeOPle) in social commonsense question-answering.
By using a social commonsense model to score the generated distractors, we are able to uncover the model's stereotypic associations between demographic groups and an open set of words.
We also test SODAPOP on debiased models and show the limitations of multiple state-of-the-art debiasing algorithms.
arXiv Detail & Related papers (2022-10-13T18:04:48Z)
- Identifying and Measuring Token-Level Sentiment Bias in Pre-trained Language Models with Prompts [7.510757198308537]
Large-scale pre-trained language models (PLMs) have been widely adopted in many aspects of human society.
Recent advances in prompt tuning show the possibility to explore the internal mechanism of the PLMs.
We propose two token-level sentiment tests: the Sentiment Association Test (SAT) and the Sentiment Shift Test (SST).
arXiv Detail & Related papers (2022-04-15T02:01:31Z)