OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs
- URL: http://arxiv.org/abs/2309.03876v1
- Date: Thu, 7 Sep 2023 17:41:01 GMT
- Title: OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs
- Authors: Patrick Haller, Ansar Aynetdinov, Alan Akbik
- Abstract summary: We present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate.
The demo will answer this question using a model fine-tuned on text representing each of the selected biases.
To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics.
- Score: 3.5342505775640247
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Instruction-tuned Large Language Models (LLMs) have recently showcased
remarkable ability to generate fitting responses to natural language
instructions. However, an open research question concerns the inherent biases
of trained models and their responses. For instance, if the data used to tune
an LLM is dominantly written by persons with a specific political bias, we
might expect generated answers to share this bias. Current research work seeks
to de-bias such models, or suppress potentially biased answers. With this
demonstration, we take a different view on biases in instruction-tuning: Rather
than aiming to suppress them, we aim to make them explicit and transparent. To
this end, we present OpinionGPT, a web demo in which users can ask questions
and select all biases they wish to investigate. The demo will answer this
question using a model fine-tuned on text representing each of the selected
biases, allowing side-by-side comparison. To train the underlying model, we
identified 11 different biases (political, geographic, gender, age) and derived
an instruction-tuning corpus in which each answer was written by members of one
of these demographics. This paper presents OpinionGPT, illustrates how we
trained the bias-aware model and showcases the web application (available at
https://opiniongpt.informatik.hu-berlin.de).
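The abstract does not spell out how the demo serves the bias-specific models, so the following is a minimal, hypothetical sketch of the interaction pattern it describes: a shared base model with one fine-tuned adapter per bias, and answers generated side by side for the biases a user selects. The base model name, adapter repository names, prompt template, and the choice of separate adapters (rather than a single model conditioned on a bias tag) are all assumptions for illustration, not the authors' actual setup.

```python
# Hypothetical sketch: one LoRA adapter per bias on a shared base model,
# with answers generated side by side for the biases a user selects.
# Model and adapter names below are placeholders, not the authors' artifacts.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "huggyllama/llama-7b"      # placeholder base model
ADAPTERS = {                            # placeholder adapter repositories
    "conservative": "example/opinion-adapter-conservative",
    "liberal": "example/opinion-adapter-liberal",
    "teenager": "example/opinion-adapter-teenager",
}

def load(selected_biases):
    """Load the base model and attach one adapter per selected bias."""
    tok = AutoTokenizer.from_pretrained(BASE_MODEL)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
    first, *rest = selected_biases
    model = PeftModel.from_pretrained(base, ADAPTERS[first], adapter_name=first)
    for bias in rest:
        model.load_adapter(ADAPTERS[bias], adapter_name=bias)
    return tok, model

def answer_side_by_side(question, selected_biases):
    """Generate one answer per selected bias for side-by-side comparison."""
    tok, model = load(selected_biases)
    answers = {}
    for bias in selected_biases:
        model.set_adapter(bias)  # switch to this bias's adapter
        prompt = f"### Instruction:\n{question}\n\n### Response:\n"  # assumed template
        ids = tok(prompt, return_tensors="pt").to(model.device)
        out = model.generate(**ids, max_new_tokens=128, do_sample=False)
        answers[bias] = tok.decode(out[0][ids["input_ids"].shape[1]:],
                                   skip_special_tokens=True)
    return answers

if __name__ == "__main__":
    for bias, text in answer_side_by_side(
            "Should college be free?", ["conservative", "liberal"]).items():
        print(f"[{bias}] {text}\n")
```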
Related papers
- From Lists to Emojis: How Format Bias Affects Model Alignment [67.08430328350327]
We study format biases in reinforcement learning from human feedback.
Many widely-used preference models, including human evaluators, exhibit strong biases towards specific format patterns.
We show that with a small amount of biased data, we can inject significant bias into the reward model.
arXiv Detail & Related papers (2024-09-18T05:13:18Z)
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
- Evaluating Nuanced Bias in Large Language Model Free Response Answers [8.775925011558995]
We identify several kinds of nuanced bias in free text that cannot be identified by multiple choice tests.
We present a semi-automated pipeline for detecting these types of bias by first eliminating answers that can be automatically classified as unbiased.
arXiv Detail & Related papers (2024-07-11T19:58:13Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- GPTBIAS: A Comprehensive Framework for Evaluating Bias in Large Language Models [83.30078426829627]
Large language models (LLMs) have gained popularity and are being widely adopted by a large user community.
The existing evaluation methods have many constraints, and their results exhibit a limited degree of interpretability.
We propose a bias evaluation framework named GPTBIAS that leverages the high performance of LLMs to assess bias in models.
arXiv Detail & Related papers (2023-12-11T12:02:14Z)
- What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations [62.91799637259657]
Do large language models (LLMs) exhibit sociodemographic biases, even when they decline to respond?
We study this research question by probing contextualized embeddings and exploring whether such biases are encoded in the models' latent representations.
We propose a logistic Bradley-Terry probe which predicts word-pair preferences of LLMs from the words' hidden vectors (a toy sketch of such a probe follows this list).
arXiv Detail & Related papers (2023-11-30T18:53:13Z)
- It's All Relative: Interpretable Models for Scoring Bias in Documents [10.678219157857946]
We propose an interpretable model to score the bias present in web documents, based only on their textual content.
Our model incorporates assumptions reminiscent of the Bradley-Terry axioms and is trained on pairs of revisions of the same Wikipedia article.
We show that we can interpret the parameters of the trained model to discover the words most indicative of bias.
arXiv Detail & Related papers (2023-07-16T19:35:38Z)
- Soft-prompt Tuning for Large Language Models to Evaluate Bias [0.03141085922386211]
Using soft prompts to evaluate bias has the added advantage of avoiding human bias injection.
We examine model biases across different sensitive attributes using group fairness measures and find interesting bias patterns.
arXiv Detail & Related papers (2023-06-07T19:11:25Z)
- Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias [33.99768156365231]
We introduce a causal formulation for bias measurement in generative language models.
We propose a benchmark called OccuGender, with a bias-measuring procedure to investigate occupational gender bias.
The results show that these models exhibit substantial occupational gender bias.
arXiv Detail & Related papers (2022-12-20T22:41:24Z)
- UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z)
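As promised above, here is a toy sketch of a logistic Bradley-Terry probe of the kind described in "What Do Llamas Really Think?", assuming the parameterization P(a preferred over b) = sigmoid(theta · (h_a − h_b)) over frozen hidden vectors. The synthetic data, dimensionality, and exact parameterization are assumptions for illustration; the paper's own probe may differ.

```python
# Minimal sketch of a logistic Bradley-Terry probe over word representations,
# assuming P(a preferred over b) = sigmoid(theta . (h_a - h_b)).
# Hidden vectors are random stand-ins here; in practice they would come
# from a language model's contextualized representations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim, n_pairs = 64, 500

# Synthetic "hidden vectors" for the two words in each pair.
h_a = rng.normal(size=(n_pairs, dim))
h_b = rng.normal(size=(n_pairs, dim))

# Synthetic ground truth: a hidden preference direction determines which
# word of each pair is "preferred" (label 1 = first word preferred).
true_theta = rng.normal(size=dim)
labels = ((h_a - h_b) @ true_theta > 0).astype(int)

# Bradley-Terry as logistic regression on the difference of the vectors,
# with no intercept so the probe is antisymmetric in (a, b).
probe = LogisticRegression(fit_intercept=False, max_iter=1000)
probe.fit(h_a - h_b, labels)

print("probe accuracy on training pairs:", probe.score(h_a - h_b, labels))
```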
This list is automatically generated from the titles and abstracts of the papers on this site.