Evaluating Large Language Model Biases in Persona-Steered Generation
- URL: http://arxiv.org/abs/2405.20253v1
- Date: Thu, 30 May 2024 17:06:03 GMT
- Title: Evaluating Large Language Model Biases in Persona-Steered Generation
- Authors: Andy Liu, Mona Diab, Daniel Fried
- Abstract summary: We show that large language models (LLMs) are 9.7% less steerable towards incongruous personas than congruous ones.
Models that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women.
- Score: 26.92498998306013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g., political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with the persona's demographic rather than the target stance. The models we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.
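The incongruity criterion in the abstract lends itself to a simple check against survey co-occurrence counts. The sketch below is only an illustration of one way such a check could be operationalized, not the authors' implementation; the function name, data layout, and toy numbers are all hypothetical.

```python
# Illustrative sketch only (not the paper's code): flag a two-trait persona as
# incongruous when the target stance is less likely among survey respondents
# who hold the demographic trait than among respondents overall.
from typing import Dict, Tuple

def is_incongruous(
    joint_counts: Dict[Tuple[str, str], int],  # (demographic, stance) -> respondents holding both
    demo_totals: Dict[str, int],               # demographic -> respondents with that trait
    stance_totals: Dict[str, int],             # stance -> respondents holding that stance
    n_respondents: int,
    demographic: str,
    stance: str,
) -> bool:
    """True if P(stance | demographic) < P(stance) in the survey data."""
    p_stance = stance_totals[stance] / n_respondents
    p_stance_given_demo = (
        joint_counts.get((demographic, stance), 0) / demo_totals[demographic]
    )
    return p_stance_given_demo < p_stance

# Toy numbers (made up): "political liberal + supports increased military spending"
joint = {("liberal", "increase_military_spending"): 120,
         ("conservative", "increase_military_spending"): 480}
demos = {"liberal": 500, "conservative": 500}
stances = {"increase_military_spending": 600}
print(is_incongruous(joint, demos, stances, 1000,
                     "liberal", "increase_military_spending"))  # -> True
```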
Related papers
- Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity that implies a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
- Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z)
- Quantifying the Persona Effect in LLM Simulations [25.367927300697424]
Large language models (LLMs) have shown remarkable promise in simulating human language and behavior.
This study investigates how integrating persona variables-demographic, social, and behavioral factors-impacts LLMs' ability to simulate diverse perspectives.
We find that persona variables account for 10% of the variance in annotations in existing subjective NLP datasets.
arXiv Detail & Related papers (2024-02-16T16:35:35Z)
- Aligning Large Language Models with Human Opinions through Persona Selection and Value--Belief--Norm Reasoning [67.33899440998175]
Chain-of-Opinion (COO) is a simple four-step solution that models which personae to reason with and how.
COO distinguishes between explicit personae (demographics and ideology) and implicit personae (historical opinions).
COO efficiently achieves new state-of-the-art opinion prediction via prompting with only 5 inference calls, improving prior techniques by up to 4%.
arXiv Detail & Related papers (2023-11-14T18:48:27Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Verbosity Bias in Preference Labeling by Large Language Models [10.242500241407466]
We examine the biases that come along with evaluating Large Language Models (LLMs).
We take a closer look into verbosity bias -- a bias where LLMs sometimes prefer more verbose answers even if they are of similar quality.
arXiv Detail & Related papers (2023-10-16T05:19:02Z)
- Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied but still consequential dimensions, such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology.
arXiv Detail & Related papers (2023-09-16T07:07:04Z)
- Whose Opinions Do Language Models Reflect? [88.35520051971538]
We investigate the opinions reflected by language models (LMs) by leveraging high-quality public opinion polls and their associated human responses.
We find substantial misalignment between the views reflected by current LMs and those of US demographic groups.
Our analysis confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs.
arXiv Detail & Related papers (2023-03-30T17:17:08Z)
- Fine-tuning language models to find agreement among humans with diverse preferences [7.702628192754256]
Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user.
Here, we ask how a machine might help people with diverse views find agreement.
We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions.
We find that when we silently construct consensus statements from only a subset of group members, those who are excluded are more likely to dissent.
arXiv Detail & Related papers (2022-11-28T02:24:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.