Evaluating Large Language Model Biases in Persona-Steered Generation
- URL: http://arxiv.org/abs/2405.20253v1
- Date: Thu, 30 May 2024 17:06:03 GMT
- Title: Evaluating Large Language Model Biases in Persona-Steered Generation
- Authors: Andy Liu, Mona Diab, Daniel Fried
- Abstract summary: We show that large language models (LLMs) are 9.7% less steerable towards incongruous personas than congruous ones.
Models that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women.
- Score: 26.92498998306013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g., political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with the persona's demographic rather than the target stance. The models we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women, but present significantly less diverse views of personas. We also find variance in LLM steerability that cannot be predicted from multiple-choice opinion evaluation. Our results show the importance of evaluating models in open-ended text generation, as it can surface new LLM opinion biases. Moreover, such a setup can shed light on our ability to steer models toward a richer and more diverse range of viewpoints.
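The incongruity criterion in the abstract lends itself to a simple check against survey co-occurrence counts. The sketch below is only an illustration of one way such a check could be operationalized, not the authors' implementation; the function name, data layout, and toy numbers are all hypothetical.

```python
# Illustrative sketch only (not the paper's code): flag a two-trait persona as
# incongruous when the target stance is less likely among survey respondents
# who hold the demographic trait than among respondents overall.
from typing import Dict, Tuple

def is_incongruous(
    joint_counts: Dict[Tuple[str, str], int],  # (demographic, stance) -> respondents holding both
    demo_totals: Dict[str, int],               # demographic -> respondents with that trait
    stance_totals: Dict[str, int],             # stance -> respondents holding that stance
    n_respondents: int,
    demographic: str,
    stance: str,
) -> bool:
    """True if P(stance | demographic) < P(stance) in the survey data."""
    p_stance = stance_totals[stance] / n_respondents
    p_stance_given_demo = (
        joint_counts.get((demographic, stance), 0) / demo_totals[demographic]
    )
    return p_stance_given_demo < p_stance

# Toy numbers (made up): "political liberal + supports increased military spending"
joint = {("liberal", "increase_military_spending"): 120,
         ("conservative", "increase_military_spending"): 480}
demos = {"liberal": 500, "conservative": 500}
stances = {"increase_military_spending": 600}
print(is_incongruous(joint, demos, stances, 1000,
                     "liberal", "increase_military_spending"))  # -> True
```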
Related papers
- Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity that implies a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
- Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models.
We show that models give substantively different answers when not forced.
We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z)
- Quantifying the Persona Effect in LLM Simulations [25.367927300697424]
Large language models (LLMs) have shown remarkable promise in simulating human language and behavior.
This study investigates how integrating persona variables-demographic, social, and behavioral factors-impacts LLMs' ability to simulate diverse perspectives.
We find that persona variables account for 10% of the variance in annotations in existing subjective NLP datasets.
arXiv Detail & Related papers (2024-02-16T16:35:35Z)
- Aligning Large Language Models with Human Opinions through Persona Selection and Value--Belief--Norm Reasoning [67.33899440998175]
Chain-of-Opinion (COO) is a simple four-step solution that models which personae to reason with and how.
COO distinguishes between explicit personae (demographics and ideology) and implicit personae (historical opinions).
COO efficiently achieves new state-of-the-art opinion prediction via prompting with only 5 inference calls, improving prior techniques by up to 4%.
arXiv Detail & Related papers (2023-11-14T18:48:27Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Verbosity Bias in Preference Labeling by Large Language Models [10.242500241407466]
We examine the biases that come along with evaluating Large Language Models (LLMs).
We take a closer look into verbosity bias -- a bias where LLMs sometimes prefer more verbose answers even if they are of similar quality.
arXiv Detail & Related papers (2023-10-16T05:19:02Z)
- Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied but still consequential dimensions, such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment for specific social groups similar to the "what is beautiful is good" bias found in people in experimental psychology.
arXiv Detail & Related papers (2023-09-16T07:07:04Z)
- Whose Opinions Do Language Models Reflect? [88.35520051971538]
We investigate the opinions reflected by language models (LMs) by leveraging high-quality public opinion polls and their associated human responses.
We find substantial misalignment between the views reflected by current LMs and those of US demographic groups.
Our analysis confirms prior observations about the left-leaning tendencies of some human feedback-tuned LMs.
arXiv Detail & Related papers (2023-03-30T17:17:08Z)
- Fine-tuning language models to find agreement among humans with diverse preferences [7.702628192754256]
Recent work on large language models (LLMs) has used fine-tuning to align outputs with the preferences of a prototypical user.
Here, we ask how a machine might help people with diverse views find agreement.
We fine-tune a 70 billion parameter LLM to generate statements that maximize the expected approval for a group of people with potentially diverse opinions.
We find that when we silently construct consensus statements from only a subset of group members, those who are excluded are more likely to dissent.
arXiv Detail & Related papers (2022-11-28T02:24:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.