On the steerability of large language models toward data-driven personas
- URL: http://arxiv.org/abs/2311.04978v2
- Date: Tue, 2 Apr 2024 18:29:52 GMT
- Title: On the steerability of large language models toward data-driven personas
- Authors: Junyi Li, Ninareh Mehrabi, Charith Peris, Palash Goyal, Kai-Wei Chang, Aram Galstyan, Richard Zemel, Rahul Gupta
- Abstract summary: Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
- Score: 98.9138902560793
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs, that can be leveraged to produce multiple perspectives and to reflect the diverse opinions. Moving beyond the traditional reliance on demographics like age, gender, or party affiliation, we introduce a data-driven notion of persona grounded in collaborative filtering, which is defined as either a single individual or a cohort of individuals manifesting similar views across specific inquiries. As individuals in the same demographic group may have different personas, our data-driven persona definition allows for a more nuanced understanding of different (latent) social groups present in the population. In addition to this, we also explore an efficient method to steer LLMs toward the personas that we define. We show that our data-driven personas significantly enhance model steerability, with improvements of between $57\%-77\%$ over our best performing baselines.
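To make the abstract's notion of a data-driven persona concrete, here is a minimal sketch of grouping respondents into cohorts that answer similarly across questions. It is not the authors' implementation: the truncated SVD embedding, the plain k-means step, the function names `user_embeddings` and `assign_personas`, and the toy response matrix are all assumptions chosen for illustration.

```python
import numpy as np

def user_embeddings(responses: np.ndarray, dim: int = 2) -> np.ndarray:
    """Low-rank respondent embeddings from a (respondents x questions) matrix.

    Assumption: truncated SVD stands in for the collaborative-filtering model.
    """
    centered = responses - responses.mean(axis=0, keepdims=True)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :dim] * s[:dim]

def assign_personas(emb: np.ndarray, k: int = 2, iters: int = 20) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [emb[0]]
    for _ in range(1, k):
        # Distance from every point to its nearest already-chosen center.
        d = np.min([np.linalg.norm(emb - c, axis=1) for c in centers], axis=0)
        centers.append(emb[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dists = np.linalg.norm(emb[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = emb[labels == j].mean(axis=0)
    return labels

# Hypothetical opinion matrix: rows are respondents, columns are survey
# questions, entries are agree (+1) / disagree (-1). Values are made up.
responses = np.array([
    [ 1,  1, -1, -1],
    [ 1,  1, -1,  1],
    [-1, -1,  1,  1],
    [-1, -1,  1, -1],
], dtype=float)

labels = assign_personas(user_embeddings(responses), k=2)
print(labels)  # one persona id per respondent
```

In this toy setting, respondents who agree on most questions end up sharing a persona id regardless of any demographic label, which is the property the abstract emphasises. How the resulting persona is then used to steer an LLM is not covered by this sketch.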
Related papers
- ComPO: Community Preferences for Language Model Personalization [122.54846260663922]
ComPO is a method to personalize preference optimization in language models.
We collect and release ComPRed, a question answering dataset with community-level preferences from Reddit.
arXiv Detail & Related papers (2024-10-21T14:02:40Z)
- Enriching Datasets with Demographics through Large Language Models: What's in a Name? [5.871504332441324]
Large Language Models (LLMs) can perform as well as, if not better than, bespoke models trained on specialized data.
We apply these LLMs to a variety of datasets, including a real-life, unlabelled dataset of licensed financial professionals in Hong Kong.
arXiv Detail & Related papers (2024-09-17T18:40:49Z)
- A Survey on Human Preference Learning for Large Language Models [81.41868485811625]
The recent surge of versatile large language models (LLMs) largely depends on aligning increasingly capable foundation models with human intentions by preference learning.
This survey covers the sources and formats of preference feedback, the modeling and usage of preference signals, as well as the evaluation of the aligned LLMs.
arXiv Detail & Related papers (2024-06-17T03:52:51Z)
- Evaluating Large Language Model Biases in Persona-Steered Generation [26.92498998306013]
We show that large language models (LLMs) are 9.7% less steerable towards incongruous personas than congruous ones.
Models that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances associated with political liberals and women.
arXiv Detail & Related papers (2024-05-30T17:06:03Z)
- Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
- MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
- Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks [15.015148115215315]
We conduct experiments on four popular large language models (LLMs) to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness.
We find that for both tasks, model predictions are closer to the labels from White and female participants.
More specifically, when prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both the overall scores and the scores from the corresponding groups.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
- Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting [19.79214899011072]
This paper formalizes diversity of representation in generative large language models.
We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes.
We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal.
arXiv Detail & Related papers (2023-10-25T10:17:17Z)
- Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups [0.0]
Recently, it has been shown that some modern facial recognition systems can discriminate against specific demographic groups.
We propose to use a simple method for modeling and sampling a disentangled projection of a StyleGAN latent space to generate any combination of demographic groups.
Our experiments show that we can synthesize any combination of demographic groups effectively and that the identities are different from those in the original training dataset.
arXiv Detail & Related papers (2023-09-15T14:42:04Z)
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposter sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.