One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity
- URL: http://arxiv.org/abs/2411.04427v2
- Date: Tue, 12 Nov 2024 20:11:58 GMT
- Title: One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity
- Authors: Sonia K. Murthy, Tomer Ullman, Jennifer Hu
- Abstract summary: Researchers have proposed using large language models (LLMs) as replacements for humans in behavioral research.
It is debated whether post-training alignment (RLHF or RLAIF) affects models' internal diversity.
We use a new way of measuring the conceptual diversity of synthetically-generated LLM "populations" by relating the internal variability of simulated individuals to the population-level variability.
- Abstract: Researchers in social science and psychology have recently proposed using large language models (LLMs) as replacements for humans in behavioral research. In addition to arguments about whether LLMs accurately capture population-level patterns, this has raised questions about whether LLMs capture human-like conceptual diversity. Separately, it is debated whether post-training alignment (RLHF or RLAIF) affects models' internal diversity. Inspired by human studies, we use a new way of measuring the conceptual diversity of synthetically-generated LLM "populations" by relating the internal variability of simulated individuals to the population-level variability. We use this approach to evaluate non-aligned and aligned LLMs on two domains with rich human behavioral data. While no model reaches human-like diversity, aligned models generally display less diversity than their instruction fine-tuned counterparts. Our findings highlight potential trade-offs between increasing models' value alignment and decreasing the diversity of their conceptual representations.
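The abstract's core idea is to quantify conceptual diversity by relating each simulated individual's internal variability to the variability across the whole population. A minimal sketch of that general idea is below; the function name, response coding, and the specific variance-ratio statistic are illustrative assumptions, not the paper's actual measure, which is built on richer human behavioral data.

```python
from statistics import mean, pvariance

def diversity_ratio(responses_by_individual):
    """Relate within-individual variability to population-level variability.

    `responses_by_individual` maps each simulated individual to a list of
    repeated numeric responses to the same prompt.
    """
    # Within-individual variability: average variance of each simulated
    # individual's repeated responses.
    within = mean(pvariance(r) for r in responses_by_individual)
    # Population-level variability: variance of the individuals' mean responses.
    between = pvariance([mean(r) for r in responses_by_individual])
    # Fraction of total variability attributable to differences *between*
    # individuals: near 0 for a homogeneous population, near 1 for
    # individuals that are internally consistent but mutually distinct.
    total = within + between
    return between / total if total > 0 else 0.0

# Toy example: three "individuals", each answering the same question
# five times (responses coded as numbers).
pop = [[1, 1, 2, 1, 1], [5, 5, 4, 5, 5], [9, 9, 9, 8, 9]]
print(round(diversity_ratio(pop), 3))  # → 0.984
```

On this toy population the ratio is high because each individual answers consistently while the individuals disagree with one another; a population of clones (identical response distributions) would score 0, which is the direction the paper reports aligned models drifting toward.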
Related papers
- Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models [36.983534612895156]
In the recent past, a popular way of evaluating natural language understanding (NLU) was to consider a model's ability to perform natural language inference (NLI) tasks.
This paper focuses on five different NLI benchmarks across six models of different scales.
We investigate if they are able to discriminate models of different size and quality and how their accuracies develop during training.
arXiv Detail & Related papers (2024-11-21T13:09:36Z)
- Large Language Models Reflect the Ideology of their Creators [73.25935570218375]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z)
- Virtual Personas for Language Models via an Anthology of Backstories [5.2112564466740245]
"Anthology" is a method for conditioning large language models to particular virtual personas by harnessing open-ended life narratives.
We show that our methodology enhances the consistency and reliability of experimental outcomes while ensuring better representation of diverse sub-populations.
arXiv Detail & Related papers (2024-07-09T06:11:18Z)
- High-Dimension Human Value Representation in Large Language Models [60.33033114185092]
We propose UniVaR, a high-dimensional representation of human value distributions in Large Language Models (LLMs).
We show that UniVaR is a powerful tool for comparing the distributions of human values embedded in different LLMs with different language sources.
arXiv Detail & Related papers (2024-04-11T16:39:00Z)
- Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
- Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting [19.79214899011072]
This paper formalizes diversity of representation in generative large language models.
We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes.
We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal.
arXiv Detail & Related papers (2023-10-25T10:17:17Z)
- Large Language Models as Superpositions of Cultural Perspectives [25.114678091641935]
Large Language Models (LLMs) are often misleadingly recognized as having a personality or a set of values.
We argue that an LLM can be seen as a superposition of perspectives with different values and personality traits.
arXiv Detail & Related papers (2023-07-15T19:04:33Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.