Large language models that replace human participants can harmfully misportray and flatten identity groups
- URL: http://arxiv.org/abs/2402.01908v3
- Date: Mon, 03 Feb 2025 16:28:59 GMT
- Title: Large language models that replace human participants can harmfully misportray and flatten identity groups
- Authors: Angelina Wang, Jamie Morgenstern, John P. Dickerson
- Abstract summary: We show that there are two inherent limitations in the way current LLMs are trained that prevent them from capturing the influence of positionality (i.e., social identities like gender and race).
We argue analytically for why LLMs are likely to both misportray and flatten the representations of demographic groups.
We also discuss a third limitation about how identity prompts can essentialize identities.
- Score: 36.36009232890876
- Abstract: Large language models (LLMs) are increasing in capability and popularity, propelling their application in new domains -- including as replacements for human participants in computational social science, user testing, annotation tasks, and more. In many settings, researchers seek to distribute their surveys to a sample of participants that are representative of the underlying human population of interest. This means in order to be a suitable replacement, LLMs will need to be able to capture the influence of positionality (i.e., relevance of social identities like gender and race). However, we show that there are two inherent limitations in the way current LLMs are trained that prevent this. We argue analytically for why LLMs are likely to both misportray and flatten the representations of demographic groups, then empirically show this on 4 LLMs through a series of human studies with 3200 participants across 16 demographic identities. We also discuss a third limitation about how identity prompts can essentialize identities. Throughout, we connect each limitation to a pernicious history of epistemic injustice against the value of lived experiences that explains why replacement is harmful for marginalized demographic groups. Overall, we urge caution in use cases where LLMs are intended to replace human participants whose identities are relevant to the task at hand. At the same time, in cases where the benefits of LLM replacement are determined to outweigh the harms (e.g., the goal is to supplement rather than fully replace, engaging human participants may cause them harm), we provide inference-time techniques that we empirically demonstrate do reduce, but do not remove, these harms.
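The abstract above concerns prompting LLMs to stand in for survey participants with specified social identities. The minimal Python sketch below illustrates that general pattern (identity-conditioned survey prompting); it is not the paper's code, and `query_llm`, the identity strings, and the survey item are hypothetical placeholders.

```python
# Minimal sketch of identity-conditioned survey prompting (illustrative only;
# not the paper's implementation). `query_llm` is a hypothetical stand-in for
# whatever chat-completion client is actually used.
from typing import Callable

def build_persona_prompt(identity: str, survey_item: str) -> str:
    """Compose a prompt asking the model to answer as a participant
    who holds the given social identity."""
    return (
        f"Answer the following survey question as a person who identifies as {identity}.\n"
        f"Question: {survey_item}\n"
        "Respond with a single number from 1 (strongly disagree) to 5 (strongly agree)."
    )

def collect_responses(
    query_llm: Callable[[str], str],
    identities: list[str],
    survey_item: str,
    samples_per_identity: int = 50,
) -> dict[str, list[str]]:
    """Query the model repeatedly per identity to approximate a response distribution."""
    responses: dict[str, list[str]] = {}
    for identity in identities:
        prompt = build_persona_prompt(identity, survey_item)
        responses[identity] = [query_llm(prompt) for _ in range(samples_per_identity)]
    return responses
```

The paper's central caution applies directly to pipelines like this: responses gathered this way tend to misportray groups and to flatten within-group variance relative to real participants, which is why the authors urge supplementing rather than replacing human data.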
Related papers
- Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study [23.458234676060716]
This study investigates the algorithmic fidelity of large language models (LLMs) in generating synthetic German public opinions.
We prompt different LLMs to generate synthetic public opinions reflective of German subpopulations by incorporating demographic features into the persona prompts.
Our results show that Llama performs better than other LLMs at representing subpopulations, particularly when there is lower opinion diversity within those groups.
arXiv Detail & Related papers (2024-12-17T18:46:32Z)
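The study above compares LLM-generated opinions against real subpopulations. One common way to quantify that kind of algorithmic fidelity (not necessarily the metric used in the study) is to compare the synthetic and human answer distributions with a divergence measure, as in this sketch; the answer options and counts are made up.

```python
# Illustrative fidelity check: compare an LLM-generated answer distribution to a
# human subpopulation's distribution with Jensen-Shannon distance (0 = identical).
# Metric choice and counts are hypothetical, not taken from the study.
from collections import Counter
from scipy.spatial.distance import jensenshannon

def answer_distribution(answers: list[str], options: list[str]) -> list[float]:
    """Turn raw categorical answers into a probability vector over fixed options."""
    counts = Counter(answers)
    total = sum(counts[o] for o in options) or 1
    return [counts[o] / total for o in options]

options = ["agree", "neutral", "disagree"]
human_answers = ["agree"] * 40 + ["neutral"] * 35 + ["disagree"] * 25
synthetic_answers = ["agree"] * 70 + ["neutral"] * 20 + ["disagree"] * 10

p = answer_distribution(human_answers, options)
q = answer_distribution(synthetic_answers, options)
print(f"Jensen-Shannon distance: {jensenshannon(p, q, base=2):.3f}")
```

A low distance between synthetic and human distributions within a subpopulation is one signal of fidelity; a synthetic distribution that is much narrower than the human one is the kind of flattening the main paper warns about.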
- Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM Systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z)
- Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the ability of large language models (LLMs) to represent diverse groups is unclear.
By including additional context in prompts, we analyze LLMs' sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
- How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies [0.0]
Commercial model development has focused its efforts on 'safety' training that addresses legal liabilities, at the expense of social impact evaluation.
This mirrors a trend observed in search engine autocompletion some years prior.
We present a novel evaluation task in the style of autocompletion prompts to assess stereotyping in LLMs.
arXiv Detail & Related papers (2024-07-16T14:04:35Z)
- Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas [14.650234624251716]
Large language models (LLMs) are increasingly being used in human-centered social scientific tasks.
These tasks are highly subjective and dependent on human factors, such as one's environment, attitudes, beliefs, and lived experiences.
We examine the role of prompting LLMs with human-like personas and ask the models to answer as if they were a specific human.
arXiv Detail & Related papers (2024-06-20T16:24:07Z)
- How should the advent of large language models affect the practice of science? [51.62881233954798]
How should the advent of large language models affect the practice of science?
We have invited four diverse groups of scientists to reflect on this query, sharing their perspectives and engaging in debate.
arXiv Detail & Related papers (2023-12-05T10:45:12Z)
- Sociodemographic Prompting is Not Yet an Effective Approach for Simulating Subjective Judgments with LLMs [13.744746481528711]
Large Language Models (LLMs) are widely used to simulate human responses across diverse contexts.
We evaluate nine popular LLMs on their ability to understand demographic differences in two subjective judgment tasks: politeness and offensiveness.
We find that in zero-shot settings, most models' predictions for both tasks align more closely with labels from White participants than those from Asian or Black participants.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
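The zero-shot comparison described above amounts to asking, per demographic group, how often a model's labels agree with that group's labels. The sketch below shows one generic way to compute such per-group agreement; the group names, labels, and majority-vote aggregation are illustrative assumptions, not the paper's evaluation code.

```python
# Generic per-group agreement check (illustrative; data and group names are made up).
# For each annotator group, measure how often the model's label matches that
# group's majority label on the same items.
from collections import Counter

def majority(labels: list[str]) -> str:
    """Return the most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def per_group_agreement(
    model_labels: list[str],
    group_labels: dict[str, list[list[str]]],
) -> dict[str, float]:
    """group_labels maps a group name to, per item, the list of that group's labels."""
    scores = {}
    for group, per_item in group_labels.items():
        matches = sum(
            model_labels[i] == majority(per_item[i]) for i in range(len(model_labels))
        )
        scores[group] = matches / len(model_labels)
    return scores

model_labels = ["offensive", "not_offensive", "offensive"]
group_labels = {
    "group_A": [["offensive"], ["not_offensive"], ["not_offensive"]],
    "group_B": [["offensive"], ["offensive"], ["offensive"]],
}
print(per_group_agreement(model_labels, group_labels))
```

A systematic gap in these scores across groups is the kind of misalignment the entry above reports for zero-shot sociodemographic prompting.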
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.