Evaluating the Bias in LLMs for Surveying Opinion and Decision Making in Healthcare
- URL: http://arxiv.org/abs/2504.08260v2
- Date: Thu, 17 Apr 2025 01:36:57 GMT
- Title: Evaluating the Bias in LLMs for Surveying Opinion and Decision Making in Healthcare
- Authors: Yonchanok Khaokaew, Flora D. Salim, Andreas Züfle, Hao Xue, Taylor Anderson, C. Raina MacIntyre, Matthew Scotch, David J. Heslop
- Abstract summary: Generative agents have been increasingly used to simulate human behaviour in silico, driven by large language models (LLMs). This work compares survey data from the Understanding America Study (UAS) on healthcare decision-making with simulated responses from generative agents. Using demographic-based prompt engineering, we create digital twins of survey respondents and analyse how well different LLMs reproduce real-world behaviours.
- Score: 7.075750841525739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative agents have been increasingly used to simulate human behaviour in silico, driven by large language models (LLMs). These simulacra serve as sandboxes for studying human behaviour without compromising privacy or safety. However, it remains unclear whether such agents can truly represent real individuals. This work compares survey data from the Understanding America Study (UAS) on healthcare decision-making with simulated responses from generative agents. Using demographic-based prompt engineering, we create digital twins of survey respondents and analyse how well different LLMs reproduce real-world behaviours. Our findings show that some LLMs fail to reflect realistic decision-making, such as predicting universal vaccine acceptance. However, Llama 3 captures variations across race and income more accurately but also introduces biases not present in the UAS data. This study highlights the potential of generative agents for behavioural research while underscoring the risks of bias from both LLMs and prompting strategies.
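A minimal sketch of the kind of demographic-based prompt engineering the abstract describes: a persona prompt is built from a respondent's attributes and paired with a survey item. The field names, prompt wording, and answer options below are illustrative assumptions, not the authors' exact template.

```python
# Sketch of a "digital twin" prompt built from (hypothetical) UAS-style
# demographic fields; the wording is illustrative, not the study's template.

def build_twin_prompt(profile: dict, question: str) -> str:
    """Compose a persona prompt from demographic attributes plus a survey item."""
    persona = (
        f"You are a {profile['age']}-year-old {profile['gender']} from {profile['state']}, "
        f"identifying as {profile['race']}, with {profile['education']} education "
        f"and a household income of {profile['income']}."
    )
    return (
        f"{persona}\n"
        "Answer the survey question as this person would, choosing one option.\n"
        f"Question: {question}\n"
        "Options: Yes / No / Unsure\n"
        "Answer:"
    )

respondent = {
    "age": 52, "gender": "woman", "state": "Ohio", "race": "Black",
    "education": "a high-school", "income": "$30,000-$40,000",
}
prompt = build_twin_prompt(respondent, "Would you get a COVID-19 vaccine if offered today?")
print(prompt)
# In the study's setting, a prompt like this would be sent to an LLM (e.g. Llama 3)
# and the parsed answer compared against the same respondent's actual UAS response.
```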
Related papers
- Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy [38.63235613382905]
We introduce the VacSim framework with 100 generative agents powered by Large Language Models (LLMs).
VacSim simulates vaccine policy outcomes with the following steps: 1) instantiate a population of agents with demographics based on census data; 2) connect the agents via a social network and model vaccine attitudes as a function of social dynamics and disease-related information; 3) design and evaluate various public health interventions aimed at mitigating vaccine hesitancy.
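A toy sketch of those three steps. In VacSim the agents are LLM-powered; here a simple numeric attitude-update rule stands in for the model, and the population size, network, and intervention are invented for illustration.

```python
# Toy version of: 1) instantiate agents, 2) connect them socially and update
# attitudes, 3) evaluate an intervention. Numbers and rules are placeholders.
import random

random.seed(0)

# 1) Instantiate agents with (made-up) demographic attributes and a vaccine attitude.
agents = [{"id": i,
           "age_group": random.choice(["18-34", "35-64", "65+"]),
           "attitude": random.uniform(0.2, 0.8)}   # 0 = refuses, 1 = accepts
          for i in range(100)]

# 2) Connect agents via a simple random social network (5 contacts each).
contacts = {a["id"]: random.sample(range(100), k=5) for a in agents}

def step(information_boost: float = 0.0) -> None:
    """One round: each agent drifts toward its contacts' mean attitude,
    plus an optional nudge from disease-related information / an intervention."""
    for a in agents:
        neigh_mean = sum(agents[j]["attitude"] for j in contacts[a["id"]]) / 5
        a["attitude"] += 0.1 * (neigh_mean - a["attitude"]) + information_boost

# 3) Evaluate an intervention, e.g. a small messaging nudge applied every round.
for _ in range(20):
    step(information_boost=0.005)
acceptance = sum(a["attitude"] > 0.5 for a in agents) / len(agents)
print(f"Share of accepting agents after intervention: {acceptance:.2f}")
```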
arXiv Detail & Related papers (2025-03-12T02:54:15Z) - Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions.
As a testbed, we use country-level results from two global cultural surveys.
We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
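A sketch of what a first-token-probability objective can look like: the softmax is restricted to the tokens of the answer options, and the divergence from the observed survey distribution becomes the fine-tuning loss. The token ids, vocabulary size, and logits below are dummy stand-ins for a real LLM forward pass, not the paper's implementation.

```python
# First-token-probability loss sketch; logits and token ids are dummy values.
import torch
import torch.nn.functional as F

# Token ids of the answer options (hypothetical), e.g. "1".."4" on a Likert scale.
option_token_ids = torch.tensor([352, 353, 354, 355])

# Dummy next-token logits over the vocabulary at the answer position.
vocab_size = 32_000
logits = torch.randn(vocab_size, requires_grad=True)

# Predicted response distribution = softmax restricted to the option tokens.
pred_log_probs = F.log_softmax(logits[option_token_ids], dim=-1)

# Actual (survey-observed) response distribution for one country / question.
target = torch.tensor([0.10, 0.25, 0.40, 0.25])

# Divergence to minimise during fine-tuning: KL(target || predicted).
loss = F.kl_div(pred_log_probs, target, reduction="sum")
loss.backward()
print(float(loss))
```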
arXiv Detail & Related papers (2025-02-10T21:59:27Z) - Generative Agent Simulations of 1,000 People [56.82159813294894]
We present a novel agent architecture that simulates the attitudes and behaviors of 1,052 real individuals.
The generative agents replicate participants' responses on the General Social Survey 85% as accurately as participants replicate their own answers.
Our architecture reduces accuracy biases across racial and ideological groups compared to agents given demographic descriptions.
arXiv Detail & Related papers (2024-11-15T11:14:34Z) - Persuasion with Large Language Models: a Survey [49.86930318312291]
Large Language Models (LLMs) have created new disruptive possibilities for persuasive communication.
In areas such as politics, marketing, public health, e-commerce, and charitable giving, such LLM systems have already achieved human-level or even super-human persuasiveness.
Our survey suggests that the current and future potential of LLM-based persuasion poses profound ethical and societal risks.
arXiv Detail & Related papers (2024-11-11T10:05:52Z) - Towards Implicit Bias Detection and Mitigation in Multi-Agent LLM Interactions [25.809599403713506]
Large Language Models (LLMs) are increasingly being employed in numerous studies to simulate societies and execute diverse social tasks.
LLMs are susceptible to societal biases due to their exposure to human-generated data.
This study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate these biases.
arXiv Detail & Related papers (2024-10-03T15:28:05Z) - United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections [42.72938925647165]
"Synthetic samples" based on large language models (LLMs) have been argued to serve as efficient alternatives to surveys of humans.
"Synthetic samples" might exhibit bias due to training data and fine-tuning processes being unrepresentative of diverse contexts.
This study investigates if and under which conditions LLM-generated synthetic samples can be used for public opinion prediction.
arXiv Detail & Related papers (2024-08-29T16:01:06Z) - Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion [45.84205238554709]
We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents.
We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates.
We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties.
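A minimal sketch of the comparison step this entry implies: per-persona predictions are aggregated into party shares and the deviation from the survey-based estimates is reported per party. The predictions and share values below are placeholders, not GLES results.

```python
# Aggregate hypothetical per-persona vote-choice predictions and measure
# the bias relative to (placeholder) survey-based party shares.
from collections import Counter

predicted = ["Greens", "SPD", "Greens", "Left", "CDU/CSU", "Greens", "SPD", "Left"]

survey_shares = {"CDU/CSU": 0.33, "SPD": 0.21, "Greens": 0.09, "Left": 0.09}

counts = Counter(predicted)
pred_shares = {p: counts.get(p, 0) / len(predicted) for p in survey_shares}

for party in survey_shares:
    bias = pred_shares[party] - survey_shares[party]
    print(f"{party:8s} predicted {pred_shares[party]:.2f}  "
          f"survey {survey_shares[party]:.2f}  bias {bias:+.2f}")
```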
arXiv Detail & Related papers (2024-07-11T14:52:18Z) - Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large Language Models (LLMs) are deployed in a wide range of applications, and their responses have an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - Systematic Biases in LLM Simulations of Debates [12.933509143906141]
We study the limitations of Large Language Models in simulating human interactions.
Our findings indicate a tendency for LLM agents to conform to the model's inherent social biases.
These results underscore the need for further research to develop methods that help agents overcome these biases.
arXiv Detail & Related papers (2024-02-06T14:51:55Z) - Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
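One probe in the spirit of such a framework might compare an LLM's answer distribution on an original question with the distribution after a perturbation known to shift human answers (here, reversing the option order). The question, options, and the `ask_llm` stub are hypothetical; a real evaluation would call an actual model.

```python
# Compare answer distributions before and after a bias-inducing perturbation.
import random
from collections import Counter

random.seed(1)

def ask_llm(question: str, options: list[str]) -> str:
    # Stand-in for a real model call; replace with an actual LLM query.
    return random.choice(options)

options = ["Strongly agree", "Agree", "Disagree", "Strongly disagree"]
question = "The government should do more to reduce income differences."

orig = Counter(ask_llm(question, options) for _ in range(200))
flipped = Counter(ask_llm(question, options[::-1]) for _ in range(200))

# Human respondents show an order effect; a human-like LLM should shift in the
# same direction, while one without that bias would barely change.
tv_distance = sum(abs(orig[o] - flipped[o]) for o in options) / (2 * 200)
print(f"Total variation between original and reversed-order answers: {tv_distance:.2f}")
```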
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.