Can Large Language Models Capture Public Opinion about Global Warming?
An Empirical Assessment of Algorithmic Fidelity and Bias
- URL: http://arxiv.org/abs/2311.00217v2
- Date: Thu, 8 Feb 2024 03:49:46 GMT
- Title: Can Large Language Models Capture Public Opinion about Global Warming?
An Empirical Assessment of Algorithmic Fidelity and Bias
- Authors: S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E.
W. Maibach and A. Leiserowitz
- Abstract summary: Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors.
This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) have demonstrated their potential in social
science research by emulating human perceptions and behaviors, a concept
referred to as algorithmic fidelity. This study assesses the algorithmic
fidelity and bias of LLMs by utilizing two nationally representative climate
change surveys. The LLMs were conditioned on demographics and/or psychological
covariates to simulate survey responses. The findings indicate that LLMs can
effectively capture presidential voting behaviors but encounter challenges in
accurately representing global warming perspectives when relevant covariates
are not included. GPT-4 exhibits improved performance when conditioned on both
demographics and covariates. However, disparities emerge in LLM estimations of
the views of certain groups, with LLMs tending to underestimate worry about
global warming among Black Americans. While highlighting the potential of LLMs
to aid social science research, these results underscore the importance of
meticulous conditioning, model selection, survey question format, and bias
assessment when employing LLMs for survey simulation. Further investigation
into prompt engineering and algorithm auditing is essential to harness the
power of LLMs while addressing their inherent limitations.
Related papers
- Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion [45.84205238554709]
We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents.
We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates.
We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties.
arXiv Detail & Related papers (2024-07-11T14:52:18Z) - Unlearning Climate Misinformation in Large Language Models [17.95497650321137]
Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity.
This paper investigates factual accuracy in large language models (LLMs) regarding climate information.
arXiv Detail & Related papers (2024-05-29T23:11:53Z) - Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework [75.81096662788254]
Large Language Models (LLMs) are scalable and economical evaluators.
The question of how reliable these evaluators are has emerged as a crucial research question.
We propose Decompose and Aggregate, which breaks down the evaluation process into different stages based on pedagogical practices.
arXiv Detail & Related papers (2024-05-24T08:12:30Z) - Wait, It's All Token Noise? Always Has Been: Interpreting LLM Behavior Using Shapley Value [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes.
However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain.
This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output.
arXiv Detail & Related papers (2024-03-29T22:49:43Z) - Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - Are You Sure? Challenging LLMs Leads to Performance Drops in The
FlipFlop Experiment [82.60594940370919]
We propose the FlipFlop experiment to study the multi-turn behavior of Large Language Models (LLMs)
We show that models flip their answers on average 46% of the time and that all models see a deterioration of accuracy between their first and final prediction, with an average drop of 17% (the FlipFlop effect)
We conduct finetuning experiments on an open-source LLM and find that finetuning on synthetically created data can mitigate - reducing performance deterioration by 60% - but not resolve sycophantic behavior entirely.
arXiv Detail & Related papers (2023-11-14T23:40:22Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
arXiv Detail & Related papers (2023-10-17T18:00:25Z) - Assessing Large Language Models on Climate Information [5.034118180129635]
We present a comprehensive evaluation framework grounded in science communication research to assess Large Language Models (LLMs)
Our framework emphasizes both presentational responses and adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues.
We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education.
arXiv Detail & Related papers (2023-10-04T16:09:48Z) - Framework-Based Qualitative Analysis of Free Responses of Large Language
Models: Algorithmic Fidelity [1.7947441434255664]
Large-scale generative Language Models (LLMs) can simulate free responses to interview questions like those traditionally analyzed using qualitative research methods.
Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods.
arXiv Detail & Related papers (2023-09-06T15:00:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.