Related papers: Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias

Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias

URL: http://arxiv.org/abs/2311.00217v2
Date: Thu, 8 Feb 2024 03:49:46 GMT
Title: Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias
Authors: S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E. W. Maibach and A. Leiserowitz
Abstract summary: Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors. This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys.
Score: 0.0
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors, a concept referred to as algorithmic fidelity. This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys. The LLMs were conditioned on demographics and/or psychological covariates to simulate survey responses. The findings indicate that LLMs can effectively capture presidential voting behaviors but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. GPT-4 exhibits improved performance when conditioned on both demographics and covariates. However, disparities emerge in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans. While highlighting the potential of LLMs to aid social science research, these results underscore the importance of meticulous conditioning, model selection, survey question format, and bias assessment when employing LLMs for survey simulation. Further investigation into prompt engineering and algorithm auditing is essential to harness the power of LLMs while addressing their inherent limitations.

Related papers

Llms, Virtual Users, and Bias: Predicting Any Survey Question Without Human Data [0.0]
We use Large Language Models (LLMs) to create virtual populations that answer survey questions. We evaluate several LLMs-including GPT-4o, GPT-3.5, Claude 3.5-Sonnet, and versions of the Llama and Mistral models-comparing their performance to that of a traditional Random Forests algorithm.
arXiv Detail & Related papers (2025-03-11T16:27:20Z)
Sometimes the Model doth Preach: Quantifying Religious Bias in Open LLMs through Demographic Analysis in Asian Nations [8.769839351949997]
Large Language Models (LLMs) are capable of generating opinions and propagating bias unknowingly. Our work proposes a novel method that quantitatively analyzes the opinions generated by LLMs. We evaluate modern, open LLMs such as Llama and Mistral on surveys conducted in various global south countries.
arXiv Detail & Related papers (2025-03-10T16:32:03Z)
Examining Alignment of Large Language Models through Representative Heuristics: The Case of Political Stereotypes [20.407518082067437]
This study examines the alignment of large language models (LLMs) with human values for mitigate the domain of politics. We analyze the factors that contribute to LLMs' deviations from empirical positions on political issues. We find that while LLMs can mimic certain political parties' positions, they often exaggerate these positions more than human survey respondents do.
arXiv Detail & Related papers (2025-01-24T07:24:23Z)
Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, the Large Language Model's (LLM) ability to represent diverse groups is unclear. By including additional context in prompts, we analyze LLM's sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected.
arXiv Detail & Related papers (2024-10-03T16:43:17Z)
United in Diversity? Contextual Biases in LLM-Based Predictions of the 2024 European Parliament Elections [45.84205238554709]
Large language models (LLMs) are perceived by some as having the potential to revolutionize social science research. In this study, we examine to what extent LLM-based predictions of public opinion exhibit context-dependent biases. We predict voting behavior in the 2024 European Parliament elections using a state-of-the-art LLM.
arXiv Detail & Related papers (2024-08-29T16:01:06Z)
Vox Populi, Vox AI? Using Language Models to Estimate German Public Opinion [45.84205238554709]
We generate a synthetic sample of personas matching the individual characteristics of the 2017 German Longitudinal Election Study respondents. We ask the LLM GPT-3.5 to predict each respondent's vote choice and compare these predictions to the survey-based estimates. We find that GPT-3.5 does not predict citizens' vote choice accurately, exhibiting a bias towards the Green and Left parties.
arXiv Detail & Related papers (2024-07-11T14:52:18Z)
Unlearning Climate Misinformation in Large Language Models [17.95497650321137]
Misinformation regarding climate change is a key roadblock in addressing one of the most serious threats to humanity. This paper investigates factual accuracy in large language models (LLMs) regarding climate information.
arXiv Detail & Related papers (2024-05-29T23:11:53Z)
DnA-Eval: Enhancing Large Language Model Evaluation through Decomposition and Aggregation [75.81096662788254]
Large Language Models (LLMs) are scalable and economical evaluators. The question of how reliable these evaluators are has emerged as a crucial research question. We propose Decompose and Aggregate, which breaks down the evaluation process into different stages based on pedagogical practices.
arXiv Detail & Related papers (2024-05-24T08:12:30Z)
Explaining Large Language Models Decisions Using Shapley Values [1.223779595809275]
Large language models (LLMs) have opened up exciting possibilities for simulating human behavior and cognitive processes. However, the validity of utilizing LLMs as stand-ins for human subjects remains uncertain. This paper presents a novel approach based on Shapley values to interpret LLM behavior and quantify the relative contribution of each prompt component to the model's output.
arXiv Detail & Related papers (2024-03-29T22:49:43Z)
Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact. We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment [82.60594940370919]
We propose the FlipFlop experiment to study the multi-turn behavior of Large Language Models (LLMs) We show that models flip their answers on average 46% of the time and that all models see a deterioration of accuracy between their first and final prediction, with an average drop of 17% (the FlipFlop effect) We conduct finetuning experiments on an open-source LLM and find that finetuning on synthetically created data can mitigate - reducing performance deterioration by 60% - but not resolve sycophantic behavior entirely.
arXiv Detail & Related papers (2023-11-14T23:40:22Z)
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all. We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)
CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic. We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration. We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
arXiv Detail & Related papers (2023-10-17T18:00:25Z)
Assessing Large Language Models on Climate Information [5.034118180129635]
We present a comprehensive evaluation framework grounded in science communication research to assess Large Language Models (LLMs) Our framework emphasizes both presentational responses and adequacy, offering a fine-grained analysis of LLM generations spanning 8 dimensions and 30 issues. We introduce a novel protocol for scalable oversight that relies on AI Assistance and raters with relevant education.
arXiv Detail & Related papers (2023-10-04T16:09:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.