Related papers: Are Large Language Models Chameleons?

Related papers

AIn't Nothing But a Survey? Using Large Language Models for Coding German Open-Ended Survey Responses on Survey Motivation [0.8437187555622164]
This study investigates what extent different LLMs can be used to code open-ended survey responses in other contexts.<n>We compare several state-of-the-art LLMs and several prompting approaches, and evaluate the LLMs' performance by using human expert codings.<n>Our study contributes to the growing body of research about the conditions under which LLMs can be efficiently, accurately, and reliably leveraged in survey research.
arXiv Detail & Related papers (2025-06-17T15:28:53Z)
Synthesizing Public Opinions with LLMs: Role Creation, Impacts, and the Future to eDemorcacy [5.92971970173011]
This paper investigates the use of Large Language Models to synthesize public opinion data. It addresses challenges in traditional survey methods like declining response rates and non-response bias. We introduce a novel technique: role creation based on knowledge injection.
arXiv Detail & Related papers (2025-03-31T21:21:52Z)
Evaluating Local and Cloud-Based Large Language Models for Simulating Consumer Choices in Energy Stated Preference Surveys [4.672157041593765]
This study investigates the ability of large language models to simulate consumer choices in energy-related SP surveys. Results indicate that while LLMs achieve an average accuracy of up to 48%, their performance remains insufficient for practical application. Findings suggest that previous SP choices are the most effective input factor, while longer prompts with varied factor formats may reduce accuracy.
arXiv Detail & Related papers (2025-03-07T10:37:31Z)
Disparities in LLM Reasoning Accuracy and Explanations: A Case Study on African American English [66.97110551643722]
We investigate dialectal disparities in Large Language Models (LLMs) reasoning tasks. We find that LLMs produce less accurate responses and simpler reasoning chains and explanations for AAE inputs. These findings highlight systematic differences in how LLMs process and reason about different language varieties.
arXiv Detail & Related papers (2025-03-06T05:15:34Z)
Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
arXiv Detail & Related papers (2025-02-10T21:59:27Z)
Cognitive phantoms in LLMs through the lens of latent variables [0.3441021278275805]
Large language models (LLMs) increasingly reach real-world applications, necessitating a better understanding of their behaviour. Recent studies administering psychometric questionnaires to LLMs report human-like traits in LLMs, potentially influencing behaviour. This approach suffers from a validity problem: it presupposes that these traits exist in LLMs and that they are measurable with tools designed for humans. This study investigates this problem by comparing latent structures of personality between humans and three LLMs using two validated personality questionnaires.
arXiv Detail & Related papers (2024-09-06T12:42:35Z)
Are Social Sentiments Inherent in LLMs? An Empirical Study on Extraction of Inter-demographic Sentiments [14.143299702954023]
This study focuses on social groups defined in terms of nationality, religion, and race/ethnicity. We input questions regarding sentiments from one group to another into LLMs, apply sentiment analysis to the responses, and compare the results with social surveys.
arXiv Detail & Related papers (2024-08-08T08:13:25Z)
Social Bias Evaluation for Large Language Models Requires Prompt Variations [38.91306092184724]
Large Language Models (LLMs) exhibit considerable social biases. This paper investigates the sensitivity of LLMs when changing prompt variations. We show that LLMs have tradeoffs between performance and social bias caused by the prompts.
arXiv Detail & Related papers (2024-07-03T14:12:04Z)
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models [61.45529177682614]
We challenge the prevailing constrained evaluation paradigm for values and opinions in large language models. We show that models give substantively different answers when not forced. We distill these findings into recommendations and open challenges in evaluating values and opinions in LLMs.
arXiv Detail & Related papers (2024-02-26T18:00:49Z)
Fine-Grained Self-Endorsement Improves Factuality and Reasoning [72.83651220132495]
This work studies improving large language model (LLM) generations at inference time by mitigating fact-conflicting hallucinations. We propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses.
arXiv Detail & Related papers (2024-02-23T22:24:40Z)
Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large-Language-Models (LLMs) are deployed in a wide range of applications, and their response has an increasing social impact. We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z)
Large Language Models: A Survey [69.72787936480394]
Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive amounts of text data.
arXiv Detail & Related papers (2024-02-09T05:37:09Z)
You don't need a personality test to know these models are unreliable: Assessing the Reliability of Large Language Models on Psychometric Instruments [37.03210795084276]
We examine whether the current format of prompting Large Language Models elicits responses in a consistent and robust manner. Our experiments on 17 different LLMs reveal that even simple perturbations significantly downgrade a model's question-answering ability. Our results suggest that the currently widespread practice of prompting is insufficient to accurately and reliably capture model perceptions.
arXiv Detail & Related papers (2023-11-16T09:50:53Z)
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named Rephrase and Respond' (RaR) which allows Large Language Models to rephrase and expand questions posed by humans. RaR serves as a simple yet effective prompting method for improving performance. We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
Do LLMs exhibit human-like response biases? A case study in survey design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all. We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires. Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.