Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study
- URL: http://arxiv.org/abs/2506.23107v1
- Date: Sun, 29 Jun 2025 06:16:57 GMT
- Title: Can Large Language Models Capture Human Risk Preferences? A Cross-Cultural Study
- Authors: Bing Song, Jianing Liu, Sisi Jian, Chenyang Wu, Vinayak Dixit
- Abstract summary: This study investigates the ability of large language models to simulate risky decision-making scenarios. We compare model-generated decisions with actual human responses in a series of lottery-based tasks. Results show that both models (ChatGPT 4o and ChatGPT o1-mini) exhibit more risk-averse behavior than human participants.
- Score: 2.227470703746655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) have made significant strides, extending their applications to dialogue systems, automated content creation, and domain-specific advisory tasks. However, as their use grows, concerns have emerged regarding their reliability in simulating complex decision-making behavior, such as risky decision-making, where a single choice can lead to multiple outcomes. This study investigates the ability of LLMs to simulate risky decision-making scenarios. We compare model-generated decisions with actual human responses in a series of lottery-based tasks, using transportation stated preference survey data from participants in Sydney, Dhaka, Hong Kong, and Nanjing. Demographic inputs were provided to two LLMs -- ChatGPT 4o and ChatGPT o1-mini -- which were tasked with predicting individual choices. Risk preferences were analyzed using the Constant Relative Risk Aversion (CRRA) framework. Results show that both models exhibit more risk-averse behavior than human participants, with o1-mini aligning more closely with observed human decisions. Further analysis of multilingual data from Nanjing and Hong Kong indicates that model predictions in Chinese deviate more from actual responses than predictions in English, suggesting that prompt language may influence simulation performance. These findings highlight both the promise and the current limitations of LLMs in replicating human-like risk behavior, particularly across diverse linguistic and cultural settings.
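For readers unfamiliar with the CRRA framework mentioned above, the sketch below shows how a constant relative risk aversion coefficient translates into lottery choices. It is a minimal illustration only: the payoffs, probabilities, and function names are assumptions for exposition, not the authors' estimation code or the survey's actual lottery design.

```python
import math

def crra_utility(x: float, r: float) -> float:
    """Constant Relative Risk Aversion (CRRA) utility.
    r > 0: risk-averse, r = 0: risk-neutral, r < 0: risk-seeking."""
    if abs(r - 1.0) < 1e-9:
        return math.log(x)
    return x ** (1.0 - r) / (1.0 - r)

def expected_utility(lottery, r: float) -> float:
    """Expected CRRA utility of a lottery given as (probability, payoff) pairs."""
    return sum(p * crra_utility(x, r) for p, x in lottery)

# Illustrative task: a sure payoff vs. a risky gamble of similar expected value
# (hypothetical numbers, not taken from the stated preference survey).
safe = [(1.0, 50.0)]
risky = [(0.5, 100.0), (0.5, 1.0)]

for r in (-0.5, 0.0, 0.5, 1.5):
    choice = "safe" if expected_utility(safe, r) > expected_utility(risky, r) else "risky"
    print(f"r = {r:+.1f} -> prefers the {choice} option")
```

Under this framework, choices best explained by a larger r indicate stronger risk aversion, which is presumably how risk preferences estimated from the LLMs' simulated choices are compared with those of the human respondents.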
Related papers
- Evaluating Prompt-Driven Chinese Large Language Models: The Influence of Persona Assignment on Stereotypes and Safeguards [3.1308581258317485]
We analyze how persona assignment influences refusal behavior and response toxicity in Qwen, a widely-used Chinese language model. Our study reveals significant gender biases in refusal rates and demonstrates that certain negative personas can amplify toxicity toward Chinese social groups by up to 60-fold. To mitigate this toxicity, we propose an innovative multi-model feedback strategy, employing iterative interactions between Qwen and an external evaluator.
arXiv Detail & Related papers (2025-06-05T12:47:21Z)
- Language-Agnostic Suicidal Risk Detection Using Large Language Models [9.90722058486037]
This study introduces a novel language-agnostic framework for suicidal risk assessment with large language models (LLMs). We generate Chinese transcripts from speech using an ASR model and then employ LLMs with prompt-based queries to extract suicidal risk-related features from these transcripts. Experimental results show that our method achieves performance comparable to direct fine-tuning with ASR results or to models trained solely on Chinese suicidal risk-related features.
arXiv Detail & Related papers (2025-05-26T15:12:10Z)
- Mapping Geopolitical Bias in 11 Large Language Models: A Bilingual, Dual-Framing Analysis of U.S.-China Tensions [2.8202443616982884]
This study systematically analyzes geopolitical bias across 11 prominent Large Language Models (LLMs). We generated 19,712 prompts designed to detect ideological leanings in model outputs. U.S.-based models predominantly favored Pro-U.S. stances, while Chinese-origin models exhibited pronounced Pro-China biases.
arXiv Detail & Related papers (2025-03-31T03:38:17Z)
- Predicting Human Choice Between Textually Described Lotteries [0.0]
This study conducts the first large-scale exploration of human decision-making in such tasks. We evaluate multiple computational approaches, including fine-tuning Large Language Models.
arXiv Detail & Related papers (2025-03-18T08:10:33Z)
- Specializing Large Language Models to Simulate Survey Response Distributions for Global Populations [49.908708778200115]
We are the first to specialize large language models (LLMs) for simulating survey response distributions. As a testbed, we use country-level results from two global cultural surveys. We devise a fine-tuning method based on first-token probabilities to minimize divergence between predicted and actual response distributions.
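As a rough, generic illustration of the first-token-probability idea summarized above (not the authors' implementation), the snippet below converts hypothetical first-token logits over answer labels into a distribution and measures its divergence from an observed survey distribution; the label set, logits, and observed shares are all made-up values.

```python
import math

# Hypothetical answer labels for a survey item (e.g. a 5-point scale) and
# made-up first-token logits an LLM might assign to those labels.
labels = ["1", "2", "3", "4", "5"]
first_token_logits = [2.1, 1.3, 0.4, -0.2, -1.0]

# Observed response shares from the survey (illustrative numbers).
observed = [0.30, 0.25, 0.20, 0.15, 0.10]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the same labels."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

predicted = softmax(first_token_logits)
divergence = kl_divergence(observed, predicted)  # the kind of quantity a fine-tuning loss could minimize
print({label: round(p, 3) for label, p in zip(labels, predicted)}, round(divergence, 4))
```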
arXiv Detail & Related papers (2025-02-10T21:59:27Z)
- Defining and Evaluating Decision and Composite Risk in Language Models Applied to Natural Language Inference [3.422309388045878]
Large language models (LLMs) such as ChatGPT are known to pose important risks.
One such risk is misplaced confidence, whether over-confidence or under-confidence, that the models have in their inference.
We propose an experimental framework consisting of a two-level inference architecture and appropriate metrics for measuring such risks.
arXiv Detail & Related papers (2024-08-04T05:24:32Z)
- Language Model Alignment in Multilingual Trolley Problems [138.5684081822807]
Building on the Moral Machine experiment, we develop a cross-lingual corpus of moral dilemma vignettes in over 100 languages called MultiTP. Our analysis explores the alignment of 19 different LLMs with human judgments, capturing preferences across six moral dimensions. We discover significant variance in alignment across languages, challenging the assumption of uniform moral reasoning in AI systems.
arXiv Detail & Related papers (2024-07-02T14:02:53Z)
- Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights [50.89022445197919]
We propose a speech-specific risk taxonomy, covering 8 risk categories under hostility (malicious sarcasm and threats), malicious imitation (age, gender, ethnicity), and stereotypical biases (age, gender, ethnicity).
Based on the taxonomy, we create a small-scale dataset for evaluating current LMMs' capability in detecting these categories of risk.
arXiv Detail & Related papers (2024-06-25T10:08:45Z)
- MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making. We present MR-Ben, a process-based benchmark that demands meta-reasoning skill. Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experimental results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze the representation space, generated responses, and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- DPP-Based Adversarial Prompt Searching for Lanugage Models [56.73828162194457]
Auto-regressive Selective Replacement Ascent (ASRA) is a discrete optimization algorithm that selects prompts based on both quality and similarity with a determinantal point process (DPP); a generic sketch of this kind of selection appears below.
Experimental results on six different pre-trained language models demonstrate the efficacy of ASRA for eliciting toxic content.
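The quality-vs-diversity trade-off behind DPP-based selection can be illustrated generically. The sketch below is not ASRA itself and makes no claim about the paper's kernel or optimization; it is a toy greedy selection from a quality-weighted DPP kernel with invented scores and similarities.

```python
import numpy as np

def greedy_dpp_select(quality, similarity, k):
    """Greedy selection from a quality-weighted DPP kernel L[i, j] = q[i] * q[j] * S[i, j].

    Each step adds the candidate that maximizes the log-determinant of the
    selected submatrix, balancing per-item quality against diversity."""
    q = np.asarray(quality, dtype=float)
    L = np.outer(q, q) * np.asarray(similarity, dtype=float)
    selected, candidates = [], list(range(len(q)))
    for _ in range(min(k, len(q))):
        best, best_score = None, -np.inf
        for c in candidates:
            idx = selected + [c]
            sign, logdet = np.linalg.slogdet(L[np.ix_(idx, idx)])
            score = logdet if sign > 0 else -np.inf
            if score > best_score:
                best, best_score = c, score
        if best is None:
            break
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy example: four candidate prompts with made-up quality scores and
# pairwise similarities (1.0 on the diagonal).
quality = [0.9, 0.8, 0.85, 0.3]
similarity = [
    [1.0, 0.95, 0.2, 0.1],
    [0.95, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
print(greedy_dpp_select(quality, similarity, k=2))  # -> [0, 2]: high quality, low mutual similarity
```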
arXiv Detail & Related papers (2024-03-01T05:28:06Z)
- YAYI 2: Multilingual Open-Source Large Language Models [53.92832054643197]
We propose YAYI 2, including both base and chat models, with 30 billion parameters.
YAYI 2 is pre-trained from scratch on a multilingual corpus which contains 2.65 trillion tokens filtered by our pre-training data processing pipeline.
The base model is aligned with human values through supervised fine-tuning with millions of instructions and reinforcement learning from human feedback.
arXiv Detail & Related papers (2023-12-22T17:34:47Z)