Eliciting Trustworthiness Priors of Large Language Models via Economic Games
- URL: http://arxiv.org/abs/2602.00769v1
- Date: Sat, 31 Jan 2026 15:23:03 GMT
- Title: Eliciting Trustworthiness Priors of Large Language Models via Economic Games
- Authors: Siyu Yan, Lusha Zhu, Jian-Qiao Zhu
- Abstract summary: We propose a novel elicitation method based on iterated in-context learning. We find that GPT-4.1's trustworthiness priors closely track those observed in humans. We show that variation in elicited trustworthiness can be well predicted by a stereotype-based model.
- Score: 2.2940141855172036
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One critical aspect of building human-centered, trustworthy artificial intelligence (AI) systems is maintaining calibrated trust: appropriate reliance on AI systems outperforms both overtrust (e.g., automation bias) and undertrust (e.g., disuse). A fundamental challenge, however, is how to characterize the level of trust exhibited by an AI system itself. Here, we propose a novel elicitation method based on iterated in-context learning (Zhu and Griffiths, 2024a) and apply it to elicit trustworthiness priors using the Trust Game from behavioral game theory. The Trust Game is particularly well suited for this purpose because it operationalizes trust as voluntary exposure to risk based on beliefs about another agent, rather than self-reported attitudes. Using our method, we elicit trustworthiness priors from several leading large language models (LLMs) and find that GPT-4.1's trustworthiness priors closely track those observed in humans. Building on this result, we further examine how GPT-4.1 responds to different player personas in the Trust Game, providing an initial characterization of how such models differentiate trust across agent characteristics. Finally, we show that variation in elicited trustworthiness can be well predicted by a stereotype-based model grounded in perceived warmth and competence.
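To make the elicitation procedure concrete, below is a minimal Python sketch of iterated in-context learning applied to a Trust Game trustee. Following Zhu and Griffiths (2024a), the idea is that feeding the model's own previous responses back as in-context data defines a Markov chain whose stationary distribution approximates the model's prior, here over how much a trustee returns. The multiplier, endowment, burn-in, and the stubbed `query_llm` function are illustrative assumptions, not the paper's actual prompts or parameters; a real run would replace the stub with calls to GPT-4.1 or another LLM.

```python
import random

MULTIPLIER = 3   # standard Trust Game: the amount sent is tripled (assumed)
ENDOWMENT = 10   # investor's endowment, in arbitrary units (assumed)

def query_llm(history):
    """Placeholder for an LLM call: given previously observed return
    fractions as in-context examples, predict the fraction of the tripled
    amount the trustee sends back. Stubbed with a random draw here; a real
    run would prompt the model and parse its numeric answer."""
    return random.betavariate(2, 2)

def iterated_icl_prior(n_iterations=1000, burn_in=200):
    """Approximate the model's trustworthiness prior by iterating:
    each response is fed back as the context for the next query, so the
    chain of responses converges to samples from the model's prior."""
    history = []
    samples = []
    for t in range(n_iterations):
        returned_fraction = query_llm(history)
        history = [returned_fraction]  # condition only on the last response
        if t >= burn_in:
            samples.append(returned_fraction)
    return samples

def trust_game_payoffs(sent, returned_fraction):
    """One-shot Trust Game: the investor sends `sent`, it is tripled,
    and the trustee returns a fraction of the tripled amount."""
    tripled = MULTIPLIER * sent
    returned = returned_fraction * tripled
    return ENDOWMENT - sent + returned, tripled - returned

if __name__ == "__main__":
    prior = iterated_icl_prior()
    mean_return = sum(prior) / len(prior)
    print(f"Estimated mean of elicited trustworthiness prior: {mean_return:.3f}")
    print(trust_game_payoffs(sent=5, returned_fraction=mean_return))
```

The Trust Game payoff structure makes the risk exposure explicit: the investor's payoff depends entirely on beliefs about the trustee's return behavior, which is why the elicited return fraction can be read as a trustworthiness prior.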
Related papers
- Revisiting Trust in the Era of Generative AI: Factorial Structure and Latent Profiles [5.109743403025609]
Trust is one of the most important factors shaping whether and how people adopt and rely on artificial intelligence (AI). Most existing studies measure trust in terms of functionality, focusing on whether a system is reliable, accurate, or easy to use. In this study, we introduce and validate the Human-AI Trust Scale (HAITS), a new measure designed to capture both the rational and relational aspects of trust in GenAI.
arXiv Detail & Related papers (2025-10-11T12:39:53Z)
- Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z)
- On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective [377.2483044466149]
Generative Foundation Models (GenFMs) have emerged as transformative tools. Their widespread adoption raises critical concerns regarding trustworthiness across multiple dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions.
arXiv Detail & Related papers (2025-02-20T06:20:36Z)
- A biologically inspired computational trust model for open multi-agent systems which is resilient to trustor population changes [0.0]
The study is based on CA, a proposed decentralized computational trust model inspired by synaptic plasticity and the formation of assemblies in the human brain.
We ran a series of simulations to compare the CA model to FIRE, a well-established, decentralized trust and reputation model for open MAS.
The main finding is that FIRE handles changes in the trustee population better, whereas CA is resilient to changes in the trustor population.
arXiv Detail & Related papers (2024-04-13T10:56:32Z)
- U-Trustworthy Models. Reliability, Competence, and Confidence in Decision-Making [0.21756081703275998]
We present a precise mathematical definition of trustworthiness, termed $\mathcal{U}$-trustworthiness.
Within the context of $\mathcal{U}$-trustworthiness, we prove that properly-ranked models are inherently $\mathcal{U}$-trustworthy.
We advocate for the adoption of the AUC metric as the preferred measure of trustworthiness.
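For concreteness, here is a minimal sketch of AUC as a ranking-based measure, using scikit-learn's standard `roc_auc_score` (an assumption for illustration; the paper may compute it differently, and the data below are invented). It grounds the claim that properly-ranked models score well: scores that rank all positives above all negatives achieve an AUC of 1.0.

```python
# Minimal sketch: AUC rewards correct ranking of positives over negatives.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                          # ground-truth labels
well_ranked = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.6, 0.4]     # positives scored above negatives
poorly_ranked = [0.9, 0.8, 0.2, 0.1, 0.3, 0.7, 0.4, 0.6]   # ranking fully inverted

print(roc_auc_score(y_true, well_ranked))    # 1.0: perfect ranking
print(roc_auc_score(y_true, poorly_ranked))  # 0.0: fully inverted ranking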
arXiv Detail & Related papers (2024-01-04T04:58:02Z)
- A Diachronic Perspective on User Trust in AI under Uncertainty [52.44939679369428]
Modern NLP systems are often uncalibrated, resulting in confidently incorrect predictions that undermine user trust.
We study the evolution of user trust in response to trust-eroding events using a betting game.
arXiv Detail & Related papers (2023-10-20T14:41:46Z)
- DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models [92.6951708781736]
This work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5.
We find that GPT models can be easily misled to generate toxic and biased outputs and leak private information.
Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps.
arXiv Detail & Related papers (2023-06-20T17:24:23Z)
- Designing for Responsible Trust in AI Systems: A Communication Perspective [56.80107647520364]
We draw from communication theories and literature on trust in technologies to develop a conceptual model called MATCH.
We highlight transparency and interaction as AI systems' affordances that present a wide range of trustworthiness cues to users.
We propose a checklist of requirements to help technology creators identify appropriate cues to use.
arXiv Detail & Related papers (2022-04-29T00:14:33Z)
- Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI [55.4046755826066]
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people).
We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold.
We discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
arXiv Detail & Related papers (2020-10-15T03:07:23Z)