Building and Measuring Trust between Large Language Models
- URL: http://arxiv.org/abs/2508.15858v1
- Date: Wed, 20 Aug 2025 11:38:38 GMT
- Title: Building and Measuring Trust between Large Language Models
- Authors: Maarten Buyl, Yousra Fettach, Guillaume Bied, Tijl De Bie
- Abstract summary: We study how different strategies to build trust compare, how trust can be measured implicitly, and how this relates to explicit measures of trust. We build trust in three ways: by building rapport dynamically, by starting from a prewritten script that evidences trust, and by adapting the LLMs' system prompt. Surprisingly, we find that explicit trust measures are either weakly correlated or strongly negatively correlated with implicit trust measures.
- Score: 10.539443038617089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) increasingly interact with each other, most notably in multi-agent setups, we may expect (and hope) that 'trust' relationships develop between them, mirroring trust relationships between human colleagues, friends, or partners. Yet, though prior work has shown LLMs to be capable of identifying emotional connections and recognizing reciprocity in trust games, little is known about (i) how different strategies to build trust compare, (ii) how such trust can be measured implicitly, and (iii) how this relates to explicit measures of trust. We study these questions by relating implicit measures of trust, i.e. susceptibility to persuasion and propensity to collaborate financially, with explicit measures of trust, i.e. a dyadic trust questionnaire well-established in psychology. We build trust in three ways: by building rapport dynamically, by starting from a prewritten script that evidences trust, and by adapting the LLMs' system prompt. Surprisingly, we find that explicit trust measures are either weakly correlated or strongly negatively correlated with implicit trust measures. These findings suggest that measuring trust between LLMs by asking their opinion may be misleading. Instead, context-specific and implicit measures may be more informative in understanding how LLMs trust each other.
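The core analysis pairs each dyad's implicit trust scores with its explicit questionnaire score and checks how they correlate. Below is a minimal sketch of that comparison using hypothetical per-dyad scores and scales; the paper's actual data, scales, and protocol are not reproduced here.

```python
# Illustrative sketch (not the paper's code): correlate implicit and
# explicit trust measures across LLM dyads with Spearman's rank correlation.
from scipy.stats import spearmanr

# Hypothetical scores, one entry per trustor-trustee pair.
explicit_trust = [6.1, 4.3, 5.8, 3.2, 6.7]       # dyadic trust questionnaire (e.g., 1-7 scale)
persuasion     = [0.12, 0.55, 0.20, 0.61, 0.09]  # susceptibility to persuasion
collaboration  = [0.80, 0.42, 0.75, 0.35, 0.88]  # amount sent in a trust/investment game

for name, implicit in [("persuasion", persuasion), ("collaboration", collaboration)]:
    rho, p = spearmanr(explicit_trust, implicit)
    print(f"explicit vs {name}: rho={rho:+.2f} (p={p:.3f})")
```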
Related papers
- Influential Training Data Retrieval for Explaining Verbalized Confidence of LLMs [2.626100048563503]
Large language models (LLMs) can increase users' perceived trust by verbalizing confidence in their outputs. We introduce TracVC, a method that builds on information retrieval and influence estimation to trace generated confidence expressions back to the training data. Our analysis reveals that OLMo2-13B is frequently influenced by confidence-related data that is lexically unrelated to the query.
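As a loose illustration of the retrieval component such a method might build on, the sketch below ranks training documents by lexical similarity to a verbalized confidence expression. The influence-estimation step is omitted, and the corpus and query are hypothetical stand-ins, not TracVC itself.

```python
# Rank training documents by TF-IDF similarity to a confidence expression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training_corpus = [
    "I am highly confident the answer is correct.",
    "The capital of France is Paris.",
    "We are certain beyond doubt about this result.",
]
confidence_expression = "I'm very confident in this answer."

vec = TfidfVectorizer().fit(training_corpus + [confidence_expression])
sims = cosine_similarity(vec.transform([confidence_expression]),
                         vec.transform(training_corpus))[0]
for score, doc in sorted(zip(sims, training_corpus), reverse=True):
    print(f"{score:.2f}  {doc}")
```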
arXiv Detail & Related papers (2026-01-15T18:05:42Z)
- Ties of Trust: a bowtie model to uncover trustor-trustee relationships in LLMs [1.1149261035759372]
We introduce a bowtie model for conceptualizing and formulating trust in Large Language Models (LLMs). A core component comprehensively explores trust by tying together its two sides, namely the trustor and the trustee, as well as their intricate relationships. We uncover these relationships within the proposed bowtie model and in its broader sociotechnical ecosystem.
arXiv Detail & Related papers (2025-06-11T11:42:52Z)
- Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems [52.57826440085856]
Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. We propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness.
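The abstract does not specify A-Trust's actual features, but the underlying intuition, reading a trust signal off the attention a receiving agent pays to an incoming message, can be sketched with a toy attention tensor standing in for real model internals.

```python
# Hedged sketch of an attention-based trust signal: score an incoming
# message by the attention mass placed on its tokens while a reply is
# generated. All dimensions and values here are toy stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n_heads, reply_len, ctx_len = 8, 10, 30  # toy dimensions
message_span = slice(5, 15)              # context positions of the incoming message

# attn[h, i, j]: attention from reply token i to context position j, per head
attn = rng.dirichlet(np.ones(ctx_len), size=(n_heads, reply_len))

# Trust proxy: mean attention mass directed at the message span.
trust_proxy = attn[:, :, message_span].sum(axis=-1).mean()
print(f"attention-based trust proxy: {trust_proxy:.3f}")
```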
arXiv Detail & Related papers (2025-06-03T07:32:57Z)
- On the Need to Rethink Trust in AI Assistants for Software Development: A Critical Review [16.774993642353724]
Trust is a fundamental concept in human decision-making and collaboration. Software engineering articles often use the term trust informally. Related disciplines commonly embed their methodology and results in established trust models.
arXiv Detail & Related papers (2025-04-16T19:52:21Z)
- Measuring and identifying factors of individuals' trust in Large Language Models [0.0]
Large Language Models (LLMs) can engage in human-like conversational exchanges. We introduce the Trust-In-LLMs Index (TILLMI) as a new framework to measure individuals' trust in LLMs.
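As a purely hypothetical illustration of turning such a questionnaire into a single index: the items, subscale, and reverse-coding below are invented for illustration, since the actual TILLMI instrument is not reproduced in the abstract.

```python
# Score a hypothetical trust-in-LLMs questionnaire into a normalized index.
from statistics import mean

LIKERT_MAX = 5
responses = {                 # item id -> 1-5 Likert response
    "rely_on_llm": 4,
    "llm_output_accurate": 3,
    "worry_about_errors": 2,  # reverse-coded item
}
reverse_coded = {"worry_about_errors"}

scored = [
    (LIKERT_MAX + 1 - v) if item in reverse_coded else v
    for item, v in responses.items()
]
index = mean(scored) / LIKERT_MAX  # normalize to 0-1
print(f"trust index: {index:.2f}")
```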
arXiv Detail & Related papers (2025-02-28T13:16:34Z)
- Fostering Trust and Quantifying Value of AI and ML [0.0]
Much has been discussed about trusting AI and ML inferences, but little has been done to define what that means.
Producing ever more trustworthy machine learning inferences is a path to increasing the value of products.
arXiv Detail & Related papers (2024-07-08T13:25:28Z)
- When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose the CONfidence-Quality-ORDer-preserving alignment approach (CONQORD).
It integrates a quality reward and an order-preserving alignment reward function.
Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
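A rough sketch of the order-preserving idea follows: score how often verbalized confidence ranks responses the same way their quality does. The pairwise formulation and the weighting are assumptions for illustration, not the paper's exact reward.

```python
# Combine a quality reward with an order-preserving term that rewards
# confidence rankings concordant with quality rankings.
from itertools import combinations

quality    = [0.9, 0.4, 0.7]   # hypothetical quality rewards per response
confidence = [0.8, 0.3, 0.9]   # verbalized confidences for the same responses

pairs = list(combinations(range(len(quality)), 2))
concordant = sum(
    (quality[i] - quality[j]) * (confidence[i] - confidence[j]) > 0
    for i, j in pairs
)
order_reward = concordant / len(pairs)  # fraction of concordant pairs
alpha = 0.5                             # assumed weighting between the two terms
total = [alpha * q + (1 - alpha) * order_reward for q in quality]
print(f"order-preserving reward: {order_reward:.2f}, per-response totals: {total}")
```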
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
- TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness [58.721012475577716]
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications.
This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge.
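One simple way to operationalize behavioral consistency, in the spirit of (but not identical to) TrustScore, is to paraphrase a question and measure agreement among the model's answers. `ask_model` below is a hypothetical stand-in for an LLM call.

```python
# Measure how consistently a model answers paraphrases of the same question.
def ask_model(question: str) -> str:
    # Placeholder: in practice this would query the LLM under evaluation.
    canned = {
        "Who wrote Hamlet?": "Shakespeare",
        "Hamlet was written by whom?": "Shakespeare",
        "Name the author of Hamlet.": "Marlowe",
    }
    return canned[question]

paraphrases = ["Who wrote Hamlet?",
               "Hamlet was written by whom?",
               "Name the author of Hamlet."]
answers = [ask_model(q) for q in paraphrases]
reference = answers[0]
consistency = sum(a == reference for a in answers) / len(answers)
print(f"behavioral consistency: {consistency:.2f}")  # 0.67 in this toy case
```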
arXiv Detail & Related papers (2024-02-19T21:12:14Z)
- TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs).
We first propose a set of principles for trustworthy LLMs that span eight different dimensions.
Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z)
- Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI [55.4046755826066]
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people).
We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold.
We discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
arXiv Detail & Related papers (2020-10-15T03:07:23Z)
- How Much Can We Really Trust You? Towards Simple, Interpretable Trust Quantification Metrics for Deep Neural Networks [94.65749466106664]
We conduct a thought experiment and explore two key questions about trust in relation to confidence.
We introduce a suite of metrics for assessing the overall trustworthiness of deep neural networks based on their behaviour when answering a set of questions.
The proposed metrics are by no means perfect, but the hope is to push the conversation towards better metrics.
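A toy version of behaviour-based trust quantification is shown below: reward confidence on correct answers, penalize it on wrong ones, and average over a question set. This is a simplified stand-in for illustration, not the paper's exact metrics.

```python
# Average a per-question trust score over a set of answered questions.
correct    = [True, True, False, True, False]   # was each answer right?
confidence = [0.95, 0.80, 0.90, 0.60, 0.30]     # model confidence per answer

per_question_trust = [
    c if ok else 1.0 - c                        # overconfident mistakes score low
    for ok, c in zip(correct, confidence)
]
overall_trust = sum(per_question_trust) / len(per_question_trust)
print(f"overall trust score: {overall_trust:.2f}")
```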
arXiv Detail & Related papers (2020-09-12T17:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.