Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models
- URL: http://arxiv.org/abs/2601.10719v1
- Date: Wed, 17 Dec 2025 08:47:23 GMT
- Title: Do You Trust Me? Cognitive-Affective Signatures of Trustworthiness in Large Language Models
- Authors: Gerard Yeo, Svetlana Churina, Kokil Jaidka
- Abstract summary: We analyze how large language models encode perceived trustworthiness in web-like narratives. Across models, systematic layer- and head-level activation differences distinguish high- from low-trust texts. Strongest associations emerge with appraisals of fairness, certainty, and accountability-self.
- Score: 12.714909005419964
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Perceived trustworthiness underpins how users navigate online information, yet it remains unclear whether large language models (LLMs), increasingly embedded in search, recommendation, and conversational systems, represent this construct in psychologically coherent ways. We analyze how instruction-tuned LLMs (Llama 3.1 8B, Qwen 2.5 7B, Mistral 7B) encode perceived trustworthiness in web-like narratives using the PEACE-Reviews dataset annotated for cognitive appraisals, emotions, and behavioral intentions. Across models, systematic layer- and head-level activation differences distinguish high- from low-trust texts, revealing that trust cues are implicitly encoded during pretraining. Probing analyses show linearly decodable trust signals and fine-tuning effects that refine rather than restructure these representations. Strongest associations emerge with appraisals of fairness, certainty, and accountability-self, dimensions central to human trust formation online. These findings demonstrate that modern LLMs internalize psychologically grounded trust signals without explicit supervision, offering a representational foundation for designing credible, transparent, and trustworthy AI systems in the web ecosystem. Code and appendix are available at: https://github.com/GerardYeo/TrustworthinessLLM.
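The abstract reports that trust labels are linearly decodable from hidden states and that differences appear at specific layers. A minimal sketch of layer-wise linear probing follows, assuming activations have already been extracted into an array of shape (layers, texts, hidden size) with binary high/low-trust labels. It uses synthetic data, not PEACE-Reviews or any of the named models, and the function names are hypothetical rather than taken from the authors' repository.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe (w, b) by full-batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(high trust)
        grad = p - y  # gradient of binary cross-entropy w.r.t. the logits
        w -= lr * (X.T @ grad / n + l2 * w)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    return float(((X @ w + b > 0).astype(int) == y).mean())


def layerwise_probe(acts, labels, train_frac=0.8, seed=0):
    """acts: (n_layers, n_texts, hidden_dim); labels: (n_texts,), 0 = low trust, 1 = high trust."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(labels.shape[0])
    cut = int(train_frac * labels.shape[0])
    tr, te = idx[:cut], idx[cut:]
    scores = []
    for layer_acts in acts:
        # Standardize with training-split statistics only, to avoid test leakage.
        mu = layer_acts[tr].mean(axis=0)
        sd = layer_acts[tr].std(axis=0) + 1e-8
        Z = (layer_acts - mu) / sd
        w, b = train_linear_probe(Z[tr], labels[tr])
        scores.append(probe_accuracy(Z[te], labels[te], w, b))
    return scores


# Synthetic stand-in for hidden states: a "trust direction" whose strength grows with depth.
rng = np.random.default_rng(42)
n_layers, n_texts, dim = 4, 400, 32
labels = rng.integers(0, 2, size=n_texts)
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)
acts = rng.normal(size=(n_layers, n_texts, dim))
for layer in range(n_layers):
    strength = 0.5 * layer  # layer 0 carries no trust signal at all
    acts[layer] += strength * np.outer(2 * labels - 1, direction)

scores = layerwise_probe(acts, labels)
print([round(s, 2) for s in scores])
```

Held-out accuracy near 0.5 at a layer means no linearly recoverable signal there; a curve that rises across layers is the kind of pattern a layer-wise analysis looks for.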
Related papers
- Eliciting Trustworthiness Priors of Large Language Models via Economic Games [2.2940141855172036]
We propose a novel elicitation method based on iterated in-context learning. We find that GPT-4.1's trustworthiness priors closely track those observed in humans. We show that variation in elicited trustworthiness can be well predicted by a stereotype-based model.
(2026-01-31T15:23:03Z)
- Epistemic Context Learning: Building Trust the Right Way in LLM-Based Multi-Agent Systems [94.9141394384021]
Individual agents in multi-agent systems often lack robustness, tending to blindly conform to misleading peers. We show this weakness stems from both sycophancy and inadequate ability to evaluate peer reliability. We first formalize the learning problem of history-aware reference, introducing the historical interactions of peers as additional input. We then develop Epistemic Context Learning (ECL), a reasoning framework that conditions predictions on explicitly-built peer profiles from history.
(2026-01-29T13:59:32Z)
- COMPASS: Context-Modulated PID Attention Steering System for Hallucination Mitigation [2.1521364454860525]
We introduce a lightweight, interpretable control framework that embeds a model-based feedback loop directly within decoding. We show that a PID controller dynamically modulates attention heads to maintain factual consistency without retraining or multi-pass decoding.
(2025-11-05T05:30:28Z)
- Ties of Trust: a bowtie model to uncover trustor-trustee relationships in LLMs [1.1149261035759372]
We introduce a bowtie model for conceptualizing and formulating trust in Large Language Models (LLMs). A core component comprehensively explores trust by tying together its two sides, namely the trustor and the trustee, as well as their intricate relationships. We uncover these relationships within the proposed bowtie model and beyond it, in its wider sociotechnical ecosystem.
(2025-06-11T11:42:52Z)
- Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems [52.57826440085856]
Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. We propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness.
(2025-06-03T07:32:57Z)
- Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
(2025-04-11T15:41:21Z)
- ParamMute: Suppressing Knowledge-Critical FFNs for Faithful Retrieval-Augmented Generation [91.20492150248106]
We investigate the internal mechanisms behind unfaithful generation and identify a subset of mid-to-deep feed-forward networks (FFNs) that are disproportionately activated in such cases. We propose Parametric Knowledge Muting through FFN Suppression (ParamMute), a framework that improves contextual faithfulness by suppressing the activation of unfaithfulness-associated FFNs. Experimental results show that ParamMute significantly enhances faithfulness across both CoFaithfulQA and the established ConFiQA benchmark, achieving substantial reductions in reliance on parametric memory.
(2025-02-21T15:50:41Z)
- Fostering Trust and Quantifying Value of AI and ML [0.0]
Much has been discussed about trusting AI and ML inferences, but little has been done to define what that means.
Producing ever more trustworthy machine learning inferences is a path to increasing the value of products.
(2024-07-08T13:25:28Z)
- TrustGuard: GNN-based Robust and Explainable Trust Evaluation with Dynamicity Support [59.41529066449414]
We propose TrustGuard, a GNN-based accurate trust evaluation model that supports trust dynamicity.
TrustGuard is designed with a layered architecture that contains a snapshot input layer, a spatial aggregation layer, a temporal aggregation layer, and a prediction layer.
Experiments show that TrustGuard outperforms state-of-the-art GNN-based trust evaluation models in trust prediction, in both single-timeslot and multi-timeslot settings.
(2023-06-23T07:39:12Z)
- KGTrust: Evaluating Trustworthiness of SIoT via Knowledge Enhanced Graph Neural Networks [63.531790269009704]
Social Internet of Things (SIoT) is a promising and emerging paradigm that injects the notion of social networking into smart objects (i.e., things).
Due to the risks and uncertainty, a crucial and urgent problem to be settled is establishing reliable relationships within SIoT, that is, trust evaluation.
We propose a novel knowledge-enhanced graph neural network (KGTrust) for better trust evaluation in SIoT.
(2023-02-22T14:24:45Z)
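The COMPASS entry above summarizes a feedback loop in which a PID controller modulates attention heads during decoding. The sketch below illustrates only the generic discrete PID update, under loudly hypothetical assumptions: the measured signal is a scalar "context-reliance" score in [0, 1], the control output sets a single multiplicative attention gain clamped to [0.5, 2.0], and a toy plant (reliance = 0.5 × gain) stands in for a real model. None of these names, signals, or gain values come from the COMPASS paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PIDController:
    """Discrete PID controller: u = kp*e + ki*sum(e*dt) + kd*(de/dt)."""
    kp: float
    ki: float
    kd: float
    setpoint: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, measurement: float, dt: float = 1.0) -> float:
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first step, when there is no previous error.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(steps: int = 60, base_reliance: float = 0.5, target: float = 0.8) -> List[float]:
    """Hypothetical decoding loop: the attention gain scales a toy context-reliance score."""
    pid = PIDController(kp=0.5, ki=0.5, kd=0.1, setpoint=target)
    gain = 1.0
    history = []
    for _ in range(steps):
        reliance = min(1.0, base_reliance * gain)  # toy stand-in for a measured score
        history.append(reliance)
        u = pid.update(reliance)
        gain = max(0.5, min(2.0, 1.0 + u))  # clamp the attention gain to a safe range
    return history


history = simulate()
print(round(history[-1], 3))  # 0.8
```

In this toy loop the controller settles at a gain of about 1.6 (since 0.5 × 1.6 = 0.8). The integral term is what removes the steady-state error that a proportional-only controller would leave.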