Related papers: Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

URL: http://arxiv.org/abs/2506.02546v1
Date: Tue, 03 Jun 2025 07:32:57 GMT
Title: Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems
Authors: Pengfei He, Zhenwei Dai, Xianfeng Tang, Yue Xing, Hui Liu, Jingying Zeng, Qiankun Peng, Shrivats Agrawal, Samarth Varshney, Suhang Wang, Jiliang Tang, Qi He,
Abstract summary: Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages.<n>This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness.<n>We propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness.
Score: 52.57826440085856
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. While some existing studies approach the trustworthiness, they focus on a single type of harmfulness rather than analyze it in a holistic approach from multiple trustworthiness perspectives. In this work, we propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness. Inspired by human communication literature[1], through systematically analyzing attention behaviors across six orthogonal trust dimensions, we find that certain attention heads in the LLM specialize in detecting specific types of violations. Leveraging these insights, A-Trust directly infers trustworthiness from internal attention patterns without requiring external prompts or verifiers. Building upon A-Trust, we develop a principled and efficient trust management system (TMS) for LLM-MAS, enabling both message-level and agent-level trust assessment. Experiments across diverse multi-agent settings and tasks demonstrate that applying our TMS significantly enhances robustness against malicious inputs.

Related papers

Ties of Trust: a bowtie model to uncover trustor-trustee relationships in LLMs [1.1149261035759372]
We introduce a bowtie model for conceptualizing and formulating trust in Large Language Models (LLMs)<n>A core component comprehensively explores trust by tying its two sides, namely the trustor and the trustee, as well as their intricate relationships.<n>We uncover these relationships within the proposed bowtie model and beyond to its sociotechnical ecosystem.
arXiv Detail & Related papers (2025-06-11T11:42:52Z)
Mapping the Trust Terrain: LLMs in Software Engineering -- Insights and Perspectives [25.27634711529676]
Applications of Large Language Models (LLMs) are rapidly growing in industry and academia for various software engineering (SE) tasks.<n>As these models become more integral to critical processes, ensuring their reliability and trustworthiness becomes essential.<n>The landscape of trust-related concepts in LLMs in SE is relatively unclear, with concepts such as trust, distrust, and trustworthiness lacking clear conceptualizations.
arXiv Detail & Related papers (2025-03-18T00:49:43Z)
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving [106.0319745724181]
We introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs)<n>We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios.<n>Our evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats.
arXiv Detail & Related papers (2024-12-19T18:59:33Z)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs.<n>Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts.<n>Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)
When to Trust LLMs: Aligning Confidence with Response Quality [49.371218210305656]
We propose CONfidence-Quality-ORDer-preserving alignment approach (CONQORD) It integrates quality reward and order-preserving alignment reward functions. Experiments demonstrate that CONQORD significantly improves the alignment performance between confidence and response accuracy.
arXiv Detail & Related papers (2024-04-26T09:42:46Z)
Bayesian Methods for Trust in Collaborative Multi-Agent Autonomy [11.246557832016238]
In safety-critical and contested environments, adversaries may infiltrate and compromise a number of agents. We analyze state of the art multi-target tracking algorithms under this compromised agent threat model. We design a trust estimation framework using hierarchical Bayesian updating.
arXiv Detail & Related papers (2024-03-25T17:17:35Z)
TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness [58.721012475577716]
Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLMs response aligns with its intrinsic knowledge.
arXiv Detail & Related papers (2024-02-19T21:12:14Z)
TrustLLM: Trustworthiness in Large Language Models [446.5640421311468]
This paper introduces TrustLLM, a comprehensive study of trustworthiness in large language models (LLMs) We first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics.
arXiv Detail & Related papers (2024-01-10T22:07:21Z)
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities [37.14654106278984]
We conduct an adversarial assessment of open-source Large Language Models (LLMs) on trustworthiness. We propose advCoU, an extended Chain of Utterances-based (CoU) prompting strategy by incorporating carefully crafted malicious demonstrations for trustworthiness attack. Our experiments encompass recent and representative series of open-source LLMs, including Vicuna, MPT, Falcon, Mistral, and Llama 2.
arXiv Detail & Related papers (2023-11-15T23:33:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.