Related papers: Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage

URL: http://arxiv.org/abs/2601.01685v1
Date: Sun, 04 Jan 2026 22:50:23 GMT
Title: Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage
Authors: Jinwei Hu, Xinmiao Huang, Youcheng Sun, Yi Dong, Xiaowei Huang,
Abstract summary: As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface.<n>This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels.
Score: 18.964773489734547
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces a novel threat where colluding agents steer victim beliefs using only truthful evidence fragments distributed through public channels, without relying on covert communications, backdoors, or falsified documents. By exploiting LLMs' overthinking tendency, we formalize the first cognitive collusion attack and propose Generative Montage: a Writer-Editor-Director framework that constructs deceptive narratives through adversarial debate and coordinated posting of evidence fragments, causing victims to internalize and propagate fabricated conclusions. To study this risk, we develop CoPHEME, a dataset derived from real-world rumor events, and simulate attacks across diverse LLM families. Our results show pervasive vulnerability across 14 LLM families: attack success rates reach 74.4% for proprietary models and 70.6% for open-weights models. Counterintuitively, stronger reasoning capabilities increase susceptibility, with reasoning-specialized models showing higher attack success than base models or prompts. Furthermore, these false beliefs then cascade to downstream judges, achieving over 60% deception rates, highlighting a socio-technical vulnerability in how LLM-based agents interact with dynamic information environments. Our implementation and data are available at: https://github.com/CharlesJW222/Lying_with_Truth/tree/main.

Related papers

The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence [49.94160400740222]
We introduce MisBelief, a framework that generates misleading evidence via collaborative, multi-round interactions.<n>Using MisBelief, we generate 4,800 instances across three difficulty levels to evaluate 7 representative LLMs.<n>Results indicate that while models are robust to direct misinformation, they are highly sensitive to this refined evidence.<n>We propose Deceptive Intent Shielding (DIS), a governance mechanism that provides an early warning signal by inferring the deceptive intent behind evidence.
arXiv Detail & Related papers (2026-01-09T02:28:00Z)
Friend or Foe: How LLMs' Safety Mind Gets Fooled by Intent Shift Attack [53.34204977366491]
Large language models (LLMs) remain vulnerable to jailbreaking attacks despite their impressive capabilities.<n>In this paper, we introduce ISA (Intent Shift Attack), which obfuscates LLMs about the intent of the attacks.<n>Our approach only needs minimal edits to the original request, and yields natural, human-readable, and seemingly harmless prompts.
arXiv Detail & Related papers (2025-11-01T13:44:42Z)
BreakFun: Jailbreaking LLMs via Schema Exploitation [0.28647133890966986]
We investigate how Large Language Models (LLMs) can be turned into critical weaknesses.<n>This vulnerability is highly transferable, achieving an average success rate of 89% across 13 models.<n>A secondary LLM performs a "Literal Transcription" to isolate and reveal the user's true harmful intent.
arXiv Detail & Related papers (2025-10-19T11:27:44Z)
Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL [64.3268313484078]
Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education and healthcare.<n>Their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns.<n>We investigate the extent to which LLMs engage in deception within dialogue, and propose the belief misalignment metric to quantify deception.
arXiv Detail & Related papers (2025-10-16T05:29:36Z)
DecepChain: Inducing Deceptive Reasoning in Large Language Models [28.80439047115244]
Large Language Models (LLMs) have been demonstrating increasingly strong reasoning capability with their chain-of-thoughts (CoT)<n>We present an urgent but underexplored risk: attackers could induce LLMs to generate incorrect yet coherent CoTs that look plausible at first glance.<n>We introduce DecepChain, a novel backdoor attack paradigm that steers models to generate reasoning that appears benign while yielding incorrect conclusions eventually.
arXiv Detail & Related papers (2025-09-30T22:23:40Z)
Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers [61.57691030102618]
We propose a novel jailbreaking method, Paper Summary Attack (llmnamePSA)<n>It synthesizes content from either attack-focused or defense-focused LLM safety paper to construct an adversarial prompt template.<n>Experiments show significant vulnerabilities not only in base LLMs, but also in state-of-the-art reasoning model like Deepseek-R1.
arXiv Detail & Related papers (2025-07-17T18:33:50Z)
Compromising Honesty and Harmlessness in Language Models via Deception Attacks [0.04499833362998487]
Large language models (LLMs) can understand and employ deceptive behavior, even without explicit prompting.<n>We introduce "deception attacks" that undermine these traits, revealing a vulnerability that, if exploited, could have serious real-world consequences.<n>We show that such targeted deception is effective even in high-stakes domains or ideologically charged subjects.
arXiv Detail & Related papers (2025-02-12T11:02:59Z)
Targeting the Core: A Simple and Effective Method to Attack RAG-based Agents via Direct LLM Manipulation [4.241100280846233]
AI agents, powered by large language models (LLMs), have transformed human-computer interactions by enabling seamless, natural, and context-aware communication.<n>This paper investigates a critical vulnerability: adversarial attacks targeting the LLM core within AI agents.
arXiv Detail & Related papers (2024-12-05T18:38:30Z)
Leveraging the Context through Multi-Round Interactions for Jailbreaking Attacks [55.603893267803265]
Large Language Models (LLMs) are susceptible to Jailbreaking attacks. Jailbreaking attacks aim to extract harmful information by subtly modifying the attack query. We focus on a new attack form, called Contextual Interaction Attack.
arXiv Detail & Related papers (2024-02-14T13:45:19Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.