DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems
- URL: http://arxiv.org/abs/2602.02569v1
- Date: Sat, 31 Jan 2026 03:49:23 GMT
- Title: DECEIVE-AFC: Adversarial Claim Attacks against Search-Enabled LLM-based Fact-Checking Systems
- Authors: Haoran Ou, Kangjie Chen, Gelei Deng, Hangcheng Liu, Jie Zhang, Tianwei Zhang, Kwok-Yan Lam,
- Abstract summary: We study adversarial claim attacks against search-enabled fact-checking systems under a realistic input-only threat model.<n>We propose DECEIVE-AFC, an agent-based adversarial attack framework that integrates novel claim-level attack strategies and adversarial claim validity evaluation principles.<n>Our attacks substantially degrade verification performance, reducing accuracy from 78.7% to 53.7%, and significantly outperform existing claim-based attack baselines with strong cross-system transferability.
- Score: 38.6944646666426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fact-checking systems with search-enabled large language models (LLMs) have shown strong potential for verifying claims by dynamically retrieving external evidence. However, the robustness of such systems against adversarial attack remains insufficiently understood. In this work, we study adversarial claim attacks against search-enabled LLM-based fact-checking systems under a realistic input-only threat model. We propose DECEIVE-AFC, an agent-based adversarial attack framework that integrates novel claim-level attack strategies and adversarial claim validity evaluation principles. DECEIVE-AFC systematically explores adversarial attack trajectories that disrupt search behavior, evidence retrieval, and LLM-based reasoning without relying on access to evidence sources or model internals. Extensive evaluations on benchmark datasets and real-world systems demonstrate that our attacks substantially degrade verification performance, reducing accuracy from 78.7% to 53.7%, and significantly outperform existing claim-based attack baselines with strong cross-system transferability.
Related papers
- Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance.<n>We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state.<n>We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z) - AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models [60.39655329875822]
Vision-Language-Action (VLA) models enable robots to interpret natural-language instructions and perform diverse tasks.<n>Despite growing interest in attacking such models, the effectiveness of existing techniques remains unclear.<n>We propose AttackVLA, a unified framework that aligns with the VLA development lifecycle.
arXiv Detail & Related papers (2025-11-15T10:30:46Z) - RAG-targeted Adversarial Attack on LLM-based Threat Detection and Mitigation Framework [0.19116784879310025]
The rapid expansion of the Internet of Things (IoT) is reshaping communication and operational practices across industries, but it also broadens the attack surface and increases susceptibility to security breaches.<n>Artificial Intelligence has become a valuable solution in securing IoT networks, with Large Language Models (LLMs) enabling automated attack behavior analysis and mitigation suggestion.<n>We attack an LLM-based IoT attack analysis and mitigation framework to test its adversarial robustness.
arXiv Detail & Related papers (2025-11-09T03:50:17Z) - Adversarial Attacks Against Automated Fact-Checking: A Survey [36.08022268176274]
This survey provides the first in-depth review of adversarial attacks targeting fact-checking systems.<n>We examine recent advancements in adversary-aware defenses and highlight open research questions.<n>Our findings underscore the urgent need for resilient FC frameworks capable of withstanding adversarial manipulations.
arXiv Detail & Related papers (2025-09-10T10:10:10Z) - Counterfactual Evaluation for Blind Attack Detection in LLM-based Evaluation Systems [24.312079827029326]
We formalize a class of threats called blind attacks, where a candidate answer is crafted independently of the true answer to deceive the evaluator.<n>To counter such attacks, we propose a framework that augments Standard Evaluation (SE) with Counterfactual Evaluation (CFE)<n>An attack is detected if the system validates an answer under both standard and counterfactual conditions.
arXiv Detail & Related papers (2025-07-31T11:29:42Z) - The Silent Saboteur: Imperceptible Adversarial Attacks against Black-Box Retrieval-Augmented Generation Systems [101.68501850486179]
We explore adversarial attacks against retrieval-augmented generation (RAG) systems to identify their vulnerabilities.<n>This task aims to find imperceptible perturbations that retrieve a target document, originally excluded from the initial top-$k$ candidate set.<n>We propose ReGENT, a reinforcement learning-based framework that tracks interactions between the attacker and the target RAG.
arXiv Detail & Related papers (2025-05-24T08:19:25Z) - Chain-of-Thought Poisoning Attacks against R1-based Retrieval-Augmented Generation Systems [39.05753852489526]
Existing adversarial attack methods typically exploit knowledge base poisoning to probe the vulnerabilities of RAG systems.<n>This paper uses reasoning process templates from R1-based RAG systems to wrap erroneous knowledge into adversarial documents, and injects them into the knowledge base to attack RAG systems.<n>The key idea of our approach is that adversarial documents, by simulating the chain-of-thought patterns aligned with the model's training signals, may be misinterpreted by the model as authentic historical reasoning processes.
arXiv Detail & Related papers (2025-05-22T08:22:46Z) - Towards More Robust Retrieval-Augmented Generation: Evaluating RAG Under Adversarial Poisoning Attacks [45.07581174558107]
Retrieval-Augmented Generation (RAG) systems have emerged as a promising solution to mitigate hallucinations.<n>RAG systems are vulnerable to adversarial poisoning attacks, where malicious passages injected into the retrieval corpus can mislead models into producing factually incorrect outputs.<n>We present a rigorously controlled empirical study of how RAG systems behave under such attacks and how their robustness can be improved.
arXiv Detail & Related papers (2024-12-21T17:31:52Z) - Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models [79.0183835295533]
We introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities.<n>Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.<n>We propose two novel defense mechanisms-boundary awareness and explicit reminder-to address these vulnerabilities in both black-box and white-box settings.
arXiv Detail & Related papers (2023-12-21T01:08:39Z) - MF-CLIP: Leveraging CLIP as Surrogate Models for No-box Adversarial Attacks [65.86360607693457]
No-box attacks, where adversaries have no prior knowledge, remain relatively underexplored despite its practical relevance.<n>This work presents a systematic investigation into leveraging large-scale Vision-Language Models (VLMs) as surrogate models for executing no-box attacks.<n>Our theoretical and empirical analyses reveal a key limitation in the execution of no-box attacks stemming from insufficient discriminative capabilities for direct application of vanilla CLIP as a surrogate model.<n>We propose MF-CLIP: a novel framework that enhances CLIP's effectiveness as a surrogate model through margin-aware feature space optimization.
arXiv Detail & Related papers (2023-07-13T08:10:48Z) - Avoid Adversarial Adaption in Federated Learning by Multi-Metric
Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.