Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
- URL: http://arxiv.org/abs/2509.23041v2
- Date: Fri, 24 Oct 2025 07:58:07 GMT
- Title: Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
- Authors: Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu,
- Abstract summary: This paper systematically evaluates the resilience of synthetic-data-integrated training paradigm for large language models against poisoning and backdoor attacks.<n>We introduce a novel and universal attack framework, namely, Virus Infection Attack (VIA), which enables the propagation of current attacks through synthetic data.
- Score: 24.21219815496624
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Synthetic data refers to artificial samples generated by models. While it has been validated to significantly enhance the performance of large language models (LLMs) during training and has been widely adopted in LLM development, potential security risks it may introduce remain uninvestigated. This paper systematically evaluates the resilience of synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal that such a paradigm exhibits strong resistance to existing attacks, primarily thanks to the different distribution patterns between poisoning data and queries used to generate synthetic samples. To enhance the effectiveness of these attacks and further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, namely, Virus Infection Attack (VIA), which enables the propagation of current attacks through synthetic data even under purely clean queries. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective "shell" and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed in the poisoned upstream models.
Related papers
- INFA-Guard: Mitigating Malicious Propagation via Infection-Aware Safeguarding in LLM-Based Multi-Agent Systems [70.37731999972785]
In this paper, we propose Infection-Aware Guard, INFA-Guard, a novel defense framework that explicitly identifies and addresses infected agents as a distinct threat category.<n>During remediation, INFA-Guard replaces attackers and rehabilitates infected ones, avoiding malicious propagation while preserving topological integrity.
arXiv Detail & Related papers (2026-01-21T05:27:08Z) - PHANTOM: Progressive High-fidelity Adversarial Network for Threat Object Modeling [0.0]
PHANTOM is a novel adversarial variational framework for generating high-fidelity synthetic attack data.<n>Its innovations include progressive training, a dual-path VAE-GAN architecture, and domain-specific feature matching to preserve the semantics of attacks.<n>Models trained on PHANTOM data achieve 98% weighted accuracy on real attacks.
arXiv Detail & Related papers (2025-12-12T18:14:19Z) - Associative Poisoning to Generative Machine Learning [5.094623170336122]
We introduce a novel data poisoning technique called associative poisoning.<n>It compromises fine-grained features of the generated data without requiring control of the training process.<n>This attack perturbs only the training data to manipulate statistical associations between specific feature pairs in the generated outputs.
arXiv Detail & Related papers (2025-11-07T11:47:33Z) - PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing [12.108801150980598]
We propose PoisonSwarm, which applies the model crowdsourcing strategy to generate diverse harmful data.<n>We decompose each based template into multiple semantic units and perform unit-by-unit toxification.<n>Experiments demonstrate that PoisonSwarm achieves state-of-the-art performance in synthesizing different categories of harmful data.
arXiv Detail & Related papers (2025-05-27T13:33:57Z) - CopyrightShield: Enhancing Diffusion Model Security against Copyright Infringement Attacks [61.06621533874629]
Diffusion models are vulnerable to copyright infringement attacks, where attackers inject strategically modified non-infringing images into the training set.<n>We first propose a defense framework, CopyrightShield, to defend against the above attack.<n> Experimental results demonstrate that CopyrightShield significantly improves poisoned sample detection performance across two attack scenarios.
arXiv Detail & Related papers (2024-12-02T14:19:44Z) - Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarial attacking various downstream models fine-tuned from the segment anything model (SAM)<n>To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z) - Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data.<n>We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to the model stealing attack, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Improving Adversarial Transferability by Stable Diffusion [36.97548018603747]
adversarial examples introduce imperceptible perturbations to benign samples, deceiving predictions.
Deep neural networks (DNNs) are susceptible to adversarial examples, which introduce imperceptible perturbations to benign samples, deceiving predictions.
We introduce a novel attack method called Stable Diffusion Attack Method (SDAM), which incorporates samples generated by Stable Diffusion to augment input images.
arXiv Detail & Related papers (2023-11-18T09:10:07Z) - Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z) - Exploring the Limits of Model-Targeted Indiscriminate Data Poisoning
Attacks [31.339252233416477]
We introduce the notion of model poisoning reachability as a technical tool to explore the intrinsic limits of data poisoning attacks towards target parameters.
We derive an easily computable threshold to establish and quantify a surprising phase transition phenomenon among popular ML models.
Our work highlights the critical role played by the poisoning ratio, and sheds new insights on existing empirical results, attacks and mitigation strategies in data poisoning.
arXiv Detail & Related papers (2023-03-07T01:55:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.