Related papers: Graph Representation-based Model Poisoning on Federated Large Language Models

Graph Representation-based Model Poisoning on Federated Large Language Models

URL: http://arxiv.org/abs/2507.01694v2
Date: Thu, 31 Jul 2025 12:30:18 GMT
Title: Graph Representation-based Model Poisoning on Federated Large Language Models
Authors: Hanlin Cai, Haofan Dong, Houtianfu Wang, Kai Li, Ozgur B. Akan,
Abstract summary: Federated large language models (FedLLMs) enable powerful generative capabilities within wireless networks while preserving data privacy.<n>This article first reviews recent advancements in model poisoning techniques and existing defense mechanisms for FedLLMs, underscoring critical limitations.<n>The article further investigates graph representation-based model poisoning (GRMP), an emerging attack paradigm that exploits higher-order correlations among benign client gradients to craft malicious updates indistinguishable from legitimate ones.
Score: 3.5233863453805143
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Federated large language models (FedLLMs) enable powerful generative capabilities within wireless networks while preserving data privacy. Nonetheless, FedLLMs remain vulnerable to model poisoning attacks. This article first reviews recent advancements in model poisoning techniques and existing defense mechanisms for FedLLMs, underscoring critical limitations, especially when dealing with non-IID textual data distributions. Current defense strategies predominantly employ distance or similarity-based outlier detection mechanisms, relying on the assumption that malicious updates markedly differ from benign statistical patterns. However, this assumption becomes inadequate against adaptive adversaries targeting billion-parameter LLMs. The article further investigates graph representation-based model poisoning (GRMP), an emerging attack paradigm that exploits higher-order correlations among benign client gradients to craft malicious updates indistinguishable from legitimate ones. GRMP can effectively circumvent advanced defense systems, causing substantial degradation in model accuracy and overall performance. Moreover, the article outlines a forward-looking research roadmap that emphasizes the necessity of graph-aware secure aggregation methods, specialized vulnerability metrics tailored for FedLLMs, and evaluation frameworks to enhance the robustness of federated language model deployments.

Related papers

Detoxifying LLMs via Representation Erasure-Based Preference Optimization [44.29978832356216]
Large language models (LLMs) trained on webscale data can produce toxic outputs.<n>Prior defenses, based on applications of DPO, NPO, and similar algorithms, reduce the likelihood of harmful continuations.<n>We propose Representation Erasure-based Preference Optimization (REPO), reformulating detoxification as a token-level preference problem.
arXiv Detail & Related papers (2026-02-24T22:51:06Z)
Graph Representation-based Model Poisoning on the Heterogeneous Internet of Agents [7.66383868842486]
Internet of Agents (IoA) envisions a unified, agent-centric paradigm where heterogeneous large language model (LLM) agents can interconnect and collaborate at scale.<n>Within this paradigm, federated learning (FL) serves as a key enabler that allows distributed LLM agents to co-train global models without centralizing data.<n>This paper proposes a graph representation-based model poisoning (GRMP) attack, which passively exploits observed benign local models to construct a parameter correlation graph and extends an adversarial variational graph autoencoder to capture and reshape higher-order dependencies.
arXiv Detail & Related papers (2025-11-10T15:06:26Z)
Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models [12.524811181751577]
We propose a novel optimization for adversarial attacks against BCSD models.<n>In particular, we aim to improve the attacks in a challenging scenario, where the attack goal is to limit the model predictions to a specific range.<n>Our attack leverages the superior capability of black-box, model-agnostic explainers in interpreting the model decision boundaries.
arXiv Detail & Related papers (2025-06-05T08:29:19Z)
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access.<n>Most existing defenses presume that attacker queries have out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs.<n>We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
Constrained Network Adversarial Attacks: Validity, Robustness, and Transferability [0.0]
Research reveals a critical flaw in existing adversarial attack methodologies.<n>We show that the frequent violation of domain-specific constraints, inherent to IoT and network traffic, leads to up to 80.3% of adversarial examples being invalid.<n>This work underscores the importance of considering both domain constraints and model architecture when evaluating and designing robust ML/DL models for security-critical IoT and network applications.
arXiv Detail & Related papers (2025-05-02T15:01:42Z)
Benchmarking Adversarial Robustness to Bias Elicitation in Large Language Models: Scalable Automated Assessment with LLM-as-a-Judge [0.0]
Large Language Models (LLMs) have revolutionized artificial intelligence, driving advancements in machine translation, summarization, and conversational agents.<n>Recent studies indicate that LLMs remain vulnerable to adversarial attacks designed to elicit biased responses.<n>This work proposes a scalable benchmarking framework to evaluate LLM robustness against adversarial bias elicitation.
arXiv Detail & Related papers (2025-04-10T16:00:59Z)
ATOM: A Framework of Detecting Query-Based Model Extraction Attacks for Graph Neural Networks [18.488168353080464]
Graph Neural Networks (GNNs) have gained traction in Graph-based Machine Learning as a Service (GML) platforms, yet they remain vulnerable to graph-based model extraction attacks (MEAs)<n>We propose ATOM, a novel real-time MEA detection framework tailored for GNNs.<n>ATOM integrates sequential modeling and reinforcement learning to dynamically detect evolving attack patterns, while leveraging $k$core embedding to capture the structural properties, enhancing detection precision.
arXiv Detail & Related papers (2025-03-20T20:25:32Z)
Adversarial Training for Defense Against Label Poisoning Attacks [53.893792844055106]
Label poisoning attacks pose significant risks to machine learning models.<n>We propose a novel adversarial training defense strategy based on support vector machines (SVMs) to counter these threats.<n>Our approach accommodates various model architectures and employs a projected gradient descent algorithm with kernel SVMs for adversarial training.
arXiv Detail & Related papers (2025-02-24T13:03:19Z)
Do Spikes Protect Privacy? Investigating Black-Box Model Inversion Attacks in Spiking Neural Networks [0.0]
This work presents the first study of black-box Model Inversion (MI) attacks on Spiking Neural Networks (SNNs)<n>We adapt a generative adversarial MI framework to the spiking domain by incorporating rate-based encoding for input transformation and decoding mechanisms for output interpretation.<n>Our results show that SNNs exhibit significantly greater resistance to MI attacks than ANNs, as demonstrated by degraded reconstructions, increased instability in attack convergence, and overall reduced attack effectiveness across multiple evaluation metrics.
arXiv Detail & Related papers (2025-02-08T10:02:27Z)
HarmLevelBench: Evaluating Harm-Level Compliance and the Impact of Quantization on Model Alignment [1.8843687952462742]
This paper aims to address gaps in the current literature on jailbreaking techniques and the evaluation of LLM vulnerabilities. Our contributions include the creation of a novel dataset designed to assess the harmfulness of model outputs across multiple harm levels. We provide a comprehensive benchmark of state-of-the-art jailbreaking attacks, specifically targeting the Vicuna 13B v1.5 model.
arXiv Detail & Related papers (2024-11-11T10:02:49Z)
Mitigating Malicious Attacks in Federated Learning via Confidence-aware Defense [3.685395311534351]
Federated Learning (FL) is a distributed machine learning diagram that enables multiple clients to collaboratively train a global model without sharing their private local data. FL systems are vulnerable to attacks that are happening in malicious clients through data poisoning and model poisoning. Existing defense methods typically focus on mitigating specific types of poisoning and are often ineffective against unseen types of attack.
arXiv Detail & Related papers (2024-08-05T20:27:45Z)
A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks [43.98557963966335]
Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information. Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks. We propose a novel method, Intermediate Features enhanced Generative Model Inversion (IF-GMI), which disassembles the GAN structure and exploits features between intermediate blocks.
arXiv Detail & Related papers (2024-07-18T19:16:22Z)
MirrorCheck: Efficient Adversarial Defense for Vision-Language Models [55.73581212134293]
We propose a novel, yet elegantly simple approach for detecting adversarial samples in Vision-Language Models. Our method leverages Text-to-Image (T2I) models to generate images based on captions produced by target VLMs. Empirical evaluations conducted on different datasets validate the efficacy of our approach.
arXiv Detail & Related papers (2024-06-13T15:55:04Z)
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content [62.685566387625975]
Current mitigation strategies, while effective, are not resilient under adversarial attacks. This paper introduces Resilient Guardrails for Large Language Models (RigorLLM), a novel framework designed to efficiently moderate harmful and unsafe inputs.
arXiv Detail & Related papers (2024-03-19T07:25:02Z)
FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks. FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain. We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z)
Data-Agnostic Model Poisoning against Federated Learning: A Graph Autoencoder Approach [65.2993866461477]
This paper proposes a data-agnostic, model poisoning attack on Federated Learning (FL) The attack requires no knowledge of FL training data and achieves both effectiveness and undetectability. Experiments show that the FL accuracy drops gradually under the proposed attack and existing defense mechanisms fail to detect it.
arXiv Detail & Related papers (2023-11-30T12:19:10Z)
Learn from the Past: A Proxy Guided Adversarial Defense Framework with Self Distillation Regularization [53.04697800214848]
Adversarial Training (AT) is pivotal in fortifying the robustness of deep learning models. AT methods, relying on direct iterative updates for target model's defense, frequently encounter obstacles such as unstable training and catastrophic overfitting. We present a general proxy guided defense framework, LAST' (bf Learn from the Pbf ast)
arXiv Detail & Related papers (2023-10-19T13:13:41Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.