Good-Enough LLM Obfuscation (GELO)
- URL: http://arxiv.org/abs/2603.05035v1
- Date: Thu, 05 Mar 2026 10:33:48 GMT
- Title: Good-Enough LLM Obfuscation (GELO)
- Authors: Anatoly Belikov, Ilya Fedotov
- Abstract summary: Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states. We present GELO, a protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly served on shared accelerators where an adversary with read access to device memory can observe KV caches and hidden states, threatening prompt privacy for open-source models. Cryptographic protections such as MPC and FHE offer strong guarantees but remain one to two orders of magnitude too slow for interactive inference, while static obfuscation schemes break under multi-run statistical attacks once the model is known. We present GELO (Good-Enough LLM Obfuscation), a lightweight protocol for privacy-preserving inference that limits information leakage from untrusted accelerator observations by hiding hidden states with fresh, per-batch invertible mixing. For each offloaded projection, the TEE samples a random matrix $A$, forms $U = AH$, offloads $U$ and weights $W$ to the accelerator, and then applies $A^{-1}$ on return, so that $A^{-1}((AH)W) = HW$ and outputs are unchanged. Because mixing is never reused across batches, the attacker faces only a single-batch blind source separation problem. We analyze information leakage and introduce two practical defenses: (i) non-orthogonal mixing to mask Gram matrices, and (ii) orthogonal mixing augmented with a small fraction of high-energy "shield" vectors that pollute higher-order statistics. On Llama-2 7B, GELO preserves float32 outputs exactly, closely matches low-precision baselines, offloads the dominant matrix multiplications with about 20-30% latency overhead, and defeats a range of ICA/BSS and anchor-based attacks.
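As an added illustration (not code from the paper), the numpy sketch below walks through one offloaded projection as the abstract describes it: the TEE samples a fresh $A$, the accelerator sees only $U = AH$ and $W$, and the TEE recovers $HW$ with $A^{-1}$. It also measures how far the device-visible Gram matrix $U^T U$ is from the true $H^T H$ under orthogonal, non-orthogonal and shield-augmented mixing; the shield construction (extra high-energy rows appended before mixing and dropped after unmixing), the gain factor and all dimensions are my assumptions, not the paper's specification.

```python
import numpy as np

def sample_mixing(m, orthogonal, rng):
    """Fresh TEE-side mixing matrix A (m x m): sampled per batch, never reused."""
    G = rng.standard_normal((m, m))
    if orthogonal:
        Q, _ = np.linalg.qr(G)   # Q^T Q = I, so U^T U = H^T H is exposed to the device
        return Q
    return G                     # Gaussian matrix: invertible with probability 1

def untrusted_matmul(U, W):
    """The accelerator's view: it computes on U and the public W, never on H or A."""
    return U @ W

def gelo_projection(H, W, rng, orthogonal=True, shields=0, shield_gain=10.0):
    """One offloaded projection: returns (HW recovered inside the TEE, U seen by the device)."""
    n = H.shape[0]
    H_aug = H
    if shields:
        # Assumed shield construction: high-energy rows appended before mixing (not from the paper).
        S = shield_gain * np.abs(H).max() * rng.standard_normal((shields, H.shape[1]))
        H_aug = np.vstack([H, S])
    A = sample_mixing(H_aug.shape[0], orthogonal, rng)
    U = A @ H_aug                      # the only hidden-state data that leaves the TEE
    Y = untrusted_matmul(U, W)         # (AH)W, computed on the untrusted accelerator
    out = np.linalg.solve(A, Y)        # A^-1 ((AH)W) = HW, without forming A^-1 explicitly
    return out[:n], U                  # drop the shield rows before returning real outputs

def gram_gap(H, U):
    """Relative distance between the attacker-visible U^T U and the true Gram H^T H."""
    G = H.T @ H
    return float(np.linalg.norm(U.T @ U - G) / np.linalg.norm(G))

def run_demo(seed=0, n=8, d=16, d_out=12):
    rng = np.random.default_rng(seed)
    H = rng.standard_normal((n, d))        # constructed hidden states for one batch
    W = rng.standard_normal((d, d_out))    # constructed projection weights
    ref = H @ W                            # what a trusted, unmixed projection would give
    configs = {"orth": dict(orthogonal=True), "nonorth": dict(orthogonal=False),
               "orth+shield": dict(orthogonal=True, shields=2)}
    res = {}
    for name, kw in configs.items():
        out, U = gelo_projection(H, W, rng, **kw)
        res[name] = (float(np.abs(out - ref).max()), gram_gap(H, U))  # (output error, Gram gap)
    return res

if __name__ == "__main__":
    for k, (err, gap) in run_demo().items():
        print(f"{k:12s} max|out-HW|={err:.1e}  Gram gap={gap:.3f}")
```

In every configuration the recovered output matches $HW$ to floating-point precision, while only the non-orthogonal and shield-augmented runs change the Gram matrix the accelerator can compute from $U$.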
Related papers
- Cachemir: Fully Homomorphic Encrypted Inference of Generative Large Language Model with KV Cache [15.25568382221441] (2026-02-12T01:01:38Z)
  Cachemir is a KV Cache Accelerated Homomorphic Encrypted LLM Inference Regime. We demonstrate that Cachemir achieves $48.83\times$ and $67.16\times$ speedup over MOAI (ICML'25) and THOR (CCS'25) respectively on CPU and consumes less than 100 seconds on GPU to generate an output token for Llama-3-8B.
- Membership Inference Attacks Against Fine-tuned Diffusion Language Models [14.835693946869178] (2026-01-27T23:40:07Z)
  Diffusion Language Models (DLMs) are a promising alternative to autoregressive language models. This paper presents the first systematic investigation of Membership Inference Attack (MIA) vulnerabilities in DLMs. We introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse signal challenge through robust aggregation.
- Amulet: Fast TEE-Shielded Inference for On-Device Model Protection [15.936694312917512] (2025-12-08T12:22:51Z)
  On-device machine learning (ML) introduces new security concerns about model privacy. Storing valuable trained ML models on user devices exposes them to potential extraction by adversaries. We propose Amulet, a fast TEE-shielded on-device inference framework for ML model protection.
- MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models [53.36415620647177] (2025-06-15T15:02:59Z)
  Semi-structured sparsity offers a promising solution by strategically retaining $N$ elements out of every $M$ weights. Existing (N:M)-compatible approaches typically fall into two categories: rule-based layerwise greedy search, which suffers from considerable errors, and gradient-driven learning, which incurs prohibitive training costs. We propose a novel linear-space probabilistic framework named MaskPro, which aims to learn a prior categorical distribution for every $M$ consecutive weights and subsequently leverages this distribution to generate the (N:M)-sparsity throughout an $N$-way sampling
- BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models [22.695878922889715] (2025-05-22T13:36:00Z)
  We introduce the first bit-flip inference cost attack that directly modifies model weights to induce persistent overhead for all users of a compromised LLM. We instantiate this attack paradigm with BitHydra, which (1) minimizes a loss that suppresses the end-of-sequence token (i.e., EOS) and (2) employs an efficient yet effective critical-bit search focused on the EOS embedding vector.
- FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204] (2025-05-19T07:32:56Z)
  Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
- Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models [17.663550432103534] (2024-07-31T15:02:46Z)
  Multimodal Large Language Models (MLLMs) extend the capacity of LLMs to understand multimodal information comprehensively. These models are susceptible to jailbreak attacks, where malicious users can break the safety alignment of the target model and generate misleading and harmful answers. We propose Cross-modality Information DEtectoR (CIDER), a plug-and-play jailbreaking detector designed to identify maliciously perturbed image inputs.
- ASETF: A Novel Method for Jailbreak Attack on LLMs through Translate Suffix Embeddings [58.82536530615557] (2024-02-25T06:46:27Z)
  We propose an Adversarial Suffix Embedding Translation Framework (ASETF) to transform continuous adversarial suffix embeddings into coherent and understandable text. Our method significantly reduces the computation time of adversarial suffixes and achieves a much better attack success rate compared to existing techniques.
- HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference [68.59839755875252] (2024-02-14T18:04:36Z)
  HiRE comprises two novel components: (i) a compression scheme to cheaply predict top-$k$ rows/columns with high recall, followed by full computation restricted to the predicted subset, and (ii) DA-TOP-$k$: an efficient multi-device approximate top-$k$ operator. We demonstrate that on a one billion parameter model, HiRE applied to both the softmax as well as feedforward layers achieves almost matching pretraining and downstream accuracy, and speeds up inference latency by $1.47\times$ on a single TPUv5e device.
- SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks [99.23352758320945] (2023-10-05T17:01:53Z)
  We propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on large language models (LLMs). Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs.
- Composite Adversarial Attacks [57.293211764569996] (2020-12-10T03:21:16Z)
  Adversarial attack is a technique for deceiving Machine Learning (ML) models. In this paper, a new procedure called Composite Adversarial Attack (CAA) is proposed for automatically searching the best combination of attack algorithms. CAA beats 10 top attackers on 11 diverse defenses with less elapsed time.
This list is automatically generated from the titles and abstracts of the papers in this site.