AERO: Entropy-Guided Framework for Private LLM Inference
- URL: http://arxiv.org/abs/2410.13060v3
- Date: Thu, 30 Oct 2025 20:53:57 GMT
- Title: AERO: Entropy-Guided Framework for Private LLM Inference
- Authors: Nandan Kumar Jha, Brandon Reagen,
- Abstract summary: We introduce an entropy-guided framework to eliminate costly nonlinear operations from transformer architectures.<n>Experiments show AERO can save 3.4$times$ communication and 1.4$times$ latency, without any performance penalty.
- Score: 2.8232103900765693
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy-preserving computation enables language model inference directly on encrypted data yet suffers from prohibitive latency and communication overheads, primarily due to nonlinear functions. Removing nonlinearities, however, can trigger one of two failure modes restricting the potential for nonlinearity removal: entropy collapse in deeper layers, which destabilizes training, and entropic overload in early layers, causing under-utilization of attention heads. To address these challenges, we introduce AERO, an entropy-guided framework to strategically eliminates costly nonlinear operations from transformer architectures, which employs an adaptive recalibration through a head-wise entropy regularizer with learnable per-head strengths, enabling each head to adjust its entropy level while penalizing extreme entropies and fostering functional diversity through a tolerance margin. Experiments show AERO can save 3.4$\times$ communication and 1.4$\times$ latency, without any performance penalty.
Related papers
- Unifying Stable Optimization and Reference Regularization in RLHF [64.16830602324345]
This paper introduces a unified regularization approach that balances objectives of preventing reward hacking and maintaining stable policy updates.<n>Our simple yet principled alignment objective yields a weighted supervised fine-tuning loss with a superior trade-off, which demonstrably improves both alignment results and implementation complexity.
arXiv Detail & Related papers (2026-02-12T03:31:19Z) - Generalizing GNNs with Tokenized Mixture of Experts [75.8310720413187]
We show that improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor.<n>We propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths.<n>Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.
arXiv Detail & Related papers (2026-02-09T22:48:30Z) - Uncertainty-Aware 4D Gaussian Splatting for Monocular Occluded Human Rendering [20.390068289144484]
We propose U-4DGS, a framework integrating a Probabilistic Deformation Network and a Double Rasterization pipeline.<n>This architecture renders pixel-aligned uncertainty maps that act as an adaptive modulator, automatically attenuating artifacts from unreliable observations.<n>Experiments on ZJU-MoCap and OcMotion demonstrate that U-4DGS achieves SOTA rendering fidelity and robustness.
arXiv Detail & Related papers (2026-02-06T03:14:37Z) - TABES: Trajectory-Aware Backward-on-Entropy Steering for Masked Diffusion Models [35.327100592206115]
Backward-on-Entropy (BoE) Steering is a gradient-guided inference framework that approximates infinite-horizon context via a single backward pass.<n>To ensure scalability, we introduce ttexttActiveQueryAttention, a sparse adjoint primitive that exploits the structure of the masking objective to reduce backward pass complexity.
arXiv Detail & Related papers (2026-01-30T19:10:32Z) - LEGATO: Good Identity Unlearning Is Continuous [30.78585550012454]
LEGATO is a machine unlearning method for learning to forget identity of generative models.<n>It achieves state-of-the-art forgetting performance, avoids catastrophic collapse and reduces fine-tuned parameters.<n>Our experiments show that LEGATO achieves state-of-the-art forgetting performance, avoids catastrophic collapse and reduces fine-tuned parameters.
arXiv Detail & Related papers (2026-01-07T13:15:25Z) - The Best of Both Worlds: Hybridizing Neural Operators and Solvers for Stable Long-Horizon Inference [0.0]
ANCHOR is an online, instance-aware hybrid inference framework for stable long-horizon prediction of PDEs.<n>We show that ANCHOR reliably bounds long-horizon error growth, stabilizes extrapolative rollouts, and significantly improves robustness over standalone neural operators.
arXiv Detail & Related papers (2025-12-22T18:17:28Z) - Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment.<n>Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z) - DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models [50.21378052667732]
We conduct an in-depth analysis of dLLM vulnerabilities to jailbreak attacks across two distinct dimensions: intra-step and inter-step dynamics.<n>We propose DiffuGuard, a training-free defense framework that addresses vulnerabilities through a dual-stage approach.
arXiv Detail & Related papers (2025-09-29T05:17:10Z) - SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks [17.77094760401298]
We study the vulnerability of fine-tuned large language models to membership inference attacks (MIAs)<n>We propose SOFT, a novel defense technique that mitigates privacy leakage by leveraging influential data selection with an adjustable parameter to balance utility preservation and privacy protection.
arXiv Detail & Related papers (2025-06-12T07:23:56Z) - Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z) - Entropy-Guided Attention for Private LLMs [3.7802450241986945]
We introduce an information-theoretic framework to characterize the role of nonlinearities in decoder-only language models.
By leveraging Shannon's entropy as a quantitative measure, we uncover the previously unexplored dual significance of nonlinearities.
We propose an entropy-guided attention mechanism paired with a novel entropy regularization technique to mitigate entropic overload.
arXiv Detail & Related papers (2025-01-07T03:17:47Z) - MOFHEI: Model Optimizing Framework for Fast and Efficient Homomorphically Encrypted Neural Network Inference [0.8388591755871735]
Homomorphic Encryption (HE) enables us to perform machine learning tasks over encrypted data.
We propose MOFHEI, a framework that optimize the model to make HE-based neural network inference, fast and efficient.
Our framework achieves up to 98% pruning ratio on LeNet, eliminating up to 93% of the required HE operations for performing PI.
arXiv Detail & Related papers (2024-12-10T22:44:54Z) - TruncFormer: Private LLM Inference Using Only Truncations [20.477495294254997]
Private inference (PI) serves an important role in guaranteeing the privacy of user data.
PI remains practically intractable due to the massive latency costs associated with nonlinear functions in machine learning models.
We introduce TruncFormer, a framework for taking any machine learning model and transforming it into a plaintext emulation of PI.
arXiv Detail & Related papers (2024-12-02T01:55:42Z) - Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design [59.00758127310582]
We propose a novel framework Read-ME that transforms pre-trained dense LLMs into smaller MoE models.
Our approach employs activation sparsity to extract experts.
Read-ME outperforms other popular open-source dense models of similar scales.
arXiv Detail & Related papers (2024-10-24T19:48:51Z) - Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix [17.086679273053853]
Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives.
Their growing capabilities come at the cost of extremely large model sizes, making deployment on edge devices challenging.
This paper introduces a novel approach to LLM weight pruning that directly optimize for approximating the attention matrix.
arXiv Detail & Related papers (2024-10-15T04:35:56Z) - Model-Based Privacy-Preserving Knowledge Transfer for Large Language Models [34.949731264918846]
Llamdex is a framework that enhances large language models (LLMs) using only models trained on domain-specific data.
Our approach significantly enhances the accuracy of domain-specific tasks, achieving up to a 26% accuracy improvement.
arXiv Detail & Related papers (2024-10-14T13:18:20Z) - Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores [3.6385567224218556]
Large language models (LLMs) have been widely applied but face challenges in efficient inference.
We introduce a novel bipolar-INT data format that facilitates parallel computing and supports symmetric quantization.
We implement an arbitrary precision matrix multiplication scheme that decomposes and recovers at the bit level, enabling flexible precision.
arXiv Detail & Related papers (2024-09-26T14:17:58Z) - ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood [14.512464277772194]
Aligned Supervised Fine-Tuning (ASFT) is an effective approach that better aligns Large Language Models with pair-wise datasets.
ASFT mitigates the issue where the DPO loss function decreases the probability of generating human-dispreferred data.
Extensive experiments demonstrate that ASFT is an effective alignment approach, consistently outperforming existing methods.
arXiv Detail & Related papers (2024-09-14T11:39:13Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning on Large-Language Models.
We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model.
Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z) - Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment [104.18002641195442]
We introduce Self-Augmented Preference Optimization (SAPO), an effective and scalable training paradigm that does not require existing paired data.
Building on the self-play concept, which autonomously generates negative responses, we further incorporate an off-policy learning pipeline to enhance data exploration and exploitation.
arXiv Detail & Related papers (2024-05-31T14:21:04Z) - Multi-Reference Preference Optimization for Large Language Models [56.84730239046117]
We introduce a novel closed-form formulation for direct preference optimization using multiple reference models.
The resulting algorithm, Multi-Reference Preference Optimization (MRPO), leverages broader prior knowledge from diverse reference models.
Our experiments demonstrate that LLMs finetuned with MRPO generalize better in various preference data, regardless of data scarcity or abundance.
arXiv Detail & Related papers (2024-05-26T00:29:04Z) - SecFormer: Towards Fast and Accurate Privacy-Preserving Inference for Large Language Models [34.63351580241698]
We introduce an advanced optimization framework called SecFormer to achieve fast and accurate PPI for Transformer models.
In terms of efficiency, SecFormer is 3.56 and 3.58 times faster than Puma for BERT$_textBASE$ and BERT$_textLARGE$, respectively.
arXiv Detail & Related papers (2024-01-01T15:40:35Z) - Over-the-Air Federated Learning with Privacy Protection via Correlated
Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim at minimizing privacy leakage to the adversary and the degradation of model accuracy at the edge server.
arXiv Detail & Related papers (2022-10-05T13:13:35Z) - Improving Generalization via Uncertainty Driven Perturbations [107.45752065285821]
We consider uncertainty-driven perturbations of the training data points.
Unlike loss-driven perturbations, uncertainty-guided perturbations do not cross the decision boundary.
We show that UDP is guaranteed to achieve the robustness margin decision on linear models.
arXiv Detail & Related papers (2022-02-11T16:22:08Z) - Optimization-driven Machine Learning for Intelligent Reflecting Surfaces
Assisted Wireless Networks [82.33619654835348]
Intelligent surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts.
Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity.
In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.