Relative Entropy Regularized Reinforcement Learning for Efficient Encrypted Policy Synthesis
- URL: http://arxiv.org/abs/2506.12358v1
- Date: Sat, 14 Jun 2025 05:41:03 GMT
- Title: Relative Entropy Regularized Reinforcement Learning for Efficient Encrypted Policy Synthesis
- Authors: Jihoon Suh, Yeongjun Jang, Kaoru Teranishi, Takashi Tanaka
- Abstract summary: We propose an efficient encrypted policy synthesis to develop privacy-preserving model-based reinforcement learning.
We first demonstrate that the relative-entropy-regularized reinforcement learning framework offers a computationally convenient linear and "min-free" structure for value iteration.
Results demonstrate the effectiveness of the RERL framework in integrating FHE for encrypted policy synthesis.
- Score: 0.6249768559720122
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose an efficient encrypted policy synthesis method for privacy-preserving model-based reinforcement learning. We first demonstrate that the relative-entropy-regularized reinforcement learning (RERL) framework offers a computationally convenient linear and "min-free" structure for value iteration, enabling a direct and efficient integration of fully homomorphic encryption (FHE) with bootstrapping into policy synthesis. We analyze convergence and error bounds, characterizing how encryption-induced errors, including quantization and bootstrapping errors, propagate through encrypted policy synthesis. The theoretical analysis is validated by numerical simulations, whose results demonstrate the effectiveness of the RERL framework in integrating FHE for encrypted policy synthesis.
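To make the "min-free" structure concrete, here is a minimal Python sketch of relative-entropy-regularized value iteration in its linearly solvable form; the toy first-exit problem, state costs, and passive dynamics below are illustrative assumptions, not the paper's setup. The update is linear in the desirability function z = exp(-V) and involves no minimization over actions, which is what makes each iteration a matrix-vector product amenable to homomorphic evaluation.

```python
import numpy as np

# Toy first-exit problem (assumed): n states, last state is a zero-cost goal.
rng = np.random.default_rng(0)
n = 6
c = rng.uniform(0.1, 1.0, size=n)   # per-state costs (illustrative)
c[-1] = 0.0                         # zero-cost absorbing goal

P = rng.uniform(size=(n, n))        # passive (uncontrolled) dynamics
P[-1] = 0.0
P[-1, -1] = 1.0                     # goal absorbs under passive dynamics
P /= P.sum(axis=1, keepdims=True)

# Desirability function z(x) = exp(-V(x)); the iteration below is linear
# in z and contains no min over actions ("min-free").
z = np.ones(n)
for _ in range(1000):
    z_next = np.exp(-c) * (P @ z)   # z <- exp(-c) .* (P z)
    if np.max(np.abs(z_next - z)) < 1e-12:
        z = z_next
        break
    z = z_next

V = -np.log(z)                      # recover the value function
pi = P * z[None, :]                 # optimal policy: pi(x'|x) proportional to P(x'|x) z(x')
pi /= pi.sum(axis=1, keepdims=True)
print("V:", np.round(V, 4))
```

Because each sweep is one matrix-vector product followed by a componentwise scaling, an FHE backend only needs ciphertext additions and multiplications per iteration, with bootstrapping to keep ciphertext noise growth in check.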
Related papers
- Post-Quantum Secure Aggregation via Code-Based Homomorphic Encryption [19.264286608481296]
We present a code-based alternative for secure aggregation based on key- and message-additive homomorphic encryption.
Our construction employs a committee-based decryptor realized via secret sharing.
We evaluate performance and identify regimes in which our approach outperforms information-theoretically secure aggregation protocols.
arXiv Detail & Related papers (2026-01-19T13:14:01Z)
- Physical Layer Deception based on Semantic Distortion [58.38604209714828]
Physical layer deception (PLD) is a framework that integrates physical layer security (PLS) with deception techniques.
We extend this framework to a semantic communication model and conduct a theoretical analysis using semantic distortion as the performance metric.
arXiv Detail & Related papers (2025-10-16T18:23:35Z)
- Unlocking Symbol-Level Precoding Efficiency Through Tensor Equivariant Neural Network [84.22115118596741]
We propose an end-to-end deep learning (DL) framework with low inference complexity for symbol-level precoding (SLP).
We show that the proposed framework captures substantial performance gains of optimal SLP, while achieving an approximately 80-times speedup over conventional methods.
arXiv Detail & Related papers (2025-10-02T15:15:50Z)
- Taming Polysemanticity in LLMs: Provable Feature Recovery via Sparse Autoencoders [50.52694757593443]
Existing SAE training algorithms often lack rigorous mathematical guarantees and suffer from practical limitations.
We first propose a novel statistical framework for the feature recovery problem, which includes a new notion of feature identifiability.
We introduce a new SAE training algorithm based on "bias adaptation", a technique that adaptively adjusts neural network bias parameters to ensure appropriate activation sparsity.
arXiv Detail & Related papers (2025-06-16T20:58:05Z)
- BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies [0.0]
BASIL (Best-Action Symbolic Interpretable Learning) is a systematic approach for generating symbolic, rule-based policies.
This article introduces a new interpretable policy synthesis method that combines symbolic expressiveness, evolutionary diversity, and online learning.
arXiv Detail & Related papers (2025-05-31T00:47:24Z)
- Efficient Implementation of Reinforcement Learning over Homomorphic Encryption [0.7673339435080445]
We classify control policy synthesis into model-based, simulator-driven, and data-driven approaches.
We examine their implementation over fully homomorphic encryption (FHE) for privacy enhancements.
Our work suggests the potential for secure and efficient cloud-based reinforcement learning (see the toy sketch after this list for why linear updates compose with additive homomorphism).
arXiv Detail & Related papers (2025-04-12T20:34:26Z)
- RL-finetuning LLMs from on- and off-policy data with a single algorithm [53.70731390624718]
We introduce a novel reinforcement learning algorithm (AGRO) for fine-tuning large-language models.
AGRO leverages the concept of generation consistency, which states that the optimal policy satisfies the notion of consistency across any possible generation of the model.
We derive algorithms that find optimal solutions via the sample-based policy gradient and provide theoretical guarantees on their convergence.
arXiv Detail & Related papers (2025-03-25T12:52:38Z)
- Towards Theoretical Understanding of Data-Driven Policy Refinement [0.0]
This paper presents an approach for data-driven policy refinement in reinforcement learning, specifically designed for safety-critical applications.
Our principal contribution lies in the mathematical formulation of this data-driven policy refinement concept.
We present a series of theorems elucidating key theoretical properties of our approach, including convergence, robustness bounds, generalization error, and resilience to model mismatch.
arXiv Detail & Related papers (2023-05-11T13:36:21Z)
- Bounded Robustness in Reinforcement Learning via Lexicographic Objectives [54.00072722686121]
Policy robustness in Reinforcement Learning may not be desirable at any cost.
We study how policies can be maximally robust to arbitrary observational noise.
We propose a robustness-inducing scheme, applicable to any policy algorithm, that trades off expected policy utility for robustness.
arXiv Detail & Related papers (2022-09-30T08:53:18Z)
- Bellman Residual Orthogonalization for Offline Reinforcement Learning [53.17258888552998]
We introduce a new reinforcement learning principle that approximates the Bellman equations by enforcing their validity only along a test function space (a one-line statement of this idea appears after this list).
We exploit this principle to derive confidence intervals for off-policy evaluation, as well as to optimize over policies within a prescribed policy class.
arXiv Detail & Related papers (2022-03-24T01:04:17Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
- Deep synthesis regularization of inverse problems [0.0]
In this paper, we introduce deep synthesis regularization (DESYRE), which uses a neural network as a nonlinear synthesis operator.
The proposed method makes it possible to exploit a key benefit of deep learning, namely being well adjustable to the available training data.
We present a strategy for constructing a synthesis network as part of an analysis-synthesis sequence together with an appropriate training strategy.
arXiv Detail & Related papers (2020-02-01T06:50:42Z)
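As a companion to the "Efficient Implementation of Reinforcement Learning over Homomorphic Encryption" entry above, the following toy sketch shows why linear, min-free updates compose naturally with additive homomorphism. The one-time additive masking below is a pedagogical stand-in, not a real FHE scheme (no security guarantees, no bootstrapping), and the modulus, fixed-point scale, and dimensions are assumptions.

```python
import numpy as np

q = 2**40   # ciphertext modulus (assumed)
s = 2**12   # fixed-point scale (assumed); coarser scale -> larger quantization error

def enc(vec, key):
    """Encode a real vector in fixed point and mask it additively mod q."""
    return (np.round(vec * s).astype(np.int64) + key) % q

def dec(ct, key, scale):
    """Unmask, recenter to the signed range, and undo the fixed-point scaling."""
    m = (ct - key) % q
    m = np.where(m > q // 2, m - q, m)
    return m / scale

rng = np.random.default_rng(1)
z = rng.uniform(size=4)                                       # plaintext vector
W = np.round(rng.uniform(size=(4, 4)) * s).astype(np.int64)   # public fixed-point weights

key = rng.integers(0, q, size=4, dtype=np.int64)
ct = enc(z, key)

# Server: one linear update step, computed entirely on masked data.
ct_next = (W @ ct) % q
# The mask transforms linearly as well, so the key holder can still unmask.
key_next = (W @ key) % q

z_next = dec(ct_next, key_next, scale=s * s)    # two fixed-point factors accumulated
print(np.max(np.abs(z_next - (W / s) @ z)))     # small residual: quantization error
```

The residual printed at the end is exactly the kind of quantization error, joined by bootstrapping error in a real FHE scheme, whose propagation through repeated value-iteration sweeps the main paper's error bounds account for.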
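Similarly, for the "Bellman Residual Orthogonalization" entry, the test-function idea admits a one-line statement (standard weak-formulation notation assumed here, not necessarily the paper's exact formulation): for evaluation of a policy pi, choose Q so that the Bellman residual is orthogonal to every test function f in a prescribed class F,

```latex
\mathbb{E}_{(s,a)\sim\mu}\Big[\big(Q(s,a) - r(s,a)
  - \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}\big[Q(s',\pi(s'))\big]\big)\, f(s,a)\Big] = 0
  \quad \text{for all } f \in \mathcal{F},
```

so the Bellman equation is required to hold only in the directions spanned by F rather than pointwise.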
This list is automatically generated from the titles and abstracts of the papers on this site.