Efficient Implementation of Reinforcement Learning over Homomorphic Encryption
- URL: http://arxiv.org/abs/2504.09335v1
- Date: Sat, 12 Apr 2025 20:34:26 GMT
- Title: Efficient Implementation of Reinforcement Learning over Homomorphic Encryption
- Authors: Jihoon Suh, Takashi Tanaka
- Abstract summary: We classify control policy synthesis into model-based, simulator-driven, and data-driven approaches. We examine their implementation over fully homomorphic encryption (FHE) for privacy enhancements. Our work suggests the potential for secure and efficient cloud-based reinforcement learning.
- Score: 0.7673339435080445
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate encrypted control policy synthesis over the cloud. While encrypted control implementations have been studied previously, we focus on the less explored paradigm of privacy-preserving control synthesis, which can involve heavier computations ideal for cloud outsourcing. We classify control policy synthesis into model-based, simulator-driven, and data-driven approaches and examine their implementation over fully homomorphic encryption (FHE) for privacy enhancements. A key challenge arises from comparison operations (min or max) in standard reinforcement learning algorithms, which are difficult to execute over encrypted data. This observation motivates our focus on Relative-Entropy-regularized reinforcement learning (RL) problems, which simplifies encrypted evaluation of synthesis algorithms due to their comparison-free structures. We demonstrate how linearly solvable value iteration, path integral control, and Z-learning can be readily implemented over FHE. We conduct a case study of our approach through numerical simulations of encrypted Z-learning in a grid world environment using the CKKS encryption scheme, showing convergence with acceptable approximation error. Our work suggests the potential for secure and efficient cloud-based reinforcement learning.
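The abstract's key point is that relative-entropy-regularized RL replaces the min/max in standard value iteration with purely linear (addition/multiplication) updates, which is exactly what FHE schemes like CKKS can evaluate. The following plaintext sketch of linearly solvable value iteration (Z-learning's batch counterpart) illustrates that comparison-free structure on a hypothetical 4-state chain, not the paper's actual grid world:

```python
import numpy as np

# Linearly solvable value iteration sketch (hypothetical 4-state chain).
# The desirability vector z satisfies z = diag(exp(-q)) @ P @ z, a fixed
# point reachable using only additions and multiplications -- the min-free
# structure that makes the iteration FHE-friendly (e.g. under CKKS, which
# supports approximate arithmetic on encrypted reals).

q = np.array([1.0, 1.0, 1.0, 0.0])        # state costs; goal state costs 0
P = np.array([[0.0, 1.0, 0.0, 0.0],       # passive transition dynamics
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 1.0]])      # goal state is absorbing

z = np.ones(4)
for _ in range(200):                      # power iteration on G = diag(e^-q) P
    z = np.exp(-q) * (P @ z)
    z[3] = 1.0                            # boundary condition at the goal

v = -np.log(z)                            # value function from desirability
print(v)                                  # v is 0 at the goal, grows with distance
```

Note that the loop body contains no comparisons at all; an encrypted version would apply the same linear map to ciphertexts, with CKKS introducing the bounded approximation error the paper's case study measures.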
Related papers
- Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation [60.04281435591454]
CRDA (Curriculum Reinforcement-Learning Data Augmentation) is a novel framework guiding detectors to progressively master multi-domain forgery features. Central to our approach is integrating reinforcement learning and causal inference. Our method significantly improves detector generalizability, outperforming SOTA methods across multiple cross-domain datasets.
arXiv Detail & Related papers (2025-11-10T12:45:52Z)
- Traveling Salesman-Based Token Ordering Improves Stability in Homomorphically Encrypted Language Models [16.73757071734074]
Homomorphic encryption (HE) provides a principled solution by enabling computation directly on encrypted data. The challenge of text generation, particularly next-token prediction, has received limited attention. We propose a TSP-based token reordering strategy to address the difficulties of encrypted text generation.
arXiv Detail & Related papers (2025-10-14T09:56:50Z)
- Relative Entropy Regularized Reinforcement Learning for Efficient Encrypted Policy Synthesis [0.6249768559720122]
We propose an efficient encrypted policy synthesis to develop privacy-preserving model-based reinforcement learning. We first demonstrate that the relative-entropy-regularized reinforcement learning framework offers a computationally convenient linear and "min-free" structure for value iteration. Results demonstrate the effectiveness of the RERL framework in integrating FHE for encrypted policy synthesis.
arXiv Detail & Related papers (2025-06-14T05:41:03Z)
- Cryptanalysis via Machine Learning Based Information Theoretic Metrics [58.96805474751668]
We propose two novel applications of machine learning (ML) algorithms to perform cryptanalysis on any cryptosystem. These algorithms can be readily applied in an audit setting to evaluate the robustness of a cryptosystem. We show that our classification model correctly identifies the encryption schemes that are not IND-CPA secure, such as DES, RSA, and AES ECB, with high accuracy.
arXiv Detail & Related papers (2025-01-25T04:53:36Z)
- Privacy-Preserving Cyberattack Detection in Blockchain-Based IoT Systems Using AI and Homomorphic Encryption [22.82443900809095]
This work proposes a privacy-preserving cyberattack detection framework for blockchain-based Internet-of-Things (IoT) systems. In our approach, artificial intelligence (AI)-driven detection modules are strategically deployed at blockchain nodes to identify real-time attacks. To safeguard privacy, the data is encrypted using homomorphic encryption (HE) before transmission. Our simulation results demonstrate that our proposed method can not only reduce the training time but also achieve detection accuracy that is approximately identical to the approach without encryption, with a gap of around 0.01%.
arXiv Detail & Related papers (2024-12-18T05:46:53Z)
- FLUE: Federated Learning with Un-Encrypted model weights [0.0]
Federated learning enables devices to collaboratively train a shared model while keeping training data locally stored.
Recent research emphasizes using encrypted model parameters during training.
This paper introduces a novel federated learning algorithm, leveraging coded local gradients without encryption.
arXiv Detail & Related papers (2024-07-26T14:04:57Z)
- Stable Inverse Reinforcement Learning: Policies from Control Lyapunov Landscapes [4.229902091180109]
We propose a novel, stability-certified IRL approach to learning control Lyapunov functions from demonstration data.
By exploiting closed-form expressions for associated control policies, we are able to efficiently search the space of CLFs.
We present a theoretical analysis of the optimality properties provided by the CLF and evaluate our approach using both simulated and real-world data.
arXiv Detail & Related papers (2024-05-14T16:40:45Z)
- Verifiable Encodings for Secure Homomorphic Analytics [10.402772462535884]
Homomorphic encryption is a promising solution for protecting privacy of cloud-delegated computations on sensitive data.
We propose two error detection encodings and build authenticators that enable practical client-verification of cloud-based homomorphic computations.
We implement our solution in VERITAS, a ready-to-use system for verification of outsourced computations executed over encrypted data.
arXiv Detail & Related papers (2022-07-28T13:22:21Z)
- Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning [72.97229770329214]
We introduce a general approach for solving high-dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
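The LBSGD summary above rests on a standard idea: a logarithmic barrier term blows up near the constraint boundary, so gradient steps stay strictly inside the safe set. A minimal one-dimensional sketch of that mechanism (a hypothetical toy problem, not the paper's algorithm or step-size rule):

```python
# Log-barrier gradient descent sketch (hypothetical 1-D toy problem):
# minimize f(x) = (x - 2)^2 subject to the safety constraint x <= 1,
# by descending the barrier objective f(x) - eta * log(1 - x).
# The barrier's gradient term eta / (1 - x) grows without bound as x
# approaches 1, so iterates never leave the safe region.

eta = 0.1    # barrier weight
x = 0.0      # safe starting point
for _ in range(500):
    grad = 2 * (x - 2) + eta / (1 - x)   # d/dx [f(x) - eta*log(1-x)]
    x -= 0.01 * grad                      # small fixed step size for the sketch

print(x)  # converges just inside the boundary, strictly below x = 1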
arXiv Detail & Related papers (2022-07-21T11:14:47Z)
- Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse [15.134707391442236]
We develop a new class of model-free deep reinforcement learning algorithms for data-driven, learning-based control.
Our Generalized Policy Improvement algorithms combine the policy improvement guarantees of on-policy methods with the efficiency of sample reuse.
arXiv Detail & Related papers (2022-06-28T02:56:12Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity.
We devise algorithms learning optimal tilt control policies from existing data.
We show that they can produce optimal tilt update policy using much fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We show that a natural constraint on information flow might confer generalization benefits onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z)
- Provably Efficient Exploration for Reinforcement Learning Using Unsupervised Learning [96.78504087416654]
Motivated by the prevailing paradigm of using unsupervised learning for efficient exploration in reinforcement learning (RL) problems, we investigate when this paradigm is provably efficient.
We present a general algorithmic framework that is built upon two components: an unsupervised learning algorithm and a no-regret tabular RL algorithm.
arXiv Detail & Related papers (2020-03-15T19:23:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.