Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization
- URL: http://arxiv.org/abs/2405.07415v1
- Date: Mon, 13 May 2024 01:29:48 GMT
- Title: Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization
- Authors: Adit Jain, Vikram Krishnamurthy,
- Abstract summary: A gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper.
This paper studies how a gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper.
- Score: 13.440621354486906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG. The oracle probabilistically returns either a noisy gradient of the function} or a non-informative measurement, depending on the oracle state and incentive. The learner's query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This paper formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi-armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.
Related papers
- Fast Value Tracking for Deep Reinforcement Learning [7.648784748888187]
Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment.
Existing algorithms often view these problem as static, focusing on point estimates for model parameters to maximize expected rewards.
Our research leverages the Kalman paradigm to introduce a novel quantification and sampling algorithm called Langevinized Kalman TemporalTD.
arXiv Detail & Related papers (2024-03-19T22:18:19Z) - Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Controlling Federated Learning for Covertness [15.878313629774269]
A learner aims to minimize a function $f$ by repeatedly querying a distributed oracle that provides noisy gradient evaluations.
At the same time, the learner seeks to hide $argmin f$ from a malicious eavesdropper that observes the learner's queries.
This paper considers the problem of textitcovert or textitlearner-private optimization, where the learner has to dynamically choose between learning and obfuscation.
arXiv Detail & Related papers (2023-08-17T07:16:41Z) - Value-Distributional Model-Based Reinforcement Learning [59.758009422067]
Quantifying uncertainty about a policy's long-term performance is important to solve sequential decision-making tasks.
We study the problem from a model-based Bayesian reinforcement learning perspective.
We propose Epistemic Quantile-Regression (EQR), a model-based algorithm that learns a value distribution function.
arXiv Detail & Related papers (2023-08-12T14:59:19Z) - Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs)
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU)
We then show how to adapt this algorithm to also work in the broader class of $gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - An Efficient Algorithm for Deep Stochastic Contextual Bandits [10.298368632706817]
In contextual bandit problems, an agent selects an action based on certain observed context to maximize the reward over iterations.
Recently there have been a few studies using a deep neural network (DNN) to predict the expected reward for an action, and is trained by a gradient based method.
arXiv Detail & Related papers (2021-04-12T16:34:43Z) - Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z) - Provably Efficient Reward-Agnostic Navigation with Linear Value
Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function.
We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z) - Stochastic Learning for Sparse Discrete Markov Random Fields with
Controlled Gradient Approximation Error [10.381976180143328]
We study the $L_$-regularized maximum likelihood estimator/estimation (MLE) problem for discrete Markov random fields (MRFs)
To address these challenges, we consider a verifiable learning framework called proximal gradient (SPG)
We provide novel verifiable bounds to inspect and control the quality of the gradient approximation.
arXiv Detail & Related papers (2020-05-12T22:48:42Z) - StochasticRank: Global Optimization of Scale-Free Discrete Functions [28.224889996383396]
In this paper, we introduce a powerful and efficient framework for direct optimization of ranking metrics.
We show that classic smoothing approaches may introduce bias and present a universal solution for a proper debiasing.
Our framework applies to any scale-free discrete loss function.
arXiv Detail & Related papers (2020-03-04T15:27:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.