Related papers: Efficient Local Planning with Linear Function Approximation

Efficient Local Planning with Linear Function Approximation

URL: http://arxiv.org/abs/2108.05533v1
Date: Thu, 12 Aug 2021 04:56:33 GMT
Title: Efficient Local Planning with Linear Function Approximation
Authors: Dong Yin, Botao Hao, Yasin Abbasi-Yadkori, Nevena Lazi\'{c}, Csaba Szepesv\'{a}ri
Abstract summary: We study query and computationally efficient planning algorithms with linear function approximation and a simulator. We propose an algorithm named confident Monte Carlo least square policy iteration (Confident MC-LSPI) for this setting. One technical contribution of our work is the introduction of a novel proof technique that makes use of a virtual policy algorithm.
Score: 27.90696655434707
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We study query and computationally efficient planning algorithms with linear function approximation and a simulator. We assume that the agent only has local access to the simulator, meaning that the agent can only query the simulator at states that have been visited before. This setting is more practical than many prior works on reinforcement learning with a generative model. We propose an algorithm named confident Monte Carlo least square policy iteration (Confident MC-LSPI) for this setting. Under the assumption that the Q-functions of all deterministic policies are linear in known features of the state-action pairs, we show that our algorithm has polynomial query and computational complexities in the dimension of the features, the effective planning horizon and the targeted sub-optimality, while these complexities are independent of the size of the state space. One technical contribution of our work is the introduction of a novel proof technique that makes use of a virtual policy iteration algorithm. We use this method to leverage existing results on $\ell_\infty$-bounded approximate policy iteration to show that our algorithm can learn the optimal policy for the given initial state even only with local access to the simulator. We believe that this technique can be extended to broader settings beyond this work.

Related papers

Frozen Policy Iteration: Computationally Efficient RL under Linear $Q^π$ Realizability for Deterministic Dynamics [23.744125839429415]
We study computationally and statistically efficient reinforcement learning under the linear $Q$ realizability assumption.<n>Our algorithm achieves a regret bound of $widetildeO(sqrtd2H6T)$, where $d$ is the dimensionality of the feature space, $H$ is the horizon length, and $T$ is the total number of episodes.
arXiv Detail & Related papers (2026-02-28T15:41:47Z)
Learning Policy Representations for Steerable Behavior Synthesis [80.4542176039074]
Given a Markov decision process (MDP), we seek to learn representations for a range of policies to facilitate behavior steering at test time.<n>We show that these representations can be approximated uniformly for a range of policies using a set-based architecture.<n>We use variational generative approach to induce a smooth latent space, and further shape it with contrastive learning so that latent distances align with differences in value functions.
arXiv Detail & Related papers (2026-01-29T21:52:06Z)
The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning) We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Qstar$-realizability. We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z)
Offline Imitation Learning from Multiple Baselines with Applications to Compiler Optimization [17.729842629392742]
We study a Reinforcement Learning problem in which we are given a set of trajectories collected with K baseline policies. The goal is to learn a policy which performs as well as the best combination of baselines on the entire state space.
arXiv Detail & Related papers (2024-03-28T14:34:02Z)
Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation. We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach [65.27783264330711]
Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. We devise algorithms learning optimal tilt control policies from existing data. We show that they can produce optimal tilt update policy using much fewer data samples than naive or existing rule-based learning algorithms.
arXiv Detail & Related papers (2022-01-06T18:24:30Z)
Implicitly Regularized RL with Implicit Q-Values [42.87920755961722]
The $Q$-function is a central quantity in many Reinforcement Learning (RL) algorithms for which RL agents behave following a (soft)-greedy policy. We propose to parametrize the $Q$-function implicitly, as the sum of a log-policy and of a value function. We derive a practical off-policy deep RL algorithm, suitable for large action spaces, and that enforces the softmax relation between the policy and the $Q$-value.
arXiv Detail & Related papers (2021-08-16T12:20:47Z)
Provably Correct Optimization and Exploration with Non-linear Policies [65.60853260886516]
ENIAC is an actor-critic method that allows non-linear function approximation in the critic. We show that under certain assumptions, the learner finds a near-optimal policy in $O(poly(d))$ exploration rounds. We empirically evaluate this adaptation and show that it outperforms priors inspired by linear methods.
arXiv Detail & Related papers (2021-03-22T03:16:33Z)
Improved Regret Bound and Experience Replay in Regularized Policy Iteration [22.621710838468097]
We study algorithms for learning in infinite-horizon undiscounted Markov decision processes (MDPs) with function approximation. We first show that the regret analysis of the Politex algorithm can be sharpened from $O(T3/4)$ to $O(sqrtT)$ under nearly identical assumptions. Our result provides the first high-probability $O(sqrtT)$ regret bound for a computationally efficient algorithm in this setting.
arXiv Detail & Related papers (2021-02-25T00:55:07Z)
On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function [14.205660708980988]
We consider the problem of local planning in fixed-horizon Markov Decision Processes (MDPs) with a generative model. A recent lower bound established that the related problem when the action-value function of the optimal policy is linearly realizable requires an exponential number of queries. In this work, we establish that poly$(H, d)$ learning is possible (with state value function realizability) whenever the action set is small.
arXiv Detail & Related papers (2021-02-03T13:23:15Z)
Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration [143.43658264904863]
We show how iteration under a more standard notion of low inherent Bellman error, typically employed in least-square value-style algorithms, can provide strong PAC guarantees on learning a near optimal value function. We present a computationally tractable algorithm for the reward-free setting and show how it can be used to learn a near optimal policy for any (linear) reward function.
arXiv Detail & Related papers (2020-08-18T04:34:21Z)
Provably Efficient Safe Exploration via Primal-Dual Policy Optimization [105.7510838453122]
We study the Safe Reinforcement Learning (SRL) problem using the Constrained Markov Decision Process (CMDP) formulation. We present an provably efficient online policy optimization algorithm for CMDP with safe exploration in the function approximation setting.
arXiv Detail & Related papers (2020-03-01T17:47:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.