On Reward-Free Reinforcement Learning with Linear Function Approximation
- URL: http://arxiv.org/abs/2006.11274v1
- Date: Fri, 19 Jun 2020 17:59:36 GMT
- Title: On Reward-Free Reinforcement Learning with Linear Function Approximation
- Authors: Ruosong Wang, Simon S. Du, Lin F. Yang, Ruslan Salakhutdinov
- Abstract summary: Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
- Score: 144.4210285338698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reward-free reinforcement learning (RL) is a framework which is suitable for
both the batch RL setting and the setting where there are many reward functions
of interest. During the exploration phase, an agent collects samples without
using a pre-specified reward function. After the exploration phase, a reward
function is given, and the agent uses samples collected during the exploration
phase to compute a near-optimal policy. Jin et al. [2020] showed that in the
tabular setting, the agent only needs to collect a polynomial number of samples
(in terms of the number of states, the number of actions, and the planning
horizon) for reward-free RL. However, in practice, the number of states and
actions can be large, and thus function approximation schemes are required for
generalization. In this work, we give both positive and negative results for
reward-free RL with linear function approximation. We give an algorithm for
reward-free RL in the linear Markov decision process setting where both the
transition and the reward admit linear representations. The sample complexity
of our algorithm is polynomial in the feature dimension and the planning
horizon, and is completely independent of the number of states and actions. We
further give an exponential lower bound for reward-free RL in the setting where
only the optimal $Q$-function admits a linear representation. Our results imply
several interesting exponential separations on the sample complexity of
reward-free RL.
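For reference, the "linear representations" in the abstract are usually formalized as in the standard linear MDP model; the following is a sketch using the conventional notation $\phi$, $\mu_h$, $\theta_h$ (the paper may use different norm bounds or scaling). For every step $h$ there is a known feature map $\phi : \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ such that
$$P_h(s' \mid s, a) = \langle \phi(s, a), \mu_h(s') \rangle \quad \text{and} \quad r_h(s, a) = \langle \phi(s, a), \theta_h \rangle,$$
where $\mu_h$ is a vector of $d$ measures over states and $\theta_h \in \mathbb{R}^d$. The exponential lower bound instead assumes only that the optimal value is linear, i.e., $Q^*_h(s, a) = \langle \phi(s, a), w_h \rangle$ for some $w_h \in \mathbb{R}^d$, which is a weaker condition.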
Related papers
- Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback [38.61232011566285]
We study the recently proposed model of RL with Aggregate Bandit Feedback (RL-ABF), where the agent only observes the sum of rewards at the end of an episode instead of each reward individually.
In this paper, we extend ABF to linear function approximation and develop two efficient algorithms with near-optimal regret guarantees.
arXiv Detail & Related papers (2024-05-13T10:51:01Z)
- Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs [60.40452803295326]
We propose a new reward-free algorithm for learning linear Markov decision processes (MDPs)
At the core of our algorithm is uncertainty-weighted value-targeted regression with exploration-driven pseudo-reward.
We show that our algorithm only needs to explore $\tilde{O}(d^2\varepsilon^{-2})$ episodes to find an $\varepsilon$-optimal policy.
arXiv Detail & Related papers (2023-03-17T17:53:28Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED)
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy rewards.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDPs with trajectory-wise reward.
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Reward-Free Exploration for Reinforcement Learning [82.3300753751066]
We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^2 A\,\mathrm{poly}(H)/\epsilon^2)$ episodes of exploration.
We also give a nearly-matching $\Omega(S^2 A H^2/\epsilon^2)$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
arXiv Detail & Related papers (2020-02-07T14:03:38Z)
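To make the two-phase reward-free protocol described above concrete, here is a minimal, self-contained toy sketch (not the algorithm of any paper listed here): uniform random exploration on a small tabular MDP, followed by planning with value iteration once a reward function is revealed. The 5-state MDP, the exploration budget, and all variable names are hypothetical, chosen only to keep the sketch runnable.

```python
import numpy as np

# Toy sketch of the reward-free protocol: explore without rewards, then plan.
rng = np.random.default_rng(0)
S, A, H = 5, 2, 10                        # states, actions, horizon (illustrative)
P = rng.dirichlet(np.ones(S), size=(S, A))  # ground-truth transition kernel P[s, a, s']

# Phase 1: exploration without any reward signal; only transitions are stored.
counts = np.zeros((S, A, S))
for _ in range(5000):                     # exploration episodes
    s = 0
    for _ in range(H):
        a = rng.integers(A)               # uniform random exploration policy
        s_next = rng.choice(S, p=P[s, a]) # environment step; no reward is observed
        counts[s, a, s_next] += 1
        s = s_next

# Empirical model from the exploration data (add-one smoothing avoids division by zero).
P_hat = (counts + 1) / (counts + 1).sum(axis=2, keepdims=True)

# Phase 2: a reward function is revealed only now; plan on the learned model.
r = rng.random((S, A))                    # the revealed reward, r[s, a] in [0, 1]
V = np.zeros(S)
for _ in range(H):                        # H-step value iteration on the empirical model
    Q = r + P_hat @ V                     # Q[s, a] = r[s, a] + sum_s' P_hat[s, a, s'] V[s']
    V = Q.max(axis=1)

pi = Q.argmax(axis=1)                     # greedy policy w.r.t. the learned model
print("greedy policy:", pi)
```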
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides (including all information) and is not responsible for any consequences of its use.