Provably Feedback-Efficient Reinforcement Learning via Active Reward
Learning
- URL: http://arxiv.org/abs/2304.08944v1
- Date: Tue, 18 Apr 2023 12:36:09 GMT
- Title: Provably Feedback-Efficient Reinforcement Learning via Active Reward
Learning
- Authors: Dingwen Kong, Lin F. Yang
- Abstract summary: An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL).
Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback.
We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function.
- Score: 26.067411894141863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An appropriate reward function is of paramount importance in specifying a
task in reinforcement learning (RL). Yet, it is known to be extremely
challenging in practice to design a correct reward function for even simple
tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to
the RL agent by providing various types of feedback. However, despite achieving
great empirical successes, HiL RL usually requires too much feedback from a
human teacher and also suffers from insufficient theoretical understanding. In
this paper, we focus on addressing this issue from a theoretical perspective,
aiming to provide provably feedback-efficient algorithmic frameworks that take
human-in-the-loop to specify rewards of given tasks. We provide an
active-learning-based RL algorithm that first explores the environment without
specifying a reward function and then asks a human teacher for only a few
queries about the rewards of a task at some state-action pairs. After that, the
algorithm guarantees to provide a nearly optimal policy for the task with high
probability. We show that, even with the presence of random noise in the
feedback, the algorithm only takes $\widetilde{O}(H\dim_{R}^{2})$ queries on
the reward function to provide an $\epsilon$-optimal policy for any $\epsilon >
0$. Here $H$ is the horizon of the RL environment, and $\dim_{R}$ specifies the
complexity of the function class representing the reward function. In contrast,
standard RL algorithms need to query the reward function for at least
$\Omega(\operatorname{poly}(d, 1/\epsilon))$ state-action pairs where $d$
depends on the complexity of the environmental transition.
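The abstract above describes a two-phase recipe: reward-free exploration of the environment, a small number of active reward queries to a human teacher, and planning against the learned reward. The Python sketch below is only a schematic tabular illustration of that recipe, not the paper's algorithm; the environment interface (`reset`, `step`, `ask_teacher`), the count-based exploration rule, and the query-selection heuristic are all illustrative assumptions.

```python
import numpy as np

def active_reward_rl(env, n_states, n_actions, horizon,
                     explore_episodes=500, query_budget=50):
    """Schematic two-phase loop (illustrative, not the paper's algorithm):
    reward-free exploration, a few active reward queries, then planning."""
    visit_counts = np.zeros((n_states, n_actions))
    transitions = np.zeros((n_states, n_actions, n_states))

    # Phase 1: explore with no reward signal, logging visitation statistics.
    for _ in range(explore_episodes):
        s = env.reset()                              # assumed: returns an int state
        for _ in range(horizon):
            a = int(np.argmin(visit_counts[s]))      # try the least-tried action
            s_next, _reward, done = env.step(a)      # reward is ignored in this phase
            visit_counts[s, a] += 1
            transitions[s, a, s_next] += 1
            s = s_next
            if done:
                break

    # Phase 2: query the human teacher only at a few reachable state-action
    # pairs, preferring the ones we know least about (lowest visit count).
    reward_hat = np.zeros((n_states, n_actions))
    reachable = np.argwhere(visit_counts > 0)
    order = np.argsort(visit_counts[reachable[:, 0], reachable[:, 1]])
    for s, a in reachable[order[:query_budget]]:
        reward_hat[s, a] = env.ask_teacher(s, a)     # hypothetical noisy feedback call

    # Phase 3: finite-horizon value iteration on the empirical model and the
    # queried rewards; returns one greedy policy per step h = 0..H-1.
    p_hat = transitions / np.maximum(visit_counts[..., None], 1)
    v = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        q = reward_hat + p_hat @ v                   # (S, A) action values
        policy[h] = q.argmax(axis=1)
        v = q.max(axis=1)
    return policy, reward_hat
```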
Related papers
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment; a generic sketch of this idea appears after this entry.
Experiments show that GFA-RFE matches or outperforms state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
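GFA-RFE's key object is an uncertainty-aware intrinsic reward built for general function approximation. The snippet below does not reproduce that construction; it uses a common stand-in, disagreement across a bootstrapped ensemble of linear models, purely to illustrate what an uncertainty-driven exploration bonus looks like.

```python
import numpy as np

# Illustrative uncertainty-aware intrinsic reward: disagreement across a
# bootstrapped ensemble of linear models stands in for the uncertainty
# quantifier used with general function approximation. This is a generic
# stand-in, not GFA-RFE's actual bonus construction.

rng = np.random.default_rng(0)

def fit_ensemble(features, targets, n_models=5, ridge=1e-2):
    """Fit linear models on bootstrap resamples of (feature, target) pairs."""
    n, d = features.shape
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)              # bootstrap resample
        X, y = features[idx], targets[idx]
        models.append(np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ y))
    return models

def intrinsic_reward(models, feature):
    """Exploration bonus = ensemble disagreement on this state-action feature."""
    preds = np.array([feature @ w for w in models])
    return float(preds.std())

# Toy usage: disagreement tends to be larger far from the training data.
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)
models = fit_ensemble(X, y)
bonus_seen = intrinsic_reward(models, X[0])
bonus_unseen = intrinsic_reward(models, 10.0 * rng.normal(size=4))
```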
- Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning
This paper studies reward-agnostic exploration in reinforcement learning (RL).
Consider a finite-horizon inhomogeneous decision process with $S$ states, $A$ actions, and a horizon length $H$.
Our algorithm is able to yield $\varepsilon$-accuracy for arbitrarily many reward functions.
arXiv Detail & Related papers (2023-04-14T17:46:49Z)
- Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward [66.81579829897392]
We propose a novel offline reinforcement learning algorithm called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED).
PARTED decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward; a toy sketch of the redistribution step follows this entry.
To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient for general MDPs with trajectory-wise rewards.
arXiv Detail & Related papers (2022-06-13T19:11:22Z)
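PARTED's first stage redistributes a single trajectory-level return into per-step proxy rewards via least squares. The snippet below is a bare-bones illustration of that redistribution step under a linear reward model; the feature map, ridge regularization, and toy data are assumptions, and the subsequent pessimistic value iteration stage is not shown.

```python
import numpy as np

# Least-squares reward redistribution (illustrative): given trajectories that
# only reveal their total return, fit a per-step linear reward model
# r(s, a) ~ phi(s, a) @ theta so that the summed per-step proxy rewards match
# the trajectory returns. Linear features and ridge regularization are
# assumptions; PARTED's pessimistic value iteration stage is not shown.

def redistribute_rewards(trajectory_features, returns, ridge=1e-3):
    """
    trajectory_features: list of (H, d) arrays of per-step features phi(s_h, a_h)
    returns: length-N array, one scalar return per trajectory
    Returns theta (d,) such that sum_h phi(s_h, a_h) @ theta ~ return.
    """
    # Summing features over each trajectory turns the problem into ordinary
    # ridge regression on trajectory-level design vectors.
    X = np.stack([feats.sum(axis=0) for feats in trajectory_features])  # (N, d)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ np.asarray(returns))

# Toy usage: a per-step proxy reward for a new state-action feature.
rng = np.random.default_rng(1)
trajs = [rng.normal(size=(10, 3)) for _ in range(50)]
true_theta = np.array([0.5, -1.0, 2.0])
rets = [feats.sum(axis=0) @ true_theta for feats in trajs]
theta_hat = redistribute_rewards(trajs, rets)
proxy_reward = rng.normal(size=3) @ theta_hat
```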
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy; a toy log-determinant proxy for this idea is sketched after this entry.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
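The sub-sampling entry hinges on measuring how much information newly collected data adds and re-solving for a policy only when that amount crosses a threshold. The sketch below uses a log-determinant proxy on linear features, which is an assumption rather than the paper's actual sub-sampling rule, to show how such lazy updates keep the number of policy solves small.

```python
import numpy as np

# Lazy policy updates driven by an information-gain proxy (illustrative).
# For linear features, the log-determinant of the regularized covariance
# tracks how much information the data set carries; re-solve for the policy
# only when it has grown by a fixed threshold. The threshold and feature
# model are assumptions, not the paper's sub-sampling rule.

class LazyUpdater:
    def __init__(self, dim, threshold=np.log(2.0), reg=1.0):
        self.cov = reg * np.eye(dim)
        self.logdet_at_last_update = np.linalg.slogdet(self.cov)[1]
        self.threshold = threshold
        self.n_updates = 0

    def observe(self, feature):
        """Add one state-action feature; return True if the policy should be re-solved."""
        self.cov += np.outer(feature, feature)
        logdet = np.linalg.slogdet(self.cov)[1]
        if logdet - self.logdet_at_last_update >= self.threshold:
            self.logdet_at_last_update = logdet
            self.n_updates += 1          # placeholder for the actual policy solve
            return True
        return False

# Toy usage: over K observations, only a polylog(K)-sized number of updates fire.
rng = np.random.default_rng(2)
updater = LazyUpdater(dim=8)
fired = sum(updater.observe(rng.normal(size=8)) for _ in range(10_000))
```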
- Active Finite Reward Automaton Inference and Reinforcement Learning Using Queries and Counterexamples [31.31937554018045]
Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm; a hand-written reward automaton is sketched after this entry.
arXiv Detail & Related papers (2020-06-28T21:13:08Z)
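The entry above infers a finite reward automaton with the L* algorithm; the inference loop itself is beyond a short snippet, so the sketch below only shows the object being learned, a reward machine whose emitted reward depends on an automaton state tracking task progress. The hand-written two-state example and its events are assumptions standing in for a learned automaton.

```python
# A finite reward automaton / reward machine (illustrative, hand-written here
# rather than inferred with L*): automaton states track task progress, and the
# emitted reward depends on the automaton state and the high-level event
# observed in the environment. The events ("got_key", "opened_door") and the
# task itself are assumptions.

class RewardAutomaton:
    def __init__(self, transitions, rewards, initial_state="u0"):
        self.transitions = transitions   # (aut_state, event) -> next aut_state
        self.rewards = rewards           # (aut_state, event) -> reward
        self.state = initial_state

    def step(self, event):
        reward = self.rewards.get((self.state, event), 0.0)
        self.state = self.transitions.get((self.state, event), self.state)
        return reward

# "Fetch the key, then open the door": reward only for opening the door
# *after* the key has been picked up.
machine = RewardAutomaton(
    transitions={("u0", "got_key"): "u1", ("u1", "opened_door"): "u_done"},
    rewards={("u1", "opened_door"): 1.0},
)

assert machine.step("opened_door") == 0.0   # no key yet -> no reward
assert machine.step("got_key") == 0.0
assert machine.step("opened_door") == 1.0   # key then door -> task reward
```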
- On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)
- Task-agnostic Exploration in Reinforcement Learning [35.403304641170386]
We present an efficient task-agnostic reinforcement learning algorithm, UCBZero.
It finds $\epsilon$-optimal policies for $N$ arbitrary tasks after at most $\tilde{O}(\log(N)H^{5}SA/\epsilon^{2})$ exploration episodes.
We also provide an $\Omega(\log(N)H^{2}SA/\epsilon^{2})$ lower bound, showing that the $\log$ dependency on $N$ is unavoidable.
arXiv Detail & Related papers (2020-06-16T20:23:41Z)
- Reward-Free Exploration for Reinforcement Learning [82.3300753751066]
We propose a new "reward-free RL" framework to isolate the challenges of exploration.
We give an efficient algorithm that conducts $\tilde{\mathcal{O}}(S^{2}A\,\mathrm{poly}(H)/\epsilon^{2})$ episodes of exploration.
We also give a nearly-matching $\Omega(S^{2}AH^{2}/\epsilon^{2})$ lower bound, demonstrating the near-optimality of our algorithm in this setting.
arXiv Detail & Related papers (2020-02-07T14:03:38Z)