Active Finite Reward Automaton Inference and Reinforcement Learning
Using Queries and Counterexamples
- URL: http://arxiv.org/abs/2006.15714v4
- Date: Sat, 3 Jul 2021 01:51:29 GMT
- Title: Active Finite Reward Automaton Inference and Reinforcement Learning
Using Queries and Counterexamples
- Authors: Zhe Xu, Bo Wu, Aditya Ojha, Daniel Neider, Ufuk Topcu
- Abstract summary: Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
- Score: 31.31937554018045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the fact that deep reinforcement learning (RL) has surpassed
human-level performance in various tasks, it still has several fundamental
challenges. First, most RL methods require intensive data from the exploration
of the environment to achieve satisfactory performance. Second, the use of
neural networks in RL renders it hard to interpret the internals of the system
in a way that humans can understand. To address these two challenges, we
propose a framework that enables an RL agent to reason over its exploration
process and distill high-level knowledge for effectively guiding its future
explorations. Specifically, we propose a novel RL algorithm that learns
high-level knowledge in the form of a finite reward automaton by using the L*
learning algorithm. We prove that in episodic RL, a finite reward automaton can
express any non-Markovian bounded reward function with finitely many reward
values and approximate any non-Markovian bounded reward function (with
infinitely many reward values) with arbitrary precision. We also provide a
lower bound for the episode length such that the proposed RL approach almost
surely converges to an optimal policy in the limit. We test this approach on
two RL environments with non-Markovian reward functions, choosing a variety of
tasks with increasing complexity for each environment. We compare our algorithm
with the state-of-the-art RL algorithms for non-Markovian reward functions,
such as Joint Inference of Reward Machines and Policies for RL (JIRP), Learning
Reward Machines (LRM), and Proximal Policy Optimization (PPO2). Our results show
that our algorithm converges to an optimal policy faster than other baseline
methods.
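To make the central object concrete, the following is a minimal sketch of a finite reward automaton (reward machine) as described in the abstract: a finite-state machine that reads high-level labels from an episode and emits rewards on its transitions, so the reward can depend on the history of labels rather than only the current environment state. The class and method names below are illustrative assumptions, not the authors' implementation, and the L* inference loop with queries and counterexamples is not shown.

# Minimal sketch of a finite reward automaton; names are hypothetical.

class FiniteRewardAutomaton:
    def __init__(self, states, initial_state, transitions, rewards):
        # transitions: dict mapping (state, label) -> next state
        # rewards:     dict mapping (state, label) -> reward emitted on that edge
        self.states = states
        self.initial_state = initial_state
        self.transitions = transitions
        self.rewards = rewards

    def run(self, label_sequence):
        """Replay a sequence of high-level labels and return the total reward.

        The emitted reward depends on the automaton state, i.e. on the label
        history, which is how a non-Markovian reward becomes Markovian over
        the product of environment state and automaton state.
        """
        state, total = self.initial_state, 0.0
        for label in label_sequence:
            total += self.rewards.get((state, label), 0.0)
            state = self.transitions.get((state, label), state)
        return total


if __name__ == "__main__":
    # Toy task: "reach a, then b" pays reward 1 only when both happen in order.
    fra = FiniteRewardAutomaton(
        states={"q0", "q1", "q2"},
        initial_state="q0",
        transitions={("q0", "a"): "q1", ("q1", "b"): "q2"},
        rewards={("q1", "b"): 1.0},
    )
    print(fra.run(["a", "b"]))  # 1.0
    print(fra.run(["b", "a"]))  # 0.0 -- the order matters, so the reward is non-Markovian

In the proposed framework, an automaton of this form is inferred from observed episode traces (membership queries and counterexamples in the style of L*), and the RL agent then learns over the product of environment states and automaton states.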
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z) - Adaptive Reward Design for Reinforcement Learning in Complex Robotic Tasks [2.3031174164121127]
We propose a suite of reward functions that incentivize an RL agent to make measurable progress on tasks specified by formulas.
We develop an adaptive reward shaping approach that dynamically updates these reward functions during the learning process.
Experimental results on a range of RL-based robotic tasks demonstrate that the proposed approach is compatible with various RL algorithms.
arXiv Detail & Related papers (2024-12-14T18:04:18Z) - Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z) - ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z) - To the Max: Reinventing Reward in Reinforcement Learning [1.5498250598583487]
In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance.
We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward.
In experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics.
arXiv Detail & Related papers (2024-02-02T12:29:18Z) - Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - On Reward-Free RL with Kernel and Neural Function Approximations:
Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z) - On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)