Active Finite Reward Automaton Inference and Reinforcement Learning
Using Queries and Counterexamples
- URL: http://arxiv.org/abs/2006.15714v4
- Date: Sat, 3 Jul 2021 01:51:29 GMT
- Title: Active Finite Reward Automaton Inference and Reinforcement Learning
Using Queries and Counterexamples
- Authors: Zhe Xu, Bo Wu, Aditya Ojha, Daniel Neider, Ufuk Topcu
- Abstract summary: Deep reinforcement learning (RL) methods require intensive data from the exploration of the environment to achieve satisfactory performance.
We propose a framework that enables an RL agent to reason over its exploration process and distill high-level knowledge for effectively guiding its future explorations.
Specifically, we propose a novel RL algorithm that learns high-level knowledge in the form of a finite reward automaton by using the L* learning algorithm.
- Score: 31.31937554018045
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep reinforcement learning (RL) has surpassed
human-level performance in various tasks, it still faces several fundamental
challenges. First, most RL methods require intensive data from the exploration
of the environment to achieve satisfactory performance. Second, the use of
neural networks in RL renders it hard to interpret the internals of the system
in a way that humans can understand. To address these two challenges, we
propose a framework that enables an RL agent to reason over its exploration
process and distill high-level knowledge for effectively guiding its future
explorations. Specifically, we propose a novel RL algorithm that learns
high-level knowledge in the form of a finite reward automaton by using the L*
learning algorithm. We prove that in episodic RL, a finite reward automaton can
express any non-Markovian bounded reward function with finitely many reward
values and approximate any non-Markovian bounded reward function (with
infinitely many reward values) with arbitrary precision. We also provide a
lower bound for the episode length such that the proposed RL approach almost
surely converges to an optimal policy in the limit. We test this approach on
two RL environments with non-Markovian reward functions, choosing a variety of
tasks with increasing complexity for each environment. We compare our algorithm
with the state-of-the-art RL algorithms for non-Markovian reward functions,
such as Joint Inference of Reward Machines and Policies for RL (JIRP), Learning
Reward Machines (LRM), and Proximal Policy Optimization (PPO2). Our results show
that our algorithm converges to an optimal policy faster than other baseline
methods.
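To make the core idea concrete, a finite reward automaton (a reward machine) is a finite-state machine whose transitions fire on high-level labels emitted by the environment and whose edges carry rewards; running it alongside the environment makes a non-Markovian reward Markovian over the product of environment state and automaton state. The sketch below illustrates this under assumed names (RewardAutomaton, ToyLabelEnv, a toy coffee-delivery task) with a hand-written automaton and tabular Q-learning on the product state; it is not the paper's implementation, where the automaton is instead inferred actively with L* from membership queries and counterexamples while RL proceeds.

```python
# Minimal sketch (assumptions, not the paper's code): a hand-written finite
# reward automaton for a hypothetical "get coffee, then reach the office"
# task, plus tabular Q-learning over the product of environment state and
# automaton state. In the paper the automaton is inferred with L* rather
# than written by hand.
import random
from collections import defaultdict


class RewardAutomaton:
    """Finite reward automaton: transitions on high-level labels emit rewards."""

    def __init__(self, initial_state, transitions):
        # transitions: (automaton_state, label) -> (next_state, reward)
        self.initial_state = initial_state
        self.transitions = transitions

    def step(self, u, label):
        # Unlisted (state, label) pairs self-loop with zero reward.
        return self.transitions.get((u, label), (u, 0.0))


# Hypothetical two-stage task: observe 'coffee' first, then 'office'.
rm = RewardAutomaton(
    initial_state=0,
    transitions={
        (0, "coffee"): (1, 0.0),  # coffee picked up, no reward yet
        (1, "office"): (2, 1.0),  # coffee delivered, reward 1
    },
)


class ToyLabelEnv:
    """Hypothetical 1-D corridor: cell 0 emits 'coffee', the last cell 'office'."""

    actions = (-1, +1)

    def __init__(self, n=5, horizon=50):
        self.n, self.horizon = n, horizon

    def reset(self):
        self.pos, self.t = self.n // 2, 0
        return self.pos

    def sample_action(self):
        return random.choice(self.actions)

    def step(self, a):
        self.pos = min(max(self.pos + a, 0), self.n - 1)
        self.t += 1
        label = ("coffee" if self.pos == 0
                 else "office" if self.pos == self.n - 1 else "empty")
        return self.pos, label, self.t >= self.horizon


def q_learning_episode(env, rm, Q, alpha=0.1, gamma=0.9, eps=0.1):
    """One episode of Q-learning on the product state (env state, automaton state)."""
    s, u = env.reset(), rm.initial_state
    done = False
    while not done:
        a = (env.sample_action() if random.random() < eps
             else max(env.actions, key=lambda act: Q[(s, u, act)]))
        s2, label, done = env.step(a)   # environment emits a high-level label
        u2, r = rm.step(u, label)       # reward comes from the automaton edge
        best_next = 0.0 if done else max(Q[(s2, u2, act)] for act in env.actions)
        Q[(s, u, a)] += alpha * (r + gamma * best_next - Q[(s, u, a)])
        s, u = s2, u2


if __name__ == "__main__":
    env, Q = ToyLabelEnv(), defaultdict(float)
    for _ in range(500):
        q_learning_episode(env, rm, Q)
```

The automaton state acts as a memory of which subgoals have been completed, which is what makes learning over the product state sound; episodes must also be long enough for the automaton to be exercised, which is where the paper's lower bound on episode length comes in.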
Related papers
- Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach [12.132416927711036]
We introduce an RL method aimed at simplifying the reward-shaping process through intuitive strategies.
We define multiple reward and cost functions within a constrained multi-objective RL (CMORL) framework.
For tasks involving sequential complex movements, we segment the task into distinct stages and define multiple rewards and costs for each stage.
arXiv Detail & Related papers (2024-09-24T05:25:24Z)
- Uncertainty-Aware Reward-Free Exploration with General Function Approximation [69.27868448449755]
In this paper, we propose a reward-free reinforcement learning algorithm called GFA-RFE.
The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment.
Experiment results show that GFA-RFE outperforms or matches state-of-the-art unsupervised RL algorithms.
arXiv Detail & Related papers (2024-06-24T01:37:18Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- To the Max: Reinventing Reward in Reinforcement Learning [1.5498250598583487]
In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance.
We introduce max-reward RL, where an agent optimizes the maximum rather than the cumulative reward.
In experiments, we study the performance of max-reward RL algorithms in two goal-reaching environments from Gymnasium-Robotics.
arXiv Detail & Related papers (2024-02-02T12:29:18Z)
- Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning [26.067411894141863]
An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL).
Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback.
We provide an active-learning-based RL algorithm that first explores the environment without specifying a reward function.
arXiv Detail & Related papers (2023-04-18T12:36:09Z)
- Deep Black-Box Reinforcement Learning with Movement Primitives [15.184283143878488]
We present a new algorithm for deep reinforcement learning (RL).
It is based on differentiable trust region layers, a successful on-policy deep RL algorithm.
We compare our ERL algorithm to state-of-the-art step-based algorithms in many complex simulated robotic control tasks.
arXiv Detail & Related papers (2022-10-18T06:34:52Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward by measuring the novelty based on learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem under the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- On Reward-Free Reinforcement Learning with Linear Function Approximation [144.4210285338698]
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch RL setting and the setting where there are many reward functions of interest.
In this work, we give both positive and negative results for reward-free RL with linear function approximation.
arXiv Detail & Related papers (2020-06-19T17:59:36Z)