Inapplicable Actions Learning for Knowledge Transfer in Reinforcement
Learning
- URL: http://arxiv.org/abs/2211.15589v3
- Date: Thu, 11 May 2023 20:20:05 GMT
- Title: Inapplicable Actions Learning for Knowledge Transfer in Reinforcement
Learning
- Authors: Leo Ardon, Alberto Pozanco, Daniel Borrajo, Sumitra Ganesh
- Abstract summary: We show that learning inapplicable actions greatly improves the sample efficiency of RL algorithms.
Thanks to the transferability of the knowledge acquired, it can be reused in other tasks and domains to make the learning process more efficient.
- Score: 3.194414753332705
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning (RL) algorithms are known to scale poorly to
environments with many available actions, requiring numerous samples to learn
an optimal policy. The traditional approach of considering the same fixed
action space in every possible state implies that the agent must learn, while
also maximizing its reward, to ignore irrelevant actions such
as $\textit{inapplicable actions}$ (i.e. actions that have no effect on the
environment when performed in a given state). Knowing this information can help
reduce the sample complexity of RL algorithms by masking the inapplicable
actions from the policy distribution to only explore actions relevant to
finding an optimal policy. While this technique has been formalized for quite
some time within the Automated Planning community with the concept of
precondition in the STRIPS language, RL algorithms have never formally taken
advantage of this information to prune the search space they explore. This is
typically done in an ad-hoc manner with hand-crafted domain logic added to the
RL algorithm. In this paper, we propose a more systematic approach to introduce
this knowledge into the algorithm. We (i) standardize the way knowledge can be
manually specified to the agent; and (ii) present a new framework to
autonomously learn the partial action model encapsulating the precondition of
an action jointly with the policy. We show experimentally that learning
inapplicable actions greatly improves the sample efficiency of the algorithm by
providing a reliable signal to mask out irrelevant actions. Moreover, we
demonstrate that thanks to the transferability of the knowledge acquired, it
can be reused in other tasks and domains to make the learning process more
efficient.
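The masking described in the abstract is, at the implementation level, the familiar invalid-action-masking trick. Below is a minimal sketch (Python, not from the paper) of how inapplicable actions could be suppressed in a discrete policy, combining (i) a hand-specified, STRIPS-style precondition mask and (ii) a per-action applicability predictor learned from observed no-effect transitions. The function and class names, the toy preconditions, the logistic model, and the 0.5 threshold are illustrative assumptions rather than the authors' architecture.

```python
# Sketch of inapplicable-action masking: NOT the authors' code.
import numpy as np

def masked_policy(logits, applicable):
    """Renormalise a discrete policy after suppressing inapplicable actions."""
    masked = np.where(applicable, logits, -1e9)  # drive masked logits towards -inf
    probs = np.exp(masked - masked.max())
    return probs / probs.sum()

def manual_preconditions(state):
    """(i) Hand-specified applicability mask for a hypothetical 3-action task:
    action k is applicable only if its precondition holds in `state`."""
    return np.array([
        state[0] > 0.0,   # hypothetical "pick-up": requires an object in reach
        True,             # hypothetical "wait": always applicable
        state[1] < 0.0,   # hypothetical "move-left": requires room on the left
    ])

class ApplicabilityModel:
    """(ii) Per-action logistic model of P(action changes the state | state),
    learned jointly with the policy from the agent's own transitions."""

    def __init__(self, state_dim, n_actions, lr=0.1):
        self.w = np.zeros((n_actions, state_dim))
        self.b = np.zeros(n_actions)
        self.lr = lr

    def predict(self, state):
        return 1.0 / (1.0 + np.exp(-(self.w @ state + self.b)))

    def update(self, state, action, state_changed):
        """One SGD step on the binary cross-entropy for the action taken;
        a transition that leaves the state unchanged is labelled inapplicable."""
        p = self.predict(state)[action]
        grad = p - float(state_changed)           # dBCE/dlogit
        self.w[action] -= self.lr * grad * state
        self.b[action] -= self.lr * grad

# Usage: combine the hand-specified and learned masks before sampling an action.
model = ApplicabilityModel(state_dim=4, n_actions=3)
state = np.array([0.5, -1.0, 0.0, 2.0])
logits = np.array([1.2, 0.3, -0.7])               # raw policy scores for `state`
applicable = manual_preconditions(state) & (model.predict(state) >= 0.5)
action = int(np.random.choice(3, p=masked_policy(logits, applicable)))
```

Because masked actions receive (numerically) zero probability, exploration and the policy gradient are confined to applicable actions, which is consistent with the sample-efficiency argument made in the abstract.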
Related papers
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - Learning impartial policies for sequential counterfactual explanations
using Deep Reinforcement Learning [0.0]
Recently, Reinforcement Learning (RL) methods have been proposed that seek to learn policies for discovering sequential counterfactuals (SCFs), thereby enhancing scalability.
In this work, we identify shortcomings in existing methods that can result in policies with undesired properties, such as a bias towards specific actions.
To mitigate this effect, we propose to use the output probabilities of the classifier to create a more informative reward.
arXiv Detail & Related papers (2023-11-01T13:50:47Z) - Large Language Models can Implement Policy Iteration [18.424558160071808]
In-Context Policy Iteration is an algorithm for performing Reinforcement Learning (RL), in-context, using foundation models.
ICPI learns to perform RL tasks without expert demonstrations or gradients.
ICPI iteratively updates the contents of the prompt from which it derives its policy through trial-and-error interaction with an RL environment.
arXiv Detail & Related papers (2022-10-07T21:18:22Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z) - Deep RL With Information Constrained Policies: Generalization in
Continuous Control [21.46148507577606]
We show that the generalization benefits conferred by a natural constraint on information flow might also apply to artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
arXiv Detail & Related papers (2020-10-09T15:42:21Z) - Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule.
It includes both 'what to predict' (e.g. value functions) and 'how to learn from it' by interacting with a set of environments.
arXiv Detail & Related papers (2020-07-17T07:38:39Z) - Zeroth-Order Supervised Policy Improvement [94.0748002906652]
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL).
We propose Zeroth-Order Supervised Policy Improvement (ZOSPI).
ZOSPI exploits the estimated value function $Q$ globally while preserving the local exploitation of the PG methods.
arXiv Detail & Related papers (2020-06-11T16:49:23Z) - Off-Policy Adversarial Inverse Reinforcement Learning [0.0]
Adversarial Imitation Learning (AIL) is a class of algorithms in Reinforcement Learning (RL).
We propose an Off-Policy Adversarial Inverse Reinforcement Learning (Off-policy-AIRL) algorithm which is sample efficient and gives good imitation performance.
arXiv Detail & Related papers (2020-05-03T16:51:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.