Learning Routines for Effective Off-Policy Reinforcement Learning
- URL: http://arxiv.org/abs/2106.02943v1
- Date: Sat, 5 Jun 2021 18:41:57 GMT
- Title: Learning Routines for Effective Off-Policy Reinforcement Learning
- Authors: Edoardo Cetin, Oya Celiktutan
- Abstract summary: We propose a novel framework for reinforcement learning that effectively lifts such constraints.
Within our framework, agents learn effective behavior over a routine space.
We show that the resulting agents obtain relevant performance improvements while requiring fewer interactions with the environment per episode.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The performance of reinforcement learning depends upon designing an
appropriate action space, where the effect of each action is measurable, yet
granular enough to permit flexible behavior. So far, this process has involved
non-trivial user choices in terms of the available actions and their execution
frequency. We propose a novel framework for reinforcement learning that
effectively lifts such constraints. Within our framework, agents learn
effective behavior over a routine space: a new, higher-level action space,
where each routine represents a set of 'equivalent' sequences of granular
actions with arbitrary length. Our routine space is learned end-to-end to
facilitate the accomplishment of underlying off-policy reinforcement learning
objectives. We apply our framework to two state-of-the-art off-policy
algorithms and show that the resulting agents obtain relevant performance
improvements while requiring fewer interactions with the environment per
episode, improving computational efficiency.
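The routine abstraction can be pictured as a learned decoder sitting between the agent and the environment: the agent picks a routine, the decoder expands it into a short, variable-length sequence of granular actions, and only then is the environment stepped. Below is a minimal illustrative sketch (Python, numpy only, assuming a gym-style environment whose step returns obs, reward, done, info); the class names, shapes, and fixed maximum routine length are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

class RoutineDecoder:
    """Maps a latent 'routine' vector to a short sequence of granular actions.

    Hypothetical stand-in for the learned decoder: one linear map per step plus
    a learned stopping probability, so routines have variable length up to max_len.
    """

    def __init__(self, routine_dim, action_dim, max_len=4, seed=0):
        rng = np.random.default_rng(seed)
        self.max_len = max_len
        # One linear map per step: routine vector -> granular action.
        self.W = rng.normal(scale=0.1, size=(max_len, action_dim, routine_dim))
        # Logits deciding whether to stop after each step.
        self.stop_logits = np.zeros(max_len)

    def decode(self, z):
        """Return the (variable-length) granular action sequence encoded by routine z."""
        actions = []
        for t in range(self.max_len):
            actions.append(np.tanh(self.W[t] @ z))        # bounded granular action
            stop_prob = 1.0 / (1.0 + np.exp(-self.stop_logits[t]))
            if stop_prob > 0.5:                            # early termination
                break
        return actions


def rollout_routine(env, decoder, z):
    """Execute one routine: the agent commits to the whole decoded sequence,
    so it makes fewer decisions (and environment queries) per episode."""
    total_reward, done = 0.0, False
    for action in decoder.decode(z):
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done
```

In the paper, this mapping is not hand-designed: the routine space is learned end-to-end together with the underlying off-policy objectives, which is what allows a routine to stand in for a set of 'equivalent' granular action sequences.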
Related papers
- Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking [7.590209768166108]
We introduce three continuous action masking methods for mapping the action space to the state-dependent set of relevant actions.
Our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications.
arXiv Detail & Related papers (2024-06-06T02:55:16Z)
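As a rough illustration of the action-masking idea above (not one of the paper's three specific methods), the sketch below projects each proposed continuous action into a state-dependent box of relevant actions before execution; relevant_bounds is a hypothetical placeholder for whatever encodes relevance in a given task.

```python
import numpy as np

def relevant_bounds(state):
    """Hypothetical state-dependent action bounds, e.g. derived from task constraints.
    Returns (low, high) arrays with one entry per action dimension."""
    low = np.array([-1.0, -0.5 * abs(state[0])])
    high = np.array([1.0, 0.5 * abs(state[0])])
    return low, high

def masked_action(policy_action, state):
    """Project the policy's raw action into the relevant set for this state."""
    low, high = relevant_bounds(state)
    return np.clip(policy_action, low, high)

# Usage: the agent only ever executes actions inside the relevant set.
state = np.array([0.3, 1.2])
raw = np.array([0.9, -0.8])
print(masked_action(raw, state))   # -> [ 0.9  -0.15]
```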
- Discovering Temporally-Aware Reinforcement Learning Algorithms [42.016150906831776]
We propose a simple augmentation to two existing objective discovery approaches.
We find that commonly used meta-gradient approaches fail to discover adaptive objective functions.
arXiv Detail & Related papers (2024-02-08T17:07:42Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
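The key mechanism in the RLIF summary above, sketched minimally: replace task rewards with the intervention signal itself and run ordinary off-policy RL on the relabeled data. The field names and the specific -1/0 reward values are illustrative assumptions, not the paper's full algorithm.

```python
def intervention_reward(transition):
    """RLIF-style reward relabeling (sketch): the only supervision is whether
    a human chose to intervene on this step."""
    return -1.0 if transition["intervened"] else 0.0

def relabel(dataset):
    """Turn logged interaction data into an RL dataset by replacing task
    rewards with intervention signals."""
    return [dict(t, reward=intervention_reward(t)) for t in dataset]

# Usage with a toy logged dataset (fields are illustrative).
logged = [
    {"obs": [0.1], "action": [0.2], "intervened": False, "reward": 0.0},
    {"obs": [0.4], "action": [0.9], "intervened": True,  "reward": 0.0},
]
print([t["reward"] for t in relabel(logged)])   # -> [0.0, -1.0]
```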
- Action Pick-up in Dynamic Action Space Reinforcement Learning [6.15205100319133]
We propose an intelligent Action Pick-up (AP) algorithm to autonomously choose valuable actions that are most likely to boost performance from a set of new actions.
In this paper, we first theoretically analyze and find that a prior optimal policy plays an important role in action pick-up by providing useful knowledge and experience.
We then design two different AP methods based on the prior optimal policy: a frequency-based global method and a state-clustering-based local method.
arXiv Detail & Related papers (2023-04-03T10:55:16Z)
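The summary above only names the two AP method families, so the following is a speculative reading of the frequency-based global idea: rank candidate new actions by how often similar actions were chosen under the prior optimal policy. Every name, the similarity threshold, and the scoring rule are assumptions for illustration only.

```python
import numpy as np

def pickup_scores(new_actions, prior_policy_actions):
    """Score each candidate new action by the fraction of nearby actions in
    trajectories of the prior (near-)optimal policy (speculative sketch)."""
    prior = np.asarray(prior_policy_actions, dtype=float)
    scores = []
    for a in new_actions:
        dists = np.linalg.norm(prior - np.asarray(a, dtype=float), axis=1)
        scores.append(np.mean(dists < 0.25))   # fraction of "similar" prior actions
    return scores

# Usage: prefer the new action most consistent with prior behavior.
prior_actions = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.05], [0.9, 0.9]]
candidates = [[0.18, 0.02], [0.8, -0.7]]
print(pickup_scores(candidates, prior_actions))   # -> [0.75, 0.0]
```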
- Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience [89.30876995059168]
This paper addresses the problem of inverse reinforcement learning (IRL) -- inferring the reward function of an agent from observing its behavior.
arXiv Detail & Related papers (2022-08-09T17:29:49Z)
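For context, the IRL problem referenced above is often written in its maximum-entropy form (a standard formulation, not necessarily the one this paper adopts): given demonstrations $\mathcal{D}$, fit reward parameters $\theta$ to maximize $\sum_{\tau \in \mathcal{D}} \log p_\theta(\tau)$ with $p_\theta(\tau) \propto \exp\big(\sum_t r_\theta(s_t, a_t)\big)$, so that the demonstrated trajectories become exponentially more likely under the rewards they accumulate.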
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
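As a refresher on the setting of the entry above, here is the generic textbook form of $Q$-learning with linear function approximation (not the paper's exploration variant): $Q(s,a)$ is a dot product between a weight vector and features $\phi(s,a)$, updated toward the bootstrapped TD target.

```python
import numpy as np

def linear_q(w, phi):
    """Q(s, a) = w . phi(s, a) under linear function approximation."""
    return float(np.dot(w, phi))

def q_update(w, phi, reward, next_phis, alpha=0.1, gamma=0.99, done=False):
    """One TD(0) update of the weights toward the bootstrapped target."""
    target = reward
    if not done:
        target += gamma * max(linear_q(w, p) for p in next_phis)
    td_error = target - linear_q(w, phi)
    return w + alpha * td_error * phi

# Usage with toy 3-dimensional features.
w = np.zeros(3)
phi_sa = np.array([1.0, 0.0, 0.5])
next_features = [np.array([0.0, 1.0, 0.2]), np.array([0.3, 0.3, 0.3])]
w = q_update(w, phi_sa, reward=1.0, next_phis=next_features)
print(w)   # weights moved toward the observed return
```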
- Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
arXiv Detail & Related papers (2021-11-04T22:46:16Z)
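The Value Function Spaces construction above admits a very small sketch: represent a state by the vector of value estimates that each lower-level skill assigns to it, and hand that vector to the high-level policy. The skill value functions and observation fields below are hypothetical toys.

```python
import numpy as np

def value_function_space(obs, skill_value_fns):
    """Represent a state by the values each lower-level skill assigns to it.
    The resulting vector captures task-relevant affordances for long-horizon
    reasoning while discarding low-level detail."""
    return np.array([v(obs) for v in skill_value_fns])

# Usage with two toy skill value functions (hypothetical).
v_reach = lambda obs: 1.0 - min(1.0, abs(obs["dist_to_object"]))
v_grasp = lambda obs: 1.0 if obs["gripper_open"] else 0.0
obs = {"dist_to_object": 0.2, "gripper_open": True}
print(value_function_space(obs, [v_reach, v_grasp]))   # -> [0.8 1. ]
```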
- Generative Adversarial Reward Learning for Generalized Behavior Tendency Inference [71.11416263370823]
We propose a generative inverse reinforcement learning approach for user behavioral preference modelling.
Our model automatically learns rewards from the user's actions using a discriminative actor-critic network and a Wasserstein GAN.
arXiv Detail & Related papers (2021-05-03T13:14:25Z)
- Constrained-Space Optimization and Reinforcement Learning for Complex Tasks [42.648636742651185]
Learning from Demonstration is increasingly used for transferring operator manipulation skills to robots.
This paper presents a constrained-space optimization and reinforcement learning scheme for managing complex tasks.
arXiv Detail & Related papers (2020-04-01T21:50:11Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
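The variance-control idea in the last entry can be illustrated generically (this is a standard critic-as-baseline REINFORCE estimator, not the paper's exact combination over correlated actions): subtract the critic-implied state value from the sampled return before weighting the score function. All shapes and names below are assumptions.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def pg_with_critic_baseline(logits, q_estimates, sampled_action, sampled_return):
    """Policy-gradient estimate for one discrete decision.

    The critic's action values give a baseline V(s) = sum_a pi(a|s) Q(s, a), so the
    advantage (return - baseline) has lower variance than the raw return."""
    probs = softmax(logits)
    baseline = float(np.dot(probs, q_estimates))
    advantage = sampled_return - baseline
    # Gradient of log pi(a|s) w.r.t. the logits of a softmax policy: one_hot(a) - probs.
    grad_logp = -probs
    grad_logp[sampled_action] += 1.0
    return advantage * grad_logp

# Usage with a 3-action toy example.
logits = np.array([0.2, 0.0, -0.1])
q_hat = np.array([1.0, 0.5, 0.0])
print(pg_with_critic_baseline(logits, q_hat, sampled_action=0, sampled_return=1.2))
```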