Towards Task-Prioritized Policy Composition
- URL: http://arxiv.org/abs/2209.09536v1
- Date: Tue, 20 Sep 2022 08:08:04 GMT
- Title: Towards Task-Prioritized Policy Composition
- Authors: Finn Rietz, Erik Schaffernicht, Todor Stoyanov, Johannes A. Stork
- Abstract summary: We propose a novel task-prioritized composition framework for Reinforcement Learning.
Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents.
Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after initial compound policy construction.
- Score: 10.477909792349823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Combining learned policies in a prioritized, ordered manner is desirable
because it allows for modular design and facilitates data reuse through
knowledge transfer. In control theory, prioritized composition is realized by
null-space control, where low-priority control actions are projected into the
null-space of high-priority control actions. Such a method is currently
unavailable for Reinforcement Learning. We propose a novel, task-prioritized
composition framework for Reinforcement Learning, which involves a novel
concept: The indifference-space of Reinforcement Learning policies. Our
framework has the potential to facilitate knowledge transfer and modular design
while greatly increasing data efficiency and data reuse for Reinforcement
Learning agents. Further, our approach can ensure high-priority constraint
satisfaction, which makes it promising for learning in safety-critical domains
like robotics. Unlike null-space control, our approach allows learning globally
optimal policies for the compound task by online learning in the
indifference-space of higher-level policies after initial compound policy
construction.
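The null-space control mechanism the abstract contrasts against can be illustrated concretely. The following is a minimal NumPy sketch (not from the paper) of classical null-space projection: a low-priority action is projected through N = I - J⁺J so it cannot disturb the high-priority task defined by a hypothetical task Jacobian J.

```python
import numpy as np

def null_space_projector(J):
    """Return N = I - pinv(J) @ J, the projector onto the null space of J."""
    n = J.shape[1]
    return np.eye(n) - np.linalg.pinv(J) @ J

# Hypothetical example: the high-priority task only constrains joint 0.
J = np.array([[1.0, 0.0, 0.0]])
N = null_space_projector(J)

u_high = np.array([0.5, 0.0, 0.0])   # high-priority control action
u_low = np.array([0.2, 0.3, -0.1])   # low-priority control action
u = u_high + N @ u_low               # prioritized composition

# The projected low-priority component lies in the null space of J,
# so it leaves the high-priority task unaffected: J @ (N @ u_low) == 0.
```

In this sketch the low-priority motion survives only along joints the high-priority task leaves free, which is exactly the degree-of-freedom structure the paper's indifference-space concept generalizes to Reinforcement Learning policies.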
Related papers
- Diffusion Guidance Is a Controllable Policy Improvement Operator [98.11511661904618]
CFGRL is trained with the simplicity of supervised learning, yet can further improve on the policies in the data.
On offline RL tasks, we observe a reliable trend: increased guidance weighting leads to increased performance.
arXiv Detail & Related papers (2025-05-29T14:06:50Z)
- Iterative Batch Reinforcement Learning via Safe Diversified Model-based Policy Search [2.0072624123275533]
Batch reinforcement learning enables policy learning without direct interaction with the environment during training.
This approach is well-suited for high-risk and cost-intensive applications, such as industrial control.
We present an algorithmic methodology for iterative batch reinforcement learning based on ensemble-based model-based policy search.
arXiv Detail & Related papers (2024-11-14T11:10:36Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
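The mutual-information regularizer described above can be sketched in a few lines. The helper below (a hypothetical illustration, not the paper's estimator) computes empirical mutual information between a discrete sensitive-state trace and an action trace; a policy objective would subtract this quantity, scaled by a coefficient, from the reward.

```python
import numpy as np

def empirical_mi(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    x, y = np.asarray(x), np.asarray(y)
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()                       # empirical joint distribution
    px = joint.sum(axis=1, keepdims=True)      # marginal of x
    py = joint.sum(axis=0, keepdims=True)      # marginal of y
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

s = [0, 1, 0, 1, 0, 1, 0, 1]        # sensitive state trace
leaky = [0, 1, 0, 1, 0, 1, 0, 1]    # actions that copy the state: maximal leakage
private = [0, 0, 1, 1, 0, 0, 1, 1]  # actions independent of the state: no leakage

beta = 0.1                           # hypothetical regularization weight
raw_reward = 1.0
regularized = raw_reward - beta * empirical_mi(s, leaky)
```

Here `empirical_mi(s, leaky)` equals log 2 while `empirical_mi(s, private)` is zero, so the regularized objective penalizes exactly the policy whose actions disclose the sensitive variable.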
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
- Accelerating Safe Reinforcement Learning with Constraint-mismatched Policies [34.555500347840805]
We consider the problem of reinforcement learning when provided with a baseline control policy and a set of constraints that the learner must satisfy.
We propose an iterative policy optimization algorithm that alternates between maximizing expected return on the task, minimizing distance to the baseline policy, and projecting the policy onto the constraint-satisfying set.
Our algorithm consistently outperforms several state-of-the-art baselines, achieving 10 times fewer constraint violations and 40% higher reward on average.
arXiv Detail & Related papers (2020-06-20T20:20:47Z)
- Cautious Reinforcement Learning with Logical Constraints [78.96597639789279]
An adaptive safe padding forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process.
Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm.
arXiv Detail & Related papers (2020-02-26T00:01:08Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
- First Order Constrained Optimization in Policy Space [19.00289722198614]
We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS).
FOCOPS maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints.
We provide empirical evidence that our simple approach achieves better performance on a set of constrained robotic locomotion tasks.
arXiv Detail & Related papers (2020-02-16T05:07:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.