IOB: Integrating Optimization Transfer and Behavior Transfer for
Multi-Policy Reuse
- URL: http://arxiv.org/abs/2308.07351v1
- Date: Mon, 14 Aug 2023 09:22:35 GMT
- Title: IOB: Integrating Optimization Transfer and Behavior Transfer for
Multi-Policy Reuse
- Authors: Siyuan Li, Hao Li, Jin Zhang, Zhen Wang, Peng Liu, Chongjie Zhang
- Abstract summary: Reinforcement learning (RL) agents can transfer knowledge from source policies to a related target task.
Previous methods introduce additional components, such as hierarchical policies or estimations of source policies' value functions.
We propose a novel transfer RL method that selects the source policy without training extra components.
- Score: 50.90781542323258
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Humans have the ability to reuse previously learned policies to solve new
tasks quickly, and reinforcement learning (RL) agents can do the same by
transferring knowledge from source policies to a related target task. Transfer
RL methods can reshape the policy optimization objective (optimization
transfer) or influence the behavior policy (behavior transfer) using source
policies. However, selecting the appropriate source policy with limited samples
to guide target policy learning has been a challenge. Previous methods
introduce additional components, such as hierarchical policies or estimations
of source policies' value functions, which can lead to non-stationary policy
optimization or heavy sampling costs, diminishing transfer effectiveness. To
address this challenge, we propose a novel transfer RL method that selects the
source policy without training extra components. Our method utilizes the Q
function in the actor-critic framework to guide policy selection, choosing the
source policy with the largest one-step improvement over the current target
policy. We integrate optimization transfer and behavior transfer (IOB) by
regularizing the learned policy to mimic the guidance policy and combining them
as the behavior policy. This integration significantly enhances transfer
effectiveness, surpasses state-of-the-art transfer RL baselines in benchmark
tasks, and improves final performance and knowledge transferability in
continual learning scenarios. Additionally, we show that our optimization
transfer technique is guaranteed to improve target policy learning.
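To make the described procedure concrete, here is a minimal Python-style sketch of the idea in the abstract, not the authors' implementation: the names critic, target_policy, and source_policies, the squared-error mimic term, and the probabilistic mixing rule are illustrative assumptions standing in for whatever the paper actually uses.

import torch

def select_guidance_policy(state, target_policy, source_policies, critic):
    # One-step improvement check: score each candidate policy by the current
    # critic's estimate of the action it proposes in this state, keep the best.
    candidates = [target_policy] + list(source_policies)
    scores = [critic(state, policy(state)).mean().item() for policy in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

def actor_loss(state, target_policy, guidance_policy, critic, beta=1.0):
    # Optimization transfer: the usual actor-critic objective plus a regularizer
    # that pulls the learned policy toward the guidance policy. The squared-error
    # mimic term is a placeholder for the divergence the paper actually uses.
    action = target_policy(state)
    q_term = -critic(state, action).mean()
    mimic_term = ((action - guidance_policy(state).detach()) ** 2).mean()
    return q_term + beta * mimic_term

def behavior_action(state, target_policy, guidance_policy, mix_prob=0.5):
    # Behavior transfer: combine the guidance policy and the learned policy into
    # the behavior policy (here by sampling between them; the exact combination
    # rule is an assumption).
    if torch.rand(()).item() < mix_prob:
        return guidance_policy(state)
    return target_policy(state)

Because the selection relies only on the Q function already maintained by the actor-critic learner, no extra components are trained, which is the property the abstract emphasizes; the guidance policy can change as the critic improves.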
Related papers
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning [57.83919813698673]
Projected Off-Policy Q-Learning (POP-QL) is a novel actor-critic algorithm that simultaneously reweights off-policy samples and constrains the policy to prevent divergence and reduce value-approximation error.
In our experiments, POP-QL not only shows competitive performance on standard benchmarks, but also outperforms competing methods in tasks where the data-collection policy is significantly sub-optimal.
arXiv Detail & Related papers (2023-11-25T00:30:58Z)
- Diversity for Contingency: Learning Diverse Behaviors for Efficient Adaptation and Transfer [0.0]
We propose a simple method for discovering all possible solutions of a given task.
Unlike prior methods, our approach does not require learning additional models for novelty detection.
arXiv Detail & Related papers (2023-10-11T13:39:35Z)
- Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization [14.028916306297928]
Reinforcement learning (RL) is a powerful machine learning technique that enables an intelligent agent to learn an optimal policy.
We propose a novel value enhancement method to improve the performance of a given initial policy computed by existing state-of-the-art RL algorithms.
arXiv Detail & Related papers (2023-01-05T18:43:40Z)
- Offline Reinforcement Learning with Closed-Form Policy Improvement Operators [88.54210578912554]
Behavior-constrained policy optimization has been demonstrated to be a successful paradigm for tackling offline reinforcement learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z)
- CUP: Critic-Guided Policy Reuse [37.12379523150601]
Critic-gUided Policy reuse (CUP) is a policy reuse algorithm that avoids training any extra components and efficiently reuses source policies.
CUP selects the source policy that has the largest one-step improvement over the current target policy, and forms a guidance policy.
Empirical results demonstrate that CUP achieves efficient transfer and significantly outperforms baseline algorithms.
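Written out (a paraphrase of the summary above, not a formula copied from the paper), this selection rule amounts to picking, at each state, the candidate whose expected value under the current target critic is largest:

\pi_g(\cdot \mid s) \in \arg\max_{\pi \in \{\pi_{\mathrm{tgt}},\, \pi_1, \ldots, \pi_n\}} \mathbb{E}_{a \sim \pi(\cdot \mid s)}\left[ Q^{\pi_{\mathrm{tgt}}}(s, a) \right]

Maximizing this expectation is equivalent to maximizing the one-step improvement over the current target policy, since the baseline V^{\pi_{\mathrm{tgt}}}(s) is the same for every candidate at a given state.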
arXiv Detail & Related papers (2022-10-15T00:53:03Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves average performance over the next best-performing offline reinforcement learning method by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate reinforcement learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)