Flexible Attention-Based Multi-Policy Fusion for Efficient Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2210.03729v2
- Date: Mon, 9 Oct 2023 18:17:46 GMT
- Title: Flexible Attention-Based Multi-Policy Fusion for Efficient Deep
Reinforcement Learning
- Authors: Zih-Yun Chiu, Yi-Lin Tuan, William Yang Wang, Michael C. Yip
- Abstract summary: Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning.
Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency.
We present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.
- Score: 78.31888150539258
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning (RL) agents have long sought to approach the
efficiency of human learning. Humans are great observers who can learn by
aggregating external knowledge from various sources, including observations
from others' policies of attempting a task. Prior studies in RL have
incorporated external knowledge policies to help agents improve sample
efficiency. However, it remains non-trivial to perform arbitrary combinations
and replacements of those policies, an essential feature for generalization and
transferability. In this work, we present Knowledge-Grounded RL (KGRL), an RL
paradigm fusing multiple knowledge policies and aiming for human-like
efficiency and flexibility. We propose a new actor architecture for KGRL,
Knowledge-Inclusive Attention Network (KIAN), which allows free knowledge
rearrangement due to embedding-based attentive action prediction. KIAN also
addresses entropy imbalance, a problem arising in maximum entropy KGRL that
hinders an agent from efficiently exploring the environment, through a new
design of policy distributions. The experimental results demonstrate that KIAN
outperforms alternative methods incorporating external knowledge policies and
achieves efficient and flexible learning. Our implementation is available at
https://github.com/Pascalson/KGRL.git
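To make the fusion mechanism concrete, below is a minimal sketch of embedding-based attentive action prediction over a set of knowledge policies, in the spirit of KIAN. The class name, shapes, and the linear query and inner-actor maps are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

class AttentiveFusionActor:
    """Minimal sketch of embedding-based attentive policy fusion.

    Each knowledge policy is a callable state -> action proposal, paired
    with a learnable key embedding; a query computed from the state attends
    over the keys, and the action is a weighted mix of the proposals.
    Names and shapes are illustrative, not the paper's exact design.
    """

    def __init__(self, knowledge_policies, state_dim, action_dim,
                 emb_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.policies = knowledge_policies
        # one key embedding per knowledge policy, plus one for the inner actor
        self.keys = rng.normal(size=(len(knowledge_policies) + 1, emb_dim))
        self.W_query = rng.normal(size=(state_dim, emb_dim)) * 0.1
        self.W_inner = rng.normal(size=(state_dim, action_dim)) * 0.1

    def act(self, state):
        proposals = [p(state) for p in self.policies]
        proposals.append(state @ self.W_inner)       # inner (learned) policy
        query = state @ self.W_query
        weights = softmax(self.keys @ query)         # attention over policies
        return sum(w * a for w, a in zip(weights, proposals)), weights

# toy usage: two fixed knowledge policies in a 4-D state, 2-D action task
actor = AttentiveFusionActor(
    [lambda s: np.tanh(s[:2]), lambda s: -np.tanh(s[2:])],
    state_dim=4, action_dim=2)
action, attn = actor.act(np.ones(4))
print(action, attn)
```

Because each knowledge policy is just a list entry paired with a key embedding, adding, removing, or swapping policies amounts to editing the list, which is the property behind the "free knowledge rearrangement" claim.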
Related papers
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
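The core mechanism, using a reference policy to regularize value-based learning, can be sketched in tabular form. The snippet below is generic KL-regularized value iteration with the LLM's suggested action distribution as the reference; it is not LINVIT's exact algorithm.

```python
import numpy as np

def kl_regularized_value_iteration(P, R, pi_ref, gamma=0.95, lam=1.0, iters=200):
    """Value iteration softened toward a reference policy pi_ref (here a
    stand-in for LLM-suggested action probabilities).

    Maximizing E_pi[Q] - lam * KL(pi || pi_ref) over pi has the closed form
    V(s) = lam * log sum_a pi_ref(a|s) * exp(Q(s,a) / lam).

    P: (S, A, S) transition probabilities, R: (S, A) rewards,
    pi_ref: (S, A) reference policy.
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R + gamma * P @ V                              # (S, A)
        V = lam * np.log((pi_ref * np.exp(Q / lam)).sum(axis=1))
    pi = pi_ref * np.exp(Q / lam)                          # regularized greedy policy
    return V, pi / pi.sum(axis=1, keepdims=True)

# toy 2-state, 2-action MDP; the reference policy prefers action 0
P = np.array([[[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.2, 0.8]]])
R = np.array([[1.0, 0.0], [0.0, 0.5]])
pi_ref = np.array([[0.8, 0.2], [0.8, 0.2]])
V, pi = kl_regularized_value_iteration(P, R, pi_ref)
print(V, pi)
```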
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
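A minimal sketch of the hybrid-data idea follows; the buffer layout of (s, a, r, s') tuples and the 50/50 mixing ratio are assumptions.

```python
import numpy as np

def sample_hybrid_batch(online_buffer, expert_buffer, batch_size=64,
                        expert_frac=0.5, rng=np.random.default_rng(0)):
    """Sketch of the core idea of hybrid RL: each gradient step trains on a
    mixture of the agent's own online experience and expert demonstrations,
    which anchors learning near expert states and curtails wasted exploration.
    """
    n_expert = int(batch_size * expert_frac)
    expert_idx = rng.integers(len(expert_buffer), size=n_expert)
    online_idx = rng.integers(len(online_buffer), size=batch_size - n_expert)
    return ([expert_buffer[i] for i in expert_idx] +
            [online_buffer[i] for i in online_idx])

# toy usage with placeholder transitions
online = [(0, 0, 0.0, 1)] * 100
expert = [(1, 1, 1.0, 2)] * 50
batch = sample_hybrid_batch(online, expert)
```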
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
- Renaissance Robot: Optimal Transport Policy Fusion for Learning Diverse Skills [28.39150937658635]
We propose a post-hoc technique for policy fusion using Optimal Transport theory.
This provides an improved weight initialisation of the neural network policy for learning new tasks.
Our results show that specialised knowledge can be unified into a "Renaissance agent", allowing for quicker learning of new skills.
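A rough sketch of the idea is below, with scipy's assignment solver standing in for the paper's Optimal Transport formulation (hard assignment is the special case of OT with uniform marginals); the two-layer setup is an assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fuse_two_layer_policies(W1_a, W2_a, W1_b, W2_b):
    """Sketch of post-hoc policy fusion for weight initialisation.

    Hidden units of policy B are matched to those of policy A by solving an
    assignment problem over input-weight similarity, then the aligned weights
    are averaged to initialise a new policy.
    W1_*: (hidden, in) first-layer weights; W2_*: (out, hidden) second layer.
    """
    cost = -W1_a @ W1_b.T                    # negative similarity of hidden units
    _, perm = linear_sum_assignment(cost)    # perm[i] = B-unit matched to A-unit i
    W1_b_aligned, W2_b_aligned = W1_b[perm], W2_b[:, perm]
    return 0.5 * (W1_a + W1_b_aligned), 0.5 * (W2_a + W2_b_aligned)

rng = np.random.default_rng(0)
W1_a, W2_a = rng.normal(size=(8, 4)), rng.normal(size=(2, 8))
W1_b, W2_b = rng.normal(size=(8, 4)), rng.normal(size=(2, 8))
W1, W2 = fuse_two_layer_policies(W1_a, W2_a, W1_b, W2_b)
```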
arXiv Detail & Related papers (2022-07-03T08:15:41Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
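The two-policy scheme is easy to sketch: a guide policy acts for the first h steps of each episode, then the learning policy takes over, and h is annealed toward zero. The classic Gym-style env interface and the linear curriculum are assumptions.

```python
import numpy as np

def jsrl_rollout(env, guide_policy, explore_policy, horizon, h):
    """Sketch of the JSRL two-policy scheme: a pre-existing guide policy
    provides the "jump start" for the first h steps, then the learning
    (exploration) policy finishes the episode. `env` is assumed to follow
    the classic Gym reset/step API.
    """
    obs, transitions = env.reset(), []
    for t in range(horizon):
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        obs = next_obs
        if done:
            break
    return transitions

# a curriculum over jump-start lengths, e.g. h annealed from 90 to 0
curriculum = np.linspace(90, 0, num=10, dtype=int)
```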
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
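A toy sketch of the competing reward signals is below, with a running diagonal Gaussian as a stand-in for the agent's observation model; the model choice and reward signs are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def surprise(obs, mean, var):
    """Negative log-likelihood of obs under a diagonal Gaussian belief."""
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (obs - mean) ** 2 / var)

# Sketch of the adversarial game: one policy (the explorer) acts to MAXIMIZE
# the surprise of a running observation model, while its opponent acts to
# MINIMIZE it; their rewards are the two signs of the same quantity.
mean, var = np.zeros(4), np.ones(4)
for step in range(100):
    obs = np.random.default_rng(step).normal(size=4)  # stand-in for env obs
    s = surprise(obs, mean, var)
    explorer_reward, controller_reward = +s, -s
    # update the belief model online (exponential moving average)
    mean = 0.99 * mean + 0.01 * obs
    var = 0.99 * var + 0.01 * (obs - mean) ** 2
```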
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
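One ingredient that makes off-policy learning from human feedback work is relabeling stored experience; a minimal sketch (the buffer layout is an assumption):

```python
def relabel_replay_buffer(buffer, reward_model):
    """Sketch of PEBBLE's relabeling step: because the reward model is refit
    as new human preferences arrive, rewards stored in the off-policy replay
    buffer go stale; rewriting every stored reward with the current model
    keeps old experience reusable for off-policy updates.
    """
    return [(s, a, reward_model(s, a), s2, done)
            for (s, a, r, s2, done) in buffer]
```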
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- A novel policy for pre-trained Deep Reinforcement Learning for Speech Emotion Recognition [8.175197257598697]
Reinforcement Learning (RL) is a semi-supervised learning paradigm in which an agent learns by interacting with an environment.
Deep RL has gained tremendous success in gaming, such as AlphaGo, but its potential has rarely been explored for challenging tasks like Speech Emotion Recognition (SER).
In this paper, we introduce a novel policy, the "Zeta policy", which is tailored for SER, and apply pre-training in deep RL to achieve a faster learning rate.
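The pre-training ingredient can be sketched generically as warm-starting the policy network from weights learned on a related supervised task (hypothetically, an emotion classifier trained on labelled speech). The parameter-dict format is an assumption, and this is not the Zeta policy itself.

```python
import numpy as np

def warm_start(policy_params, pretrained_params):
    """Sketch of pre-training for deep RL: initialise the RL policy network
    from weights learned on a related supervised task instead of from
    scratch, so early RL updates start from useful features. Layers whose
    names or shapes do not match are left at their fresh initialisation.
    """
    for name, w in pretrained_params.items():
        if name in policy_params and policy_params[name].shape == w.shape:
            policy_params[name] = w.copy()   # transfer matching layers only
    return policy_params

# toy usage: only the shared "fc1" layer is transferred
policy = {"fc1": np.zeros((8, 4)), "head": np.zeros((2, 8))}
pretrained = {"fc1": np.ones((8, 4)), "classifier": np.ones((7, 8))}
policy = warm_start(policy, pretrained)
```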
arXiv Detail & Related papers (2021-01-04T02:13:26Z)
- Useful Policy Invariant Shaping from Arbitrary Advice [24.59807772487328]
A major challenge of RL research is to discover how to learn with less data.
Potential-based reward shaping (PBRS) holds promise, but it is limited by the need for a well-defined potential function.
The recently introduced dynamic potential based advice (DPBA) method tackles this challenge by admitting arbitrary advice from a human or other agent.
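For reference, the classic potential-based shaping term that both PBRS and DPBA build on:

```python
def shaped_reward(r, s, s_next, potential, gamma=0.99):
    """Potential-based reward shaping (Ng et al., 1999): adding
    F(s, s') = gamma * Phi(s') - Phi(s) to the reward leaves the optimal
    policy unchanged. DPBA's twist (not shown here) is to learn Phi online
    so that F tracks arbitrary advice rather than a hand-designed potential.
    """
    return r + gamma * potential(s_next) - potential(s)

# toy usage with a hand-written potential over a 1-D position (goal at 10)
dist_to_goal = lambda s: -abs(10 - s)
print(shaped_reward(0.0, s=3, s_next=4, potential=dist_to_goal))
```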
arXiv Detail & Related papers (2020-11-02T20:29:09Z)
- Deep RL With Information Constrained Policies: Generalization in Continuous Control [21.46148507577606]
We study the generalization that a natural constraint on information flow might confer onto artificial agents in continuous control tasks.
We implement a novel Capacity-Limited Actor-Critic (CLAC) algorithm.
Our experiments show that compared to alternative approaches, CLAC offers improvements in generalization between training and modified test environments.
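One way to estimate such a capacity penalty for a discrete policy is sketched below; the estimator is illustrative, not CLAC's exact objective.

```python
import numpy as np

def mutual_information_penalty(action_probs):
    """Sketch of a capacity penalty: the mutual information I(S; A) of a
    discrete policy, estimated over a batch of states as the mean KL between
    per-state action distributions and their batch marginal. Subtracting
    beta * I(S; A) from the actor objective limits how strongly actions
    depend on the state.

    action_probs: (batch, n_actions) array whose rows are pi(a | s_i).
    """
    marginal = action_probs.mean(axis=0, keepdims=True)
    kl = (action_probs * np.log(action_probs / marginal)).sum(axis=1)
    return kl.mean()

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
print(mutual_information_penalty(probs))
```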
arXiv Detail & Related papers (2020-10-09T15:42:21Z)
- Discovering Reinforcement Learning Algorithms [53.72358280495428]
Reinforcement learning algorithms update an agent's parameters according to one of several possible rules.
This paper introduces a new meta-learning approach that discovers an entire update rule, including both 'what to predict' (e.g. value functions) and 'how to learn from it', by interacting with a set of environments.
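A toy sketch of what a parameterized update rule looks like is below; the functional form is purely illustrative, and the meta-training loop that optimizes the rule's parameters across environments is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = rng.normal(size=3) * 0.1   # meta-parameters of the learned update rule

def learned_update(pred, reward, next_pred, lr=0.1):
    """Toy sketch of a discovered update rule: instead of a hand-written
    target such as the TD target r + gamma * v(s'), a parameterized function
    of (reward, current prediction, next prediction) produces the target.
    The meta-parameters eta would themselves be optimized across a
    distribution of environments.
    """
    target = eta[0] * reward + eta[1] * next_pred + eta[2] * pred
    return pred + lr * (target - pred)

print(learned_update(pred=0.5, reward=1.0, next_pred=0.4))
```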
arXiv Detail & Related papers (2020-07-17T07:38:39Z)