KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human
Suboptimal Knowledge
- URL: http://arxiv.org/abs/2002.07418v2
- Date: Thu, 21 May 2020 07:02:41 GMT
- Title: KoGuN: Accelerating Deep Reinforcement Learning via Integrating Human
Suboptimal Knowledge
- Authors: Peng Zhang, Jianye Hao, Weixun Wang, Hongyao Tang, Yi Ma, Yihai Duan,
Yan Zheng
- Abstract summary: We propose knowledge guided policy network (KoGuN), a novel framework that combines human prior suboptimal knowledge with reinforcement learning.
Our framework consists of a fuzzy rule controller to represent human knowledge and a refine module to fine-tune suboptimal prior knowledge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning agents usually learn from scratch, which requires a
large number of interactions with the environment. This is quite different from
the learning process of humans. When faced with a new task, humans naturally have
common sense and use prior knowledge to derive an initial policy and to guide the
learning process afterwards. Although the prior knowledge may not be fully
applicable to the new task, the learning process is significantly sped up, since
the initial policy ensures a quick start and the intermediate guidance helps the
agent avoid unnecessary exploration. Taking this inspiration, we propose
knowledge guided policy network (KoGuN), a novel framework that combines human
prior suboptimal knowledge with reinforcement learning. Our framework consists of
a fuzzy rule controller to represent human knowledge and a refine module to
fine-tune suboptimal prior knowledge. The proposed framework is end-to-end and
can be combined with existing policy-based reinforcement learning algorithms. We
conduct experiments on both discrete and continuous control tasks. The empirical
results show that our approach, which combines human suboptimal knowledge and RL,
achieves significant improvements in the learning efficiency of flat RL
algorithms, even with very low-performance human prior knowledge.
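To make the framework concrete, below is a minimal sketch (in PyTorch) of how a knowledge guided policy network along these lines might be assembled. It is an illustration of the idea, not the authors' implementation: the class names, the sigmoid rule memberships, and the residual-style refine module are assumptions, and the paper's actual fuzzy inference and training details may differ.

```python
# Minimal sketch of a knowledge guided policy network (illustrative only).
# A fuzzy rule controller encodes human prior knowledge as soft rules over
# state features, a refine module learns a correction to the suboptimal rule
# output, and the combined logits define the action distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FuzzyRuleController(nn.Module):
    """Represents human knowledge as differentiable fuzzy rules.

    Each rule fires with a strength in (0, 1) given by a sigmoid over a
    linear function of the state (the antecedent); in practice the weights
    would be initialized from human-specified thresholds rather than at
    random. Rule firings are then mapped to action preferences (the
    consequent).
    """

    def __init__(self, state_dim: int, n_actions: int, n_rules: int):
        super().__init__()
        self.antecedents = nn.Linear(state_dim, n_rules)
        self.consequents = nn.Linear(n_rules, n_actions, bias=False)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        firing = torch.sigmoid(self.antecedents(state))  # rule strengths
        return self.consequents(firing)                  # action preference logits


class KoGuNPolicy(nn.Module):
    """Combines the rule controller with a refine module, end-to-end.

    The refine module sees both the state and the rule output and learns a
    correction, so training can compensate for suboptimal prior knowledge.
    """

    def __init__(self, state_dim: int, n_actions: int,
                 n_rules: int = 8, hidden: int = 64):
        super().__init__()
        self.rules = FuzzyRuleController(state_dim, n_actions, n_rules)
        self.refine = nn.Sequential(
            nn.Linear(state_dim + n_actions, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        prior = self.rules(state)                                   # knowledge-based logits
        correction = self.refine(torch.cat([state, prior], dim=-1))
        return F.softmax(prior + correction, dim=-1)                # action probabilities


# Usage: sample an action for a 4-dimensional state (e.g. CartPole).
policy = KoGuNPolicy(state_dim=4, n_actions=2)
probs = policy(torch.randn(1, 4))
action = torch.distributions.Categorical(probs).sample()
```

Because the whole pipeline is differentiable, both the rule antecedents (initialized from human knowledge) and the refine module can be updated by any policy-gradient algorithm such as PPO, consistent with the abstract's claim that the framework is end-to-end and compatible with existing policy-based methods.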
Related papers
- KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination [11.203441390685201]
Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI field.
We introduce Knowledge-driven Programmatic reinforcement learning for ZSC.
A significant challenge is the vast program search space, making it difficult to find high-performing programs efficiently.
arXiv Detail & Related papers (2024-08-08T09:43:54Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, unlike many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
A theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z)
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of an entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
- Transferability in Deep Learning: A Survey [80.67296873915176]
The ability to acquire and reuse knowledge is known as transferability in deep learning.
We present this survey to connect different isolated areas in deep learning with their relation to transferability.
We implement a benchmark and an open-source library, enabling a fair evaluation of deep learning methods in terms of transferability.
arXiv Detail & Related papers (2022-01-15T15:03:17Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)
- Transferring Domain Knowledge with an Adviser in Continuous Tasks [0.0]
Reinforcement learning techniques are incapable of explicitly incorporating domain-specific knowledge into the learning process.
We adapt the Deep Deterministic Policy Gradient (DDPG) algorithm to incorporate an adviser.
Our experiments on OpenAI Gym benchmark tasks show that integrating domain knowledge through advisers expedites learning and improves the policy towards better optima.
arXiv Detail & Related papers (2021-02-16T09:03:33Z)
- Learning Transferable Concepts in Deep Reinforcement Learning [0.7161783472741748]
We show that learning discrete representations of sensory inputs can provide a high-level abstraction that is common across multiple tasks.
In particular, we show that it is possible to learn such representations by self-supervision, following an information theoretic approach.
Our method is able to learn concepts in locomotion and optimal control tasks that increase the sample efficiency in both known and unknown tasks.
arXiv Detail & Related papers (2020-05-16T04:45:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.