Augmented Q Imitation Learning (AQIL)
- URL: http://arxiv.org/abs/2004.00993v2
- Date: Sun, 5 Apr 2020 17:16:23 GMT
- Title: Augmented Q Imitation Learning (AQIL)
- Authors: Xiao Lei Zhang, Anish Agarwal
- Abstract summary: In imitation learning the machine learns by mimicking the behavior of an expert system, whereas in reinforcement learning the machine learns via direct environment feedback.
This paper proposes Augmented Q-Imitation-Learning, a method by which deep reinforcement learning convergence can be accelerated.
- Score: 20.909770125018564
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study of unsupervised learning can be generally divided into two
categories: imitation learning and reinforcement learning. In imitation
learning the machine learns by mimicking the behavior of an expert system,
whereas in reinforcement learning the machine learns via direct environment
feedback. Traditional deep reinforcement learning takes a significant time
before the machine starts to converge to an optimal policy. This paper proposes
Augmented Q-Imitation-Learning, a method by which deep reinforcement learning
convergence can be accelerated by applying Q-imitation-learning as the initial
training process in traditional Deep Q-learning.
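The proposed pipeline reduces to two phases: Q-learning updates driven by expert actions, followed by standard epsilon-greedy Q-learning initialized from the pre-trained values. Below is a minimal tabular sketch of that idea; the toy chain MDP and the always-move-right expert are illustrative assumptions, not details from the paper.

```python
# Hedged sketch of the AQIL idea: pre-train a Q-table by imitating an
# expert, then continue with standard Q-learning. The chain MDP and the
# expert policy below are assumptions made for illustration only.
import numpy as np

N_STATES, N_ACTIONS = 10, 2   # chain MDP: action 1 moves right, 0 moves left
GAMMA, ALPHA = 0.95, 0.1

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward

expert_action = lambda s: 1  # assumed expert: always move right

Q = np.zeros((N_STATES, N_ACTIONS))

# Phase 1: Q-imitation pre-training -- apply Q-learning updates along
# expert trajectories so the table is biased toward expert behavior.
for _ in range(200):
    s = 0
    while s != N_STATES - 1:
        a = expert_action(s)
        s_next, r = step(s, a)
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

# Phase 2: standard epsilon-greedy Q-learning, starting from the
# imitation-initialized table instead of zeros.
rng = np.random.default_rng(0)
for _ in range(500):
    s = 0
    while s != N_STATES - 1:
        a = rng.integers(N_ACTIONS) if rng.random() < 0.1 else int(Q[s].argmax())
        s_next, r = step(s, a)
        Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # learned greedy policy
```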
Related papers
- A Unified Framework for Continual Learning and Machine Unlearning [9.538733681436836]
Continual learning and machine unlearning are crucial challenges in machine learning, typically addressed separately.
We introduce a novel framework that jointly tackles both tasks by leveraging controlled knowledge distillation.
Our approach enables efficient learning with minimal forgetting and effective targeted unlearning.
arXiv Detail & Related papers (2024-08-21T06:49:59Z)
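As a rough illustration of how controlled knowledge distillation can serve both goals, the sketch below combines a standard distillation term on retained data with a term that pushes the student toward uninformative outputs on data to be forgotten. The specific losses and weights are assumptions for illustration, not the framework's actual objective.

```python
# Hedged sketch: one plausible distillation objective that retains a
# teacher's behavior on kept data while suppressing it on forget data.
import torch
import torch.nn.functional as F

def controlled_distillation_loss(student_logits_retain, teacher_logits_retain,
                                 student_logits_forget, retain_weight=1.0,
                                 forget_weight=0.5):
    # Match the teacher on data we want to keep (standard KD term).
    retain_loss = F.kl_div(
        F.log_softmax(student_logits_retain, dim=-1),
        F.softmax(teacher_logits_retain, dim=-1),
        reduction="batchmean",
    )
    # Push the student toward a uniform distribution on forget data,
    # erasing class-specific knowledge there.
    uniform = torch.full_like(student_logits_forget,
                              1.0 / student_logits_forget.size(-1))
    forget_loss = F.kl_div(
        F.log_softmax(student_logits_forget, dim=-1),
        uniform,
        reduction="batchmean",
    )
    return retain_weight * retain_loss + forget_weight * forget_loss
```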
- Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re-parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
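A minimal sketch of the mechanism: with normalization layers, growth in a weight matrix's norm shrinks the relative size of each gradient step, so projecting weights back to a fixed norm after every optimizer step keeps the effective learning rate stable. The projection rule below is an assumed simplification of the Normalize-and-Project idea, not the paper's exact procedure.

```python
# Hedged sketch: rescale each weight matrix to a constant Frobenius norm
# after every update, so norm growth cannot silently decay the effective
# learning rate. Illustrative only; not the paper's exact procedure.
import torch

@torch.no_grad()
def project_to_fixed_norm(module, target_norm=1.0):
    for p in module.parameters():
        if p.dim() >= 2:  # only weight matrices, not biases
            p.mul_(target_norm / (p.norm() + 1e-8))

# Usage: after each optimizer.step(), call project_to_fixed_norm(model).
```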
- FRAC-Q-Learning: A Reinforcement Learning with Boredom Avoidance Processes for Social Robots [0.0]
We propose a new reinforcement learning method specialized for social robots, FRAC-Q-learning, which can avoid user boredom.
The proposed algorithm consists of a forgetting process in addition to randomizing and categorizing processes.
FRAC-Q-learning produced a significantly higher trend in interest scores and bored users significantly less than traditional Q-learning.
arXiv Detail & Related papers (2023-11-26T15:11:17Z)
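The abstract names forgetting, randomizing, and categorizing processes without detailing them; the sketch below only illustrates how a forgetting term might decay stale Q-values between standard updates. The decay form and constants are assumptions, not the published algorithm.

```python
# Hedged sketch: a Q-learning update preceded by a simple forgetting
# step that slowly decays all estimates toward zero.
import numpy as np

def frac_style_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9, forget=0.001):
    Q *= (1.0 - forget)  # forgetting process: decay stale estimates
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

Q = frac_style_update(np.zeros((5, 2)), s=0, a=1, r=1.0, s_next=1)
```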
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
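One natural encoding of the idea is to relabel logged transitions so that the moments a human intervenes carry a negative reward, then train any off-policy RL algorithm on the result. The -1/0 scheme and transition format below are assumptions for illustration.

```python
# Hedged sketch of an RLIF-style reward signal: penalize intervention
# events and let off-policy RL learn from the relabeled data.
def rlif_reward(intervened: bool) -> float:
    return -1.0 if intervened else 0.0

def relabel_transition(transition):
    # transition: dict with 'obs', 'action', 'next_obs', 'intervened'
    transition["reward"] = rlif_reward(transition["intervened"])
    return transition
```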
- Rethinking Learning Dynamics in RL using Adversarial Networks [79.56118674435844]
We present a learning mechanism for reinforcement learning of closely related skills parameterized via a skill embedding space.
The main contribution of our work is to formulate an adversarial training regime for reinforcement learning with the help of entropy-regularized policy gradient formulation.
arXiv Detail & Related papers (2022-01-27T19:51:09Z)
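For reference, a minimal entropy-regularized policy-gradient loss of the kind the summary mentions is sketched below; the adversarial skill-embedding machinery itself is not reproduced.

```python
# Hedged sketch: standard policy-gradient loss with an entropy bonus.
import torch

def entropy_regularized_pg_loss(log_probs, advantages, entropy, beta=0.01):
    # log_probs: log pi(a_t|s_t) for taken actions; advantages: A(s_t, a_t)
    pg_term = -(log_probs * advantages.detach()).mean()
    return pg_term - beta * entropy.mean()  # entropy bonus aids exploration

loss = entropy_regularized_pg_loss(torch.randn(8), torch.randn(8), torch.rand(8))
```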
- Active Reinforcement Learning -- A Roadmap Towards Curious Classifier Systems for Self-Adaptation [0.456877715768796]
This article aims to set up a research agenda towards what we call "active reinforcement learning" in intelligent systems.
Traditional approaches separate the learning problem and make isolated use of techniques from different fields of machine learning.
arXiv Detail & Related papers (2022-01-11T13:50:26Z)
- Rethinking Supervised Learning and Reinforcement Learning in Task-Oriented Dialogue Systems [58.724629408229205]
We demonstrate how traditional supervised learning and a simulator-free adversarial learning method can be used to achieve performance comparable to state-of-the-art RL-based methods.
Our main goal is not to beat reinforcement learning with supervised learning, but to demonstrate the value of rethinking the role of reinforcement learning and supervised learning in optimizing task-oriented dialogue systems.
arXiv Detail & Related papers (2020-09-21T12:04:18Z)
- Transfer Learning in Deep Reinforcement Learning: A Survey [64.36174156782333]
Reinforcement learning is a learning paradigm for solving sequential decision-making problems.
Recent years have witnessed remarkable progress in reinforcement learning upon the fast development of deep neural networks.
Transfer learning has arisen to tackle various challenges faced by reinforcement learning.
arXiv Detail & Related papers (2020-09-16T18:38:54Z)
- Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z)
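The weighting idea reduces to a simple convex combination of the two losses, as in the sketch below; how the per-state weight is estimated is the paper's actual contribution and is left abstract here.

```python
# Hedged sketch of ADVISOR-style loss mixing between imitation and RL.
def advisor_style_loss(imitation_loss, rl_loss, w):
    # w near 1 where the teacher's privileged advice transfers to the
    # student's observation space; near 0 where reward-driven RL should
    # dominate. Estimating w per state is the method's core contribution.
    return w * imitation_loss + (1.0 - w) * rl_loss
```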
- A Novel Update Mechanism for Q-Networks Based On Extreme Learning Machines [0.6445605125467573]
The Extreme Q-Learning Machine (EQLM) is applied to a reinforcement learning problem in the same manner as gradient-based updates.
We compare its performance to a typical Q-Network on the cart-pole task.
We show EQLM has similar long-term learning performance to a Q-Network.
arXiv Detail & Related papers (2020-06-04T16:16:13Z)
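An extreme-learning-machine style update replaces gradient descent on the output layer with a closed-form ridge-regression solve over a random, frozen hidden layer. The dimensions and regularizer below are illustrative assumptions, not EQLM's exact configuration.

```python
# Hedged sketch: ELM-style Q-network update. The hidden layer is random
# and never trained; output weights come from a least-squares solve.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_hidden, n_actions = 4, 64, 2
W_hidden = rng.normal(size=(n_features, n_hidden))  # random, frozen

def hidden(states):
    return np.tanh(states @ W_hidden)

def elm_q_update(states, q_targets, ridge=1e-3):
    # Solve min_W ||H W - Q_targets||^2 + ridge * ||W||^2 in closed form.
    H = hidden(states)
    A = H.T @ H + ridge * np.eye(n_hidden)
    return np.linalg.solve(A, H.T @ q_targets)  # shape (n_hidden, n_actions)

W_out = elm_q_update(rng.normal(size=(32, n_features)),
                     rng.normal(size=(32, n_actions)))
```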
- A new Potential-Based Reward Shaping for Reinforcement Learning Agent [0.0]
The proposed method extracts knowledge from episodes' cumulative rewards.
The results indicate an improvement in the learning process in both the single-task and the multi-task reinforcement learner agents.
arXiv Detail & Related papers (2019-02-17T10:34:18Z)
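Standard potential-based shaping adds F(s, s') = gamma * phi(s') - phi(s) to the environment reward, which is known to preserve optimal policies (Ng et al., 1999); the paper's novelty lies in deriving the potential from episodes' cumulative rewards. The placeholder phi below is an assumption.

```python
# Hedged sketch of classic potential-based reward shaping.
def shaped_reward(r, s, s_next, phi, gamma=0.99):
    return r + gamma * phi.get(s_next, 0.0) - phi.get(s, 0.0)

phi = {0: 0.0, 1: 0.5}            # placeholder potential function
r_shaped = shaped_reward(0.0, 0, 1, phi)
```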