Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL)
- URL: http://arxiv.org/abs/2403.01059v1
- Date: Sat, 2 Mar 2024 01:40:37 GMT
- Title: Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL)
- Authors: Noah Ford, Ryan W. Gardner, Austin Juhl, and Nathan Larson
- Abstract summary: This paper presents a method called Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL).
CMZ-DRIL uses reinforcement learning to minimize uncertainty among an ensemble of agents trained to model the expert demonstrations.
As demonstrated in a waypoint-navigation environment and in two MuJoCo environments, CMZ-DRIL can generate performant agents that behave more similarly to the expert.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-learning paradigms such as imitation learning and reinforcement
learning can generate highly performant agents in a variety of complex
environments. However, commonly used methods require large quantities of data
and/or a known reward function. This paper presents a method called Continuous
Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) that employs a
novel reward structure to improve the performance of imitation-learning agents
that have access to only a handful of expert demonstrations. CMZ-DRIL uses
reinforcement learning to minimize uncertainty among an ensemble of agents
trained to model the expert demonstrations. This method does not use any
environment-specific rewards, but creates a continuous and mean-zero reward
function from the action disagreement of the agent ensemble. As demonstrated in
a waypoint-navigation environment and in two MuJoCo environments, CMZ-DRIL can
generate performant agents that behave more similarly to the expert than
primary previous approaches in several key metrics.
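
The abstract specifies the ingredients of the reward (an ensemble of policies trained on the expert demonstrations, and a continuous, mean-zero reward derived from their action disagreement) but not its exact functional form. The sketch below is one plausible construction under those constraints; the ensemble size, the standard-deviation disagreement measure, the exponential mapping, and the `scale` parameter are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def disagreement_reward(ensemble_actions: np.ndarray, scale: float = 1.0) -> float:
    """Hypothetical CMZ-DRIL-style reward from ensemble action disagreement.

    ensemble_actions: array of shape (n_members, action_dim) holding the
    action each behavior-cloned ensemble member proposes for the current
    state. Returns a reward in (-1, 1]: high when the members agree (the
    state looks like the expert data), low when they disagree.
    """
    # Disagreement: mean per-dimension standard deviation across members.
    disagreement = float(ensemble_actions.std(axis=0).mean())
    # Map disagreement continuously into (0, 1], then shift and rescale so
    # the reward is centered on zero; the paper's exact mapping may differ.
    return 2.0 * float(np.exp(-scale * disagreement)) - 1.0


# Example: three ensemble members, two-dimensional continuous actions.
actions = np.array([[0.10, -0.20],
                    [0.12, -0.18],
                    [0.09, -0.22]])
print(disagreement_reward(actions))  # close to +1: the ensemble agrees
```

A standard RL algorithm would then be trained against this reward in place of any environment-specific reward, driving the agent toward states where the expert-modelling ensemble agrees.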
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Quality Diversity Imitation Learning [9.627530753815968]
We introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL).
Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method.
Our method even achieves 2x expert performance in the most challenging Humanoid environment.
arXiv Detail & Related papers (2024-10-08T15:49:33Z)
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- ReIL: A Framework for Reinforced Intervention-based Imitation Learning [3.0846824529023387]
We introduce Reinforced Intervention-based Learning (ReIL), a framework consisting of a general intervention-based learning algorithm and a multi-task imitation learning model.
Experimental results from real world mobile robot navigation challenges indicate that ReIL learns rapidly from sparse supervisor corrections without suffering deterioration in performance.
arXiv Detail & Related papers (2022-03-29T09:30:26Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task (see the sketch after this list).
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
arXiv Detail & Related papers (2020-07-03T23:08:14Z)
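
The reward-machine entry above refers to Mealy machines that map sequences of abstract events to rewards. The sketch below shows only the generic single-agent form of that idea; the multi-agent decomposition described in the cited paper is not reproduced, and the class, event labels, and example task are assumptions for illustration.

```python
# Minimal, generic reward-machine sketch (single-agent, illustrative only).
class RewardMachine:
    """A Mealy machine over abstract events that emits rewards.

    transitions maps (state, event) -> (next_state, reward). Events are
    high-level labels (e.g. "button_pressed") produced by a labelling
    function applied to raw environment observations.
    """

    def __init__(self, initial_state, transitions):
        self.initial_state = initial_state
        self.transitions = transitions
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, event):
        # Unlisted (state, event) pairs keep the current state, reward 0.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Example team-level task: press the button, then reach the goal.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "button_pressed"): ("u1", 0.0),    # first sub-task done
        ("u1", "goal_reached"): ("u_done", 1.0),  # task complete, reward issued
    },
)
rm.reset()
print(rm.step("goal_reached"))    # 0.0: goal before button does not finish the task
print(rm.step("button_pressed"))  # 0.0: advances to u1
print(rm.step("goal_reached"))    # 1.0: task complete
```

As the entry's summary notes, the multi-agent interpretation decomposes such a team-level machine into sub-tasks for individual agents; that decomposition is beyond this sketch.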