Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL)
- URL: http://arxiv.org/abs/2403.01059v1
- Date: Sat, 2 Mar 2024 01:40:37 GMT
- Title: Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL)
- Authors: Noah Ford, Ryan W. Gardner, Austin Juhl, and Nathan Larson
- Abstract summary: This paper presents a method called Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL).
CMZ-DRIL uses reinforcement learning to minimize uncertainty among an ensemble of agents trained to model the expert demonstrations.
As demonstrated in a waypoint-navigation environment and in two MuJoCo environments, CMZ-DRIL can generate performant agents that behave more similarly to the expert.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine-learning paradigms such as imitation learning and reinforcement
learning can generate highly performant agents in a variety of complex
environments. However, commonly used methods require large quantities of data
and/or a known reward function. This paper presents a method called Continuous
Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL) that employs a
novel reward structure to improve the performance of imitation-learning agents
that have access to only a handful of expert demonstrations. CMZ-DRIL uses
reinforcement learning to minimize uncertainty among an ensemble of agents
trained to model the expert demonstrations. This method does not use any
environment-specific rewards, but creates a continuous and mean-zero reward
function from the action disagreement of the agent ensemble. As demonstrated in
a waypoint-navigation environment and in two MuJoCo environments, CMZ-DRIL can
generate performant agents that behave more similarly to the expert than
primary previous approaches in several key metrics.
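
The abstract specifies the ingredients of the reward (an ensemble of policies trained on the expert demonstrations, and a continuous, mean-zero reward derived from their action disagreement) but not its exact functional form. The sketch below is one plausible construction under those constraints; the ensemble size, the standard-deviation disagreement measure, the exponential mapping, and the `scale` parameter are illustrative assumptions, not the authors' definitions.

```python
import numpy as np

def disagreement_reward(ensemble_actions: np.ndarray, scale: float = 1.0) -> float:
    """Hypothetical CMZ-DRIL-style reward from ensemble action disagreement.

    ensemble_actions: array of shape (n_members, action_dim) holding the
    action each behavior-cloned ensemble member proposes for the current
    state. Returns a reward in (-1, 1]: high when the members agree (the
    state looks like the expert data), low when they disagree.
    """
    # Disagreement: mean per-dimension standard deviation across members.
    disagreement = float(ensemble_actions.std(axis=0).mean())
    # Map disagreement continuously into (0, 1], then shift and rescale so
    # the reward is centered on zero; the paper's exact mapping may differ.
    return 2.0 * float(np.exp(-scale * disagreement)) - 1.0


# Example: three ensemble members, two-dimensional continuous actions.
actions = np.array([[0.10, -0.20],
                    [0.12, -0.18],
                    [0.09, -0.22]])
print(disagreement_reward(actions))  # close to +1: the ensemble agrees
```

A standard RL algorithm would then be trained against this reward in place of any environment-specific reward, driving the agent toward states where the expert-modelling ensemble agrees.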
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Quality Diversity Imitation Learning [9.627530753815968]
We introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL).
Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method.
Our method even achieves 2x expert performance in the most challenging Humanoid environment.
arXiv Detail & Related papers (2024-10-08T15:49:33Z)
- No Regrets: Investigating and Improving Regret Approximations for Curriculum Discovery [53.08822154199948]
Unsupervised Environment Design (UED) methods have gained recent attention as their adaptive curricula promise to enable agents to be robust to in- and out-of-distribution tasks.
This work investigates how existing UED methods select training environments, focusing on task prioritisation metrics.
We develop a method that directly trains on scenarios with high learnability.
arXiv Detail & Related papers (2024-08-27T14:31:54Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe enables better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in complex simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- ReIL: A Framework for Reinforced Intervention-based Imitation Learning [3.0846824529023387]
We introduce Reinforced Intervention-based Learning (ReIL), a framework consisting of a general intervention-based learning algorithm and a multi-task imitation learning model.
Experimental results from real world mobile robot navigation challenges indicate that ReIL learns rapidly from sparse supervisor corrections without suffering deterioration in performance.
arXiv Detail & Related papers (2022-03-29T09:30:26Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Reward Machines for Cooperative Multi-Agent Reinforcement Learning [30.84689303706561]
In cooperative multi-agent reinforcement learning, a collection of agents learns to interact in a shared environment to achieve a common goal.
We propose the use of reward machines (RM) -- Mealy machines used as structured representations of reward functions -- to encode the team's task (see the sketch after this list).
The proposed novel interpretation of RMs in the multi-agent setting explicitly encodes required teammate interdependencies, allowing the team-level task to be decomposed into sub-tasks for individual agents.
arXiv Detail & Related papers (2020-07-03T23:08:14Z)
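
The reward-machine entry above refers to Mealy machines that map sequences of abstract events to rewards. The sketch below shows only the generic single-agent form of that idea; the multi-agent decomposition described in the cited paper is not reproduced, and the class, event labels, and example task are assumptions for illustration.

```python
# Minimal, generic reward-machine sketch (single-agent, illustrative only).
class RewardMachine:
    """A Mealy machine over abstract events that emits rewards.

    transitions maps (state, event) -> (next_state, reward). Events are
    high-level labels (e.g. "button_pressed") produced by a labelling
    function applied to raw environment observations.
    """

    def __init__(self, initial_state, transitions):
        self.initial_state = initial_state
        self.transitions = transitions
        self.state = initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, event):
        # Unlisted (state, event) pairs keep the current state, reward 0.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Example team-level task: press the button, then reach the goal.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "button_pressed"): ("u1", 0.0),    # first sub-task done
        ("u1", "goal_reached"): ("u_done", 1.0),  # task complete, reward issued
    },
)
rm.reset()
print(rm.step("goal_reached"))    # 0.0: goal before button does not finish the task
print(rm.step("button_pressed"))  # 0.0: advances to u1
print(rm.step("goal_reached"))    # 1.0: task complete
```

As the entry's summary notes, the multi-agent interpretation decomposes such a team-level machine into sub-tasks for individual agents; that decomposition is beyond this sketch.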