Mimicking Better by Matching the Approximate Action Distribution
- URL: http://arxiv.org/abs/2306.09805v3
- Date: Tue, 22 Oct 2024 11:33:36 GMT
- Title: Mimicking Better by Matching the Approximate Action Distribution
- Authors: João A. Cândido Ramos, Lionel Blondé, Naoya Takeishi, Alexandros Kalousis,
- Abstract summary: We introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations.
We show that it requires considerable fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
- Score: 48.95048003354255
- License:
- Abstract: In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the non-availability of expert actions, we rely on an inverse dynamics model that infers plausible actions distribution given the expert's state-state transitions; we regularize the imitator's policy by aligning it to the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both int the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerable fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD often stands out as the sole method capable of attaining expert performance levels, underscoring its simplicity and efficacy.
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z) - POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning [17.644279061872442]
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning.
We propose the Potentially Optimal Joint Actions Weighted Qmix (POWQmix) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses during training.
Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
arXiv Detail & Related papers (2024-05-13T03:27:35Z) - On the Connection between Invariant Learning and Adversarial Training
for Out-of-Distribution Generalization [14.233038052654484]
deep learning models rely on spurious features, which catastrophically fail when generalized to out-of-distribution (OOD) data.
Recent work shows that Invariant Risk Minimization (IRM) is only effective for a certain type of distribution shift while it fails for other cases.
We propose Domainwise Adversarial Training ( DAT), an AT-inspired method for alleviating distribution shift by domain-specific perturbations.
arXiv Detail & Related papers (2022-12-18T13:13:44Z) - Imitation Learning by State-Only Distribution Matching [2.580765958706854]
Imitation Learning from observation describes policy learning in a similar way to human learning.
We propose a non-adversarial learning-from-observations approach, together with an interpretable convergence and performance metric.
arXiv Detail & Related papers (2022-02-09T08:38:50Z) - A Deep Reinforcement Learning Approach to Marginalized Importance
Sampling with the Successor Representation [61.740187363451746]
Marginalized importance sampling (MIS) measures the density ratio between the state-action occupancy of a target policy and that of a sampling distribution.
We bridge the gap between MIS and deep reinforcement learning by observing that the density ratio can be computed from the successor representation of the target policy.
We evaluate the empirical performance of our approach on a variety of challenging Atari and MuJoCo environments.
arXiv Detail & Related papers (2021-06-12T20:21:38Z) - Off-Policy Imitation Learning from Observations [78.30794935265425]
Learning from Observations (LfO) is a practical reinforcement learning scenario from which many applications can benefit.
We propose a sample-efficient LfO approach that enables off-policy optimization in a principled manner.
Our approach is comparable with state-of-the-art locomotion in terms of both sample-efficiency and performance.
arXiv Detail & Related papers (2021-02-25T21:33:47Z) - Efficient Empowerment Estimation for Unsupervised Stabilization [75.32013242448151]
empowerment principle enables unsupervised stabilization of dynamical systems at upright positions.
We propose an alternative solution based on a trainable representation of a dynamical system as a Gaussian channel.
We show that our method has a lower sample complexity, is more stable in training, possesses the essential properties of the empowerment function, and allows estimation of empowerment from images.
arXiv Detail & Related papers (2020-07-14T21:10:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.