Adversarial Imitation Learning On Aggregated Data
- URL: http://arxiv.org/abs/2311.08568v1
- Date: Tue, 14 Nov 2023 22:13:38 GMT
- Title: Adversarial Imitation Learning On Aggregated Data
- Authors: Pierre Le Pelletier de Woillemont, Rémi Labory, and Vincent Corruble
- Abstract summary: Inverse Reinforcement Learning (IRL) learns an optimal policy, given some expert demonstrations, thus avoiding the need for the tedious process of specifying a suitable reward function.
We propose an approach that removes these requirements through a dynamic, adaptive method called Adversarial Imitation Learning on Aggregated Data (AILAD).
It jointly learns a nonlinear reward function and the associated optimal policy using an adversarial framework.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Inverse Reinforcement Learning (IRL) learns an optimal policy, given some
expert demonstrations, thus avoiding the need for the tedious process of
specifying a suitable reward function. However, current methods are constrained
by at least one of the following requirements. The first one is the need to
fully solve a forward Reinforcement Learning (RL) problem in the inner loop of
the algorithm, which might be prohibitively expensive in many complex
environments. The second one is the need for full trajectories from the
experts, which might not be easily available. The third one is the assumption
that the expert data is homogeneous rather than a collection from various
experts or possibly alternative solutions to the same task. Such constraints
make IRL approaches either not scalable or not usable on certain existing
systems. In this work we propose an approach that removes these requirements
through a dynamic, adaptive method called Adversarial Imitation Learning on
Aggregated Data (AILAD). It jointly learns a nonlinear reward function
and the associated optimal policy using an adversarial framework. The reward
learner only uses aggregated data. Moreover, it generates diverse behaviors
producing a distribution over the aggregated data matching that of the experts.
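To make the adversarial loop concrete, here is a minimal, illustrative sketch in which "aggregated data" is reduced to an action histogram, the discriminator scores only those histograms, and the policy is a softmax over discrete actions with an entropy bonus encouraging diverse behavior. The toy setup, the linear discriminator, and all hyperparameters are assumptions for illustration; AILAD itself uses a nonlinear reward function and a full RL policy, and its exact losses are not reproduced here.

```python
# Illustrative sketch only: adversarial imitation where the reward learner
# sees aggregated statistics (an action histogram), never raw trajectories.
import numpy as np

rng = np.random.default_rng(0)
K = 4                                            # number of discrete actions
expert_hist = np.array([0.1, 0.6, 0.2, 0.1])     # aggregated expert statistics

theta = np.zeros(K)                              # policy logits (softmax policy)
w = np.zeros(K)                                  # linear discriminator / reward

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

sigmoid = lambda s: 1.0 / (1.0 + np.exp(-s))

for step in range(2000):
    pi = softmax(theta)                          # current policy distribution
    # Policy rollouts, reduced to an aggregated histogram -- the only
    # signal the reward learner is ever allowed to see.
    actions = rng.choice(K, size=256, p=pi)
    pi_hist = np.bincount(actions, minlength=K) / 256.0

    # Discriminator step: logistic loss on the two aggregated examples,
    # pushing the expert aggregate up and the policy aggregate down.
    g_w = (sigmoid(w @ expert_hist) - 1.0) * expert_hist \
        + sigmoid(w @ pi_hist) * pi_hist
    w -= 0.5 * g_w

    # Policy step: exact softmax policy gradient on the learned reward,
    # plus an entropy bonus that keeps the generated behavior diverse.
    reward = w - 0.1 * (np.log(pi + 1e-8) + 1.0)
    g_theta = (np.diag(pi) - np.outer(pi, pi)) @ reward
    theta += 0.5 * g_theta

print("expert:", expert_hist)
print("policy:", np.round(softmax(theta), 3))
```

The property mirrored from the abstract is that the discriminator consumes only aggregates, while the policy is driven toward producing a distribution over those aggregates that matches the experts'.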
Related papers
- Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization [65.8915778873691]
Learning conditional distributions is a central problem in machine learning.
We propose a new learning paradigm that integrates both paired and unpaired data.
Our approach also connects intriguingly with inverse entropic optimal transport (OT).
arXiv Detail & Related papers (2024-10-03T16:12:59Z)
- Offline Inverse RL: New Solution Concepts and Provably Efficient Algorithms [23.61332577985059]
Inverse reinforcement learning (IRL) aims to recover the reward function of an expert agent from demonstrations of behavior.
This paper introduces a novel notion of feasible reward set capturing the opportunities and limitations of the offline setting.
arXiv Detail & Related papers (2024-02-23T15:49:46Z)
- Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data [4.971690889257356]
We introduce an adaptation of the alternating minimization-descent scheme proposed by Collins et al. and by Nayer and Vaswani.
We show that vanilla alternating minimization-descent fails catastrophically even for i.i.d., but mildly non-isotropic, data.
Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications.
arXiv Detail & Related papers (2023-08-08T17:56:20Z)
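As a rough illustration of an alternating minimization-descent scheme of the kind this entry refers to, the sketch below solves exactly for the per-task heads given the shared representation, then takes one gradient step (with QR retraction) on the representation. The synthetic Gaussian data and every hyperparameter are assumptions; this is not the paper's algorithm or analysis.

```python
# Sketch: alternating minimization (task heads) + descent (shared subspace).
import numpy as np

rng = np.random.default_rng(1)
d, k, tasks, n = 20, 3, 10, 50

B_true, _ = np.linalg.qr(rng.standard_normal((d, k)))    # shared subspace
W_true = rng.standard_normal((tasks, k))
X = rng.standard_normal((tasks, n, d))
Y = np.einsum("tnd,dk,tk->tn", X, B_true, W_true) \
    + 0.01 * rng.standard_normal((tasks, n))

B, _ = np.linalg.qr(rng.standard_normal((d, k)))         # random init
for it in range(300):
    # (1) minimization: exact least squares for each task head given B
    W = np.stack([np.linalg.lstsq(X[t] @ B, Y[t], rcond=None)[0]
                  for t in range(tasks)])
    # (2) descent: one gradient step on the shared representation B
    grad = sum(X[t].T @ (X[t] @ B @ W[t] - Y[t])[:, None] @ W[t][None, :]
               for t in range(tasks)) / (tasks * n)
    B, _ = np.linalg.qr(B - 0.1 * grad)                  # retract to orthonormal

# subspace distance between the learned and true representations
print("subspace error:",
      np.linalg.norm(B @ B.T - B_true @ B_true.T))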
- Task-Guided IRL in POMDPs that Scales [22.594913269327353]
In inverse reinforcement learning (IRL), a learning agent infers a reward function encoding the underlying task using demonstrations from experts.
Most IRL techniques require repeatedly solving the computationally expensive forward problem -- computing an optimal policy given a reward function -- in POMDPs.
We develop an algorithm that reduces the required information while increasing data efficiency.
arXiv Detail & Related papers (2022-12-30T21:08:57Z)
- Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method always outperforms Behavioral Cloning in the low-data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z)
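For reference, the Behavioral Cloning baseline that the entry above compares against can be sketched as plain multinomial logistic regression on a handful of expert state-action pairs (the "low data regime"). The synthetic expert and hyperparameters are illustrative assumptions; the paper's model-based method, which additionally exploits suboptimal data and a learned dynamics model, is not reproduced here.

```python
# Sketch: Behavioral Cloning as softmax regression on expert (state, action) pairs.
import numpy as np

rng = np.random.default_rng(2)
n, d, A = 40, 6, 3                        # few demos: the "low data regime"
S = rng.standard_normal((n, d))           # states
a_star = (S @ rng.standard_normal((d, A))).argmax(axis=1)  # expert actions

W = np.zeros((d, A))
for _ in range(500):                      # gradient descent on cross-entropy
    logits = S @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(A)[a_star]
    W -= 0.1 * S.T @ (p - onehot) / n     # softmax cross-entropy gradient

print("train accuracy:", ((S @ W).argmax(axis=1) == a_star).mean())
```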
- Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias against its variance.
We formalize the personalized collaborative learning problem as the optimization of a single user's objective.
We explore conditions under which this bias can be optimally traded off for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z)
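The bias-variance trade-off in the entry above can be made concrete with a toy estimation problem: a user mixes an unbiased local mean with a biased but low-variance mean computed from other users, using the MSE-optimal mixing weight. The Gaussian model and all numbers are illustrative assumptions, not the paper's setting.

```python
# Worked toy instance: optimal mix of an unbiased local estimate and a
# biased, low-variance collaborative estimate.
import numpy as np

rng = np.random.default_rng(3)
theta_user, bias, sigma = 1.0, 0.3, 1.0   # others' mean is theta_user + bias
n_local, n_others = 10, 1000

v_loc = sigma**2 / n_local                # variance of the local estimate
v_oth = sigma**2 / n_others               # variance of the others' estimate
# minimize alpha^2 v_loc + (1-alpha)^2 (bias^2 + v_oth) over alpha:
alpha = (bias**2 + v_oth) / (v_loc + bias**2 + v_oth)

mse_local, mse_mix = [], []
for _ in range(20000):
    local = rng.normal(theta_user, sigma, n_local).mean()
    others = rng.normal(theta_user + bias, sigma, n_others).mean()
    mse_local.append((local - theta_user) ** 2)
    mse_mix.append((alpha * local + (1 - alpha) * others - theta_user) ** 2)

print(f"alpha*={alpha:.3f}  MSE local={np.mean(mse_local):.4f}  "
      f"MSE mixed={np.mean(mse_mix):.4f}")
```

With these numbers the mixed estimator roughly halves the local estimator's MSE, which is the kind of collaborative gain the entry formalizes.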
- Offline Inverse Reinforcement Learning [24.316047317028147]
The goal of offline RL is to learn optimal policies when a fixed exploratory demonstration dataset is available.
Inspired by the success of IRL techniques in achieving state-of-the-art imitation performance in online settings, we exploit GAN-based data augmentation procedures to construct the first offline IRL algorithm.
arXiv Detail & Related papers (2021-06-09T13:44:06Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
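The two-player view in the Online Apprenticeship Learning entry above can be illustrated on a finite game: a policy player and a cost player both run entropic mirror descent (multiplicative weights), and the averaged policy ends up matching the expert on every cost in the class. The small cost class, expert distribution, and step sizes are assumptions for illustration, not the paper's construction.

```python
# Sketch: two mirror-descent (multiplicative-weights) players on the
# apprenticeship game  min_pi max_c <c, pi - p_expert>.
import numpy as np

K, T, eta = 4, 3000, 0.05
costs = np.array([[1.0, 0.0, 0.5, 0.2],          # a small class of cost
                  [0.0, 1.0, 0.3, 0.8],          # functions over K actions
                  [0.4, 0.4, 0.0, 1.0]])
p_expert = np.array([0.1, 0.2, 0.6, 0.1])        # expert action distribution

pi = np.ones(K) / K                              # policy player
lam = np.ones(len(costs)) / len(costs)           # cost player
avg_pi = np.zeros(K)

for t in range(T):
    c = lam @ costs                              # current mixed cost
    gap = costs @ (pi - p_expert)                # per-cost performance gap
    pi = pi * np.exp(-eta * c)                   # MD: policy lowers its cost
    pi /= pi.sum()
    lam = lam * np.exp(eta * gap)                # MD: cost player raises the gap
    lam /= lam.sum()
    avg_pi += pi / T

print("worst-case gap of averaged policy:",
      (costs @ (avg_pi - p_expert)).max())       # ~0: matches the expert
```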
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Learning while Respecting Privacy and Robustness to Distributional Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z)
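A common way to instantiate the robust objective in the entry above is adversarial training: an inner ascent step perturbs each input inside a small ball, and the outer step descends on the loss at the perturbed points. The sketch below uses a logistic model with an FGSM-style single inner step; the model, synthetic data, and epsilon are assumptions, not the paper's algorithm.

```python
# Sketch: adversarial (robust) training of a logistic model.
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 200, 5, 0.1
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))          # labels in {-1, +1}

w = np.zeros(d)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(300):
    # inner maximization: one signed-gradient ascent step on each input
    # (an FGSM-style surrogate for the worst-case perturbation)
    g_x = -(y * sigmoid(-y * (X @ w)))[:, None] * w[None, :]
    X_adv = X + eps * np.sign(g_x)
    # outer minimization: logistic-loss gradient step on the perturbed data
    g_w = -(y * sigmoid(-y * (X_adv @ w))) @ X_adv / n
    w -= 0.5 * g_w

print("clean accuracy:", (np.sign(X @ w) == y).mean())
```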
- DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction [96.90215318875859]
We show that bootstrapping-based Q-learning algorithms do not necessarily benefit from corrective feedback.
We propose a new algorithm, DisCor, which computes an approximation to the optimal data distribution and uses it to re-weight the transitions used for training.
arXiv Detail & Related papers (2020-03-16T16:18:52Z)
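A heavily simplified tabular sketch of the re-weighting idea in the DisCor entry above: keep a running estimate of how much Bellman error has accumulated in each bootstrap target, and down-weight updates whose targets are still unreliable. The random chain-style MDP, learning rates, and temperature are illustrative assumptions; DisCor itself operates with function approximation and a learned error model.

```python
# Sketch: Q-learning with DisCor-style transition re-weighting.
import numpy as np

rng = np.random.default_rng(5)
S, A, gamma, tau = 6, 2, 0.9, 1.0
R = rng.random((S, A))                    # random rewards
P = rng.integers(0, S, (S, A))            # deterministic random transitions

Q = np.zeros((S, A))
Delta = np.ones((S, A))                   # error-accumulation estimate

for _ in range(2000):
    s, a = rng.integers(S), rng.integers(A)
    s2 = P[s, a]
    a2 = Q[s2].argmax()
    target = R[s, a] + gamma * Q[s2, a2]
    # DisCor-style weight: trust targets whose own error estimate is small
    wt = np.exp(-gamma * Delta[s2, a2] / tau)
    td = target - Q[s, a]
    Q[s, a] += 0.5 * wt * td
    # propagate: current error plus discounted error at the bootstrap target
    Delta[s, a] += 0.5 * (abs(td) + gamma * Delta[s2, a2] - Delta[s, a])

print("greedy policy:", Q.argmax(axis=1))
```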