SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning
- URL: http://arxiv.org/abs/2402.13147v3
- Date: Thu, 10 Oct 2024 19:27:40 GMT
- Title: SPRINQL: Sub-optimal Demonstrations driven Offline Imitation Learning
- Authors: Huy Hoang, Tien Mai, Pradeep Varakantham
- Abstract summary: Offline imitation learning (IL) aims to mimic an expert's behavior using demonstrations without any interaction with the environment.
We propose an offline IL approach that leverages the larger set of sub-optimal demonstrations while effectively mimicking expert trajectories.
Our approach, which is based on inverse soft-Q learning, learns from both expert and sub-optimal demonstrations.
- Score: 11.666700714916065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We focus on offline imitation learning (IL), which aims to mimic an expert's behavior using demonstrations without any interaction with the environment. One of the main challenges in offline IL is the limited support of expert demonstrations, which typically cover only a small fraction of the state-action space. While it may not be feasible to obtain numerous expert demonstrations, it is often possible to gather a larger set of sub-optimal demonstrations. For example, in treatment optimization problems, there are varying levels of doctor treatments available for different chronic conditions. These range from treatment specialists and experienced general practitioners to less experienced general practitioners. Similarly, when robots are trained to imitate humans in routine tasks, they might learn from individuals with different levels of expertise and efficiency. In this paper, we propose an offline IL approach that leverages the larger set of sub-optimal demonstrations while effectively mimicking expert trajectories. Existing offline IL methods based on behavior cloning or distribution matching often face issues such as overfitting to the limited set of expert demonstrations or inadvertently imitating sub-optimal trajectories from the larger dataset. Our approach, which is based on inverse soft-Q learning, learns from both expert and sub-optimal demonstrations. It assigns higher importance (through learned weights) to aligning with expert demonstrations and lower importance to aligning with sub-optimal ones. A key contribution of our approach, called SPRINQL, is transforming the offline IL problem into a convex optimization over the space of Q functions. Through comprehensive experimental evaluations, we demonstrate that the SPRINQL algorithm achieves state-of-the-art (SOTA) performance on offline IL benchmarks. Code is available at https://github.com/hmhuy0/SPRINQL.
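The abstract builds on inverse soft-Q learning. The sketch below illustrates that idea's core quantity in a small tabular setting, following the IQ-Learn formulation. A Q function implies a reward through the inverse soft Bellman operator r(s, a) = Q(s, a) - gamma * V(s'), where the soft value is V(s) = alpha * log sum_a exp(Q(s, a) / alpha). The weighting of expert versus sub-optimal transitions is only meant to echo the abstract's "learned weights". Here the weights (`w_expert`, `w_sub`) are fixed by hand, and all function names are hypothetical. This is not SPRINQL's objective, which the abstract describes only at a high level. The authors' code is at the repository linked above.

```python
import numpy as np


def soft_value(q: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha), max-shifted for stability."""
    z = q / alpha
    m = z.max(axis=1, keepdims=True)
    return alpha * (m[:, 0] + np.log(np.exp(z - m).sum(axis=1)))


def implied_reward(q, s, a, s_next, gamma=0.99, alpha=1.0):
    """Inverse soft Bellman operator: reward implied by Q, r(s, a) = Q(s, a) - gamma * V(s')."""
    v = soft_value(q, alpha)
    return q[s, a] - gamma * v[s_next]


def weighted_implied_reward(q, expert, suboptimal, w_expert=1.0, w_sub=0.2, gamma=0.99):
    """Hypothetical weighted mean of implied rewards over expert and sub-optimal transitions.

    Each dataset is a tuple of integer index arrays (s, a, s_next). The weights are fixed
    constants in this sketch, not learned.
    """
    r_e = implied_reward(q, *expert, gamma=gamma)
    r_s = implied_reward(q, *suboptimal, gamma=gamma)
    total = w_expert * r_e.sum() + w_sub * r_s.sum()
    return total / (w_expert * len(r_e) + w_sub * len(r_s))


if __name__ == "__main__":
    q = np.array([[1.0, 0.0], [0.0, 2.0]])  # 2 states x 2 actions
    expert = (np.array([0]), np.array([0]), np.array([1]))
    suboptimal = (np.array([1]), np.array([0]), np.array([0]))
    print(weighted_implied_reward(q, expert, suboptimal))
```

The max-shifted log-sum-exp keeps V finite even when Q values are large.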
Related papers
- Latent Wasserstein Adversarial Imitation Learning [110.12916356445908]
Imitation Learning (IL) enables agents to mimic expert behavior by learning from demonstrations.
We propose Latent Wasserstein Adversarial Imitation Learning (LWAIL), a novel adversarial imitation learning framework.
We show that our method outperforms prior Wasserstein-based IL methods and prior adversarial IL methods.
(2026-03-05T18:01:49Z)
- Robust Offline Imitation Learning Through State-level Trajectory Stitching [37.281554320048755]
Imitation learning (IL) has proven effective for enabling robots to acquire visuomotor skills through expert demonstrations.
Recent advances in offline IL have incorporated suboptimal, unlabeled datasets into the training.
We propose a novel approach to enhance policy learning from mixed-quality offline datasets by leveraging task-relevant trajectory fragments and rich environmental dynamics.
(2025-03-28T15:28:36Z)
- Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker [9.6508237676589]
A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations.
We propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR).
ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations.
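Weighted behavior cloning, as named in the ILMAR summary, can be sketched as maximizing a per-sample weighted log-likelihood of the demonstrated actions. The sketch below assumes a tabular softmax policy, plain gradient descent, and hand-set weights (1.0 for the expert sample, 0.2 for each supplementary sample). ILMAR instead obtains its weights from a meta-learned action ranker, which is not reproduced here. All function names are hypothetical.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise log-softmax, max-shifted for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def weighted_bc_loss_and_grad(logits, states, actions, weights):
    """Weighted negative log-likelihood of demonstrated (state, action) pairs and its gradient."""
    logp = log_softmax(logits)
    w = weights / weights.sum()  # normalize so the loss is a weighted mean
    loss = -(w * logp[states, actions]).sum()
    # Gradient of -log pi(a|s) w.r.t. logits[s] is pi(.|s) - onehot(a).
    grad = np.zeros_like(logits)
    np.add.at(grad, states, w[:, None] * np.exp(logp[states]))
    np.add.at(grad, (states, actions), -w)
    return loss, grad


def fit_weighted_bc(n_states, n_actions, states, actions, weights, lr=1.0, steps=500):
    """Plain gradient descent on tabular policy logits, starting from a uniform policy."""
    logits = np.zeros((n_states, n_actions))
    for _ in range(steps):
        _, grad = weighted_bc_loss_and_grad(logits, states, actions, weights)
        logits -= lr * grad
    return logits


if __name__ == "__main__":
    # One expert sample (weight 1.0) and two supplementary samples (weight 0.2 each), all in state 0.
    states = np.array([0, 0, 0])
    actions = np.array([1, 0, 0])
    weights = np.array([1.0, 0.2, 0.2])
    logits = fit_weighted_bc(2, 2, states, actions, weights)
    print(np.exp(log_softmax(logits)))
```

At each state the weighted cross-entropy is minimized when the policy matches the normalized weight mass of each action. In this example the optimum at state 0 therefore places probability 1.0 / 1.4, about 0.71, on the expert action, even though the supplementary action appears twice as often. State 1 has no data and keeps the uniform initial policy.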
(2024-12-28T16:06:44Z)
- SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch [33.90726769113883]
Mixed Integer Linear Program (MILP) solvers are mostly built upon a Branch-and-Bound (B&B) algorithm, where the efficiency of traditional solvers depends heavily on hand-crafted heuristics for branching.
This paper proposes Suboptimal-Demonstration-Guided Reinforcement Learning (SORREL) for learning to branch.
(2024-12-20T03:48:53Z)
- Adaptive Experimentation When You Can't Experiment [55.86593195947978]
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem.
Online services can employ a properly randomized encouragement that incentivizes users toward a specific treatment.
(2024-06-15T20:54:48Z)
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
Selecting the set of human demonstrations most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
(2024-06-05T08:52:21Z)
- How to Leverage Diverse Demonstrations in Offline Imitation Learning [39.24627312800116]
Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data.
We introduce a simple yet effective data selection method that identifies positive behaviors based on their resultant states.
We then devise a lightweight behavior cloning algorithm capable of leveraging the expert and selected data correctly.
(2024-05-24T04:56:39Z)
- Expert Proximity as Surrogate Rewards for Single Demonstration Imitation Learning [51.972577689963714]
Single-demonstration imitation learning (IL) is a practical approach for real-world applications where acquiring multiple expert demonstrations is costly or infeasible.
In contrast to typical IL settings, single-demonstration IL involves an agent having access to only one expert trajectory.
We highlight the issue of sparse reward signals in this setting and propose to mitigate this issue through our proposed Transition Discriminator-based IL (TDIL) method.
(2024-02-01T23:06:19Z)
- Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to that of the optimal agent.
(2024-01-08T12:39:25Z)
- A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories [122.11358440078581]
Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable.
We propose Trajectory-Aware Learning from Observations (TAILO) to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available.
(2023-11-02T15:41:09Z)
- Self-Imitation Learning from Demonstrations [4.907551775445731]
Self-Imitation Learning from Demonstrations (SILfD) exploits the agent's past good experience to learn from suboptimal demonstrations.
We show that SILfD can learn from demonstrations that are noisy or far from optimal.
We also find SILfD superior to the existing state-of-the-art LfD algorithms in sparse environments.
(2022-03-21T11:56:56Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
(2020-10-16T18:48:49Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning learns effectively in sparse-reward tasks by leveraging existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL), which can achieve (near-)optimal performance given only a limited number of sub-optimal demonstrations.
(2020-04-01T15:57:15Z)
- State-only Imitation with Transition Dynamics Mismatch [16.934888672659824]
Imitation Learning (IL) is a popular paradigm for training agents to achieve complicated goals by leveraging expert behavior.
We present a new state-only IL algorithm in this paper.
We show that our algorithm is particularly effective when there is a transition dynamics mismatch between the expert and imitator MDPs.
(2020-02-27T02:27:46Z)