Explorative Imitation Learning: A Path Signature Approach for Continuous Environments
- URL: http://arxiv.org/abs/2407.04856v2
- Date: Mon, 22 Jul 2024 15:32:50 GMT
- Title: Explorative Imitation Learning: A Path Signature Approach for Continuous Environments
- Authors: Nathan Gavenski, Juarez Monteiro, Felipe Meneguzzi, Michael Luck, Odinaldo Rodrigues
- Abstract summary: Continuous Imitation Learning from Observation (CILO) is a new method augmenting imitation learning with two important features.
CILO's exploration allows for more diverse state transitions, requiring fewer expert trajectories and resulting in fewer training iterations.
It achieves the best overall performance of all imitation learning methods in all environments, outperforming the expert in two of them.
- Score: 9.416194245966022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Some imitation learning methods combine behavioural cloning with self-supervision to infer actions from state pairs. However, most rely on a large number of expert trajectories to increase generalisation and on human intervention to capture key aspects of the problem, such as domain constraints. In this paper, we propose Continuous Imitation Learning from Observation (CILO), a new method augmenting imitation learning with two important features: (i) exploration, allowing for more diverse state transitions, requiring fewer expert trajectories and resulting in fewer training iterations; and (ii) path signatures, allowing for automatic encoding of constraints through the creation of non-parametric representations of agent and expert trajectories. We compared CILO with a baseline and two leading imitation learning methods in five environments. It had the best overall performance of all methods in all environments, outperforming the expert in two of them.
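A brief illustration of the path-signature idea (not code from the paper): the sketch below computes a depth-2 signature of a piecewise-linear trajectory, combining segments via Chen's identity. The function name, the depth-2 truncation, and the toy trajectories are illustrative assumptions.

```python
import numpy as np

def depth2_signature(path: np.ndarray) -> np.ndarray:
    """Depth-2 signature of a piecewise-linear path of shape (T, d).

    Returns the d level-1 terms (total increments) concatenated with the
    d*d level-2 terms (iterated integrals), built segment by segment with
    Chen's identity: a linear segment with increment delta contributes
    (delta, outer(delta, delta) / 2).
    """
    _, d = path.shape
    s1 = np.zeros(d)        # level-1 terms
    s2 = np.zeros((d, d))   # level-2 terms
    for t in range(1, len(path)):
        delta = path[t] - path[t - 1]
        s2 += np.outer(s1, delta) + 0.5 * np.outer(delta, delta)
        s1 += delta
    return np.concatenate([s1, s2.ravel()])

# Signatures give a fixed-size, non-parametric encoding of trajectories
# of different lengths, so agent and expert rollouts can be compared
# directly, e.g. by a distance between their signatures.
expert = np.cumsum(np.random.randn(100, 3), axis=0)  # toy 3-D trajectory
agent = np.cumsum(np.random.randn(80, 3), axis=0)
print(np.linalg.norm(depth2_signature(expert) - depth2_signature(agent)))
```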
Related papers
- COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping [56.907940167333656]
Occluded robot grasping is a setting in which the desired grasp poses are kinematically infeasible due to environmental constraints such as surface collisions.
Traditional robot manipulation approaches struggle with the complexity of non-prehensile or bimanual strategies commonly used by humans.
We introduce Constraint-based Manipulation for Bimanual Occluded Grasping (COMBO-Grasp), a learning-based approach which leverages two coordinated policies.
arXiv Detail & Related papers (2025-02-12T01:31:01Z) - Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration [37.836675202590406]
This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL).
It improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE).
It mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus.
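The summary names the ingredients but not their exact form; a hypothetical sketch of a measure-conditioned reward with a single-step archive exploration bonus could look like the following, where `reward_net`, the grid resolution `bins`, and the bonus weight `beta` are assumptions rather than WQDIL's published design.

```python
import numpy as np

def measure_conditioned_reward(reward_net, state, action, measure,
                               archive_counts, beta=0.1, bins=10):
    """Hypothetical reward: a learned term conditioned on the behavior
    measure, plus a bonus for archive cells that have rarely been visited.
    Assumes `measure` is a numpy array normalized to [0, 1]^k."""
    base = reward_net(state, action, measure)
    # Discretize the behavior measure into an archive cell.
    cell = tuple(np.clip((measure * bins).astype(int), 0, bins - 1))
    # Single-step exploration bonus, decaying with the cell's visit count.
    bonus = beta / np.sqrt(1 + archive_counts.get(cell, 0))
    archive_counts[cell] = archive_counts.get(cell, 0) + 1
    return base + bonus
```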
arXiv Detail & Related papers (2024-11-11T13:11:18Z) - Quality Diversity Imitation Learning [9.627530753815968]
We introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL).
Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method.
Our method even achieves 2x expert performance in the most challenging Humanoid environment.
arXiv Detail & Related papers (2024-10-08T15:49:33Z) - RankCLIP: Ranking-Consistent Language-Image Pretraining [7.92247304974314]
RANKCLIP is a novel pretraining method that extends beyond the rigid one-to-one matching framework of CLIP.
By extending the traditional pair-wise loss to list-wise, RANKCLIP improves the alignment process, enabling it to capture the nuanced many-to-many relationships between and within each modality.
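One way to read "extending the pair-wise loss to list-wise" is as a consistency objective over full similarity rankings; the PyTorch sketch below is an interpretation under that reading, not RANKCLIP's published loss.

```python
import torch
import torch.nn.functional as F

def listwise_consistency(img_emb, txt_emb, tau=0.07):
    """Hypothetical list-wise term: the distribution an image induces over
    all texts (cross-modal) should match the distribution its paired text
    induces over all texts (in-modal), using the full candidate list
    rather than only the matched pair."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    cross = img @ txt.t() / tau   # image -> all texts
    intra = txt @ txt.t() / tau   # paired text -> all texts
    return F.kl_div(F.log_softmax(cross, dim=-1),
                    F.softmax(intra, dim=-1),
                    reduction="batchmean")
```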
arXiv Detail & Related papers (2024-04-15T00:12:27Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar to, but potentially even more practical than, those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potentially suboptimal human expert.
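The summary pins down the reward signal (the intervention events themselves); a minimal gymnasium-style wrapper sketch follows, with `expert_intervenes` standing in for a real intervention interface.

```python
import gymnasium as gym

class InterventionRewardWrapper(gym.Wrapper):
    """Sketch of the core idea described above: discard the task reward
    and emit a penalty whenever the expert intervenes, so off-policy RL
    learns to minimize interventions."""

    def __init__(self, env, expert_intervenes):
        super().__init__(env)
        self.expert_intervenes = expert_intervenes  # hypothetical interface

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        reward = -1.0 if self.expert_intervenes(obs) else 0.0
        return obs, reward, terminated, truncated, info
```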
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - NormAUG: Normalization-guided Augmentation for Domain Generalization [60.159546669021346]
We propose a simple yet effective method called NormAUG (Normalization-guided Augmentation) for deep learning.
Our method introduces diverse information at the feature level and improves the generalization of the main path.
In the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance.
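As a rough sketch of that test-stage ensemble, assuming hypothetical `forward_main` and `forward_aux` hooks for the main and auxiliary paths (not NormAUG's actual interface):

```python
import torch

@torch.no_grad()
def ensemble_predict(model, x):
    """Average the softmax outputs of the main path and each auxiliary
    (normalization-augmented) path."""
    aux_outputs = model.forward_aux(x)  # list of per-path logits
    probs = model.forward_main(x).softmax(dim=-1)
    for logits in aux_outputs:
        probs = probs + logits.softmax(dim=-1)
    return probs / (1 + len(aux_outputs))
```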
arXiv Detail & Related papers (2023-07-25T13:35:45Z) - Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only a few samples given for each class.
Although progress has been made on coarse-grained actions, existing few-shot recognition methods encounter two issues when handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
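A sketch of how these two objectives could fit together, with the prototype-centered term mirroring the conventional query-centered one; this is an interpretation of the summary, not PAL's exact formulation.

```python
import torch
import torch.nn.functional as F

def pal_style_losses(prototypes, queries, query_labels):
    """Query-centered: classify each query against the class prototypes
    (the conventional prototypical-network objective). Prototype-centered:
    contrast each prototype against all queries, pulling in the
    log-probability mass of same-class queries."""
    dists = torch.cdist(queries, prototypes) ** 2        # (Q, C)
    query_centered = F.cross_entropy(-dists, query_labels)

    log_p = F.log_softmax(-dists.t(), dim=-1)            # (C, Q)
    mask = F.one_hot(query_labels, prototypes.size(0)).t().float()
    proto_centered = -(log_p * mask).sum(-1) / mask.sum(-1).clamp(min=1)
    return query_centered + proto_centered.mean()
```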
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Language-guided Navigation via Cross-Modal Grounding and Alternate Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
The main challenges of VLN arise from two aspects: first, the agent needs to attend to the meaningful paragraphs of the language instruction corresponding to the dynamically varying visual environments.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z) - Imitating Unknown Policies via Exploration [18.78730427200346]
Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations.
Recent approaches use self-supervision over fully-observable, unlabeled snapshots of the states to decode state pairs into actions.
We address these limitations by incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration.
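That self-supervision step amounts to learning an inverse dynamics model; a minimal PyTorch sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

class InverseDynamicsModel(nn.Module):
    """Decode a state pair (s_t, s_{t+1}) into the action that caused the
    transition, trained on labels from the agent's own exploratory
    rollouts; architecture and sizes are illustrative."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

# Once trained, the model labels the expert's state-only demonstrations
# with inferred actions, which are then used for behavioral cloning.
```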
arXiv Detail & Related papers (2020-08-13T03:03:35Z) - Augmented Behavioral Cloning from Observation [14.45796459531414]
Imitation from observation is a technique that teaches an agent how to mimic the behavior of an expert by observing only the sequence of states from the expert demonstrations.
We show empirically that our approach outperforms the state-of-the-art approaches in four different environments by a large margin.
arXiv Detail & Related papers (2020-04-28T13:56:36Z)