Explorative Imitation Learning: A Path Signature Approach for Continuous Environments
- URL: http://arxiv.org/abs/2407.04856v2
- Date: Mon, 22 Jul 2024 15:32:50 GMT
- Title: Explorative Imitation Learning: A Path Signature Approach for Continuous Environments
- Authors: Nathan Gavenski, Juarez Monteiro, Felipe Meneguzzi, Michael Luck, Odinaldo Rodrigues,
- Abstract summary: Continuous Imitation Learning from Observation (CILO) is a new method augmenting imitation learning with two important features.
CILO exploration allows for more diverse state transitions, requiring less expert trajectories and resulting in fewer training iterations.
It has the best overall performance of all imitation learning methods in all environments, outperforming the expert in two of them.
- Score: 9.416194245966022
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Some imitation learning methods combine behavioural cloning with self-supervision to infer actions from state pairs. However, most rely on a large number of expert trajectories to increase generalisation and human intervention to capture key aspects of the problem, such as domain constraints. In this paper, we propose Continuous Imitation Learning from Observation (CILO), a new method augmenting imitation learning with two important features: (i) exploration, allowing for more diverse state transitions, requiring less expert trajectories and resulting in fewer training iterations; and (ii) path signatures, allowing for automatic encoding of constraints, through the creation of non-parametric representations of agents and expert trajectories. We compared CILO with a baseline and two leading imitation learning methods in five environments. It had the best overall performance of all methods in all environments, outperforming the expert in two of them.
Related papers
- Imitation from Diverse Behaviors: Wasserstein Quality Diversity Imitation Learning with Single-Step Archive Exploration [37.836675202590406]
This work introduces Wasserstein Quality Diversity Imitation Learning (WQDIL)
It improves the stability of imitation learning in the quality diversity setting with latent adversarial training based on a Wasserstein Auto-Encoder (WAE)
It mitigates a behavior-overfitting issue using a measure-conditioned reward function with a single-step archive exploration bonus.
arXiv Detail & Related papers (2024-11-11T13:11:18Z) - Two-stage Learning-to-Defer for Multi-Task Learning [3.4289478404209826]
We introduce a Learning-to-Defer approach for multi-task learning that encompasses both classification and regression tasks.
Our two-stage approach utilizes a rejector that defers decisions to the most accurate agent among a pre-trained joint-regressor models and one or more external experts.
arXiv Detail & Related papers (2024-10-21T07:44:57Z) - Quality Diversity Imitation Learning [9.627530753815968]
We introduce the first generic framework for Quality Diversity Imitation Learning (QD-IL)
Our framework integrates the principles of quality diversity with adversarial imitation learning (AIL) methods, and can potentially improve any inverse reinforcement learning (IRL) method.
Our method even achieves 2x expert performance in the most challenging Humanoid environment.
arXiv Detail & Related papers (2024-10-08T15:49:33Z) - RankCLIP: Ranking-Consistent Language-Image Pretraining [7.92247304974314]
RANKCLIP is a novel pretraining method that extends beyond the rigid one-to-one matching framework of CLIP.
By extending the traditional pair-wise loss to list-wise, RANKCLIP improves the alignment process, enabling it to capture the nuanced many-to-many relationships between and within each modality.
arXiv Detail & Related papers (2024-04-15T00:12:27Z) - RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal and enables the algorithm to learn behaviors that improve over the potential suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z) - NormAUG: Normalization-guided Augmentation for Domain Generalization [60.159546669021346]
We propose a simple yet effective method called NormAUG (Normalization-guided Augmentation) for deep learning.
Our method introduces diverse information at the feature level and improves the generalization of the main path.
In the test stage, we leverage an ensemble strategy to combine the predictions from the auxiliary path of our model, further boosting performance.
arXiv Detail & Related papers (2023-07-25T13:35:45Z) - Object-Aware Regularization for Addressing Causal Confusion in Imitation
Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z) - Few-Shot Fine-Grained Action Recognition via Bidirectional Attention and
Contrastive Meta-Learning [51.03781020616402]
Fine-grained action recognition is attracting increasing attention due to the emerging demand of specific action understanding in real-world applications.
We propose a few-shot fine-grained action recognition problem, aiming to recognize novel fine-grained actions with only few samples given for each class.
Although progress has been made in coarse-grained actions, existing few-shot recognition methods encounter two issues handling fine-grained actions.
arXiv Detail & Related papers (2021-08-15T02:21:01Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates a attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Language-guided Navigation via Cross-Modal Grounding and Alternate
Adversarial Learning [66.9937776799536]
The emerging vision-and-language navigation (VLN) problem aims at learning to navigate an agent to the target location in unseen photo-realistic environments.
The main challenges of VLN arise mainly from two aspects: first, the agent needs to attend to the meaningful paragraphs of the language instruction corresponding to the dynamically-varying visual environments.
We propose a cross-modal grounding module to equip the agent with a better ability to track the correspondence between the textual and visual modalities.
arXiv Detail & Related papers (2020-11-22T09:13:46Z) - Imitating Unknown Policies via Exploration [18.78730427200346]
Behavioral cloning is an imitation learning technique that teaches an agent how to behave through expert demonstrations.
Recent approaches use self-supervision of fully-observable unlabeled snapshots of the states to decode state-pairs into actions.
We address these limitations incorporating a two-phase model into the original framework, which learns from unlabeled observations via exploration.
arXiv Detail & Related papers (2020-08-13T03:03:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.