Adaptive t-Momentum-based Optimization for Unknown Ratio of Outliers in
Amateur Data in Imitation Learning
- URL: http://arxiv.org/abs/2108.00625v1
- Date: Mon, 2 Aug 2021 04:30:41 GMT
- Title: Adaptive t-Momentum-based Optimization for Unknown Ratio of Outliers in
Amateur Data in Imitation Learning
- Authors: Wendyam Eric Lionel Ilboudo, Taisuke Kobayashi, Kenji Sugimoto
- Abstract summary: Behavioral (BC) bears a high potential for safe and direct transfer of human skills to robots.
In order to allow the imitators to effectively learn from imperfect demonstrations, we propose to employ the robust t-momentum optimization algorithm.
We show empirically how the algorithm can be used to produce robust BC imitators against datasets with unknown heaviness.
- Score: 3.145455301228175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Behavioral cloning (BC) bears a high potential for safe and direct transfer
of human skills to robots. However, demonstrations performed by human operators
often contain noise or imperfect behaviors that can affect the efficiency of
the imitator if left unchecked. In order to allow the imitators to effectively
learn from imperfect demonstrations, we propose to employ the robust t-momentum
optimization algorithm. This algorithm builds on the Student's t-distribution
in order to deal with heavy-tailed data and reduce the effect of outlying
observations. We extend the t-momentum algorithm to allow for an adaptive and
automatic robustness and show empirically how the algorithm can be used to
produce robust BC imitators against datasets with unknown heaviness. Indeed,
the imitators trained with the t-momentum-based Adam optimizers displayed
robustness to imperfect demonstrations on two different manipulation tasks with
different robots and revealed the capability to take advantage of the
additional data while reducing the adverse effect of non-optimal behaviors.
Related papers
- Uncertainty-aware Human Mobility Modeling and Anomaly Detection [28.311683535974634]
We study how to model human agents' mobility behavior toward effective anomaly detection.
We use GPS data as a sequence stay-point events, each with a set of characterizingtemporal features.
Experiments on large expert-simulated datasets with tens of thousands of agents demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2024-10-02T06:57:08Z) - Equivariant Reinforcement Learning under Partial Observability [18.87759041528553]
This paper identifies partially observable domains where symmetries can be a useful inductive bias for efficient learning.
Our actor-critic reinforcement learning agents can reuse solutions in the past for related scenarios.
arXiv Detail & Related papers (2024-08-26T15:07:01Z) - Erasing Undesirable Influence in Diffusion Models [51.225365010401006]
Diffusion models are highly effective at generating high-quality images but pose risks, such as the unintentional generation of NSFW (not safe for work) content.
In this work, we introduce EraseDiff, an algorithm designed to preserve the utility of the diffusion model on retained data while removing the unwanted information associated with the data to be forgotten.
arXiv Detail & Related papers (2024-01-11T09:30:36Z) - Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF)
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
arXiv Detail & Related papers (2023-06-09T18:40:55Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - Towards Efficient Data-Centric Robust Machine Learning with Noise-based
Augmentation [10.859556815535706]
The data-centric machine learning aims to find effective ways to build appropriate datasets which can improve the performance of AI models.
We introduce a noised-based data augmentation method which is composed of Gaussian Noise, Salt-and-Pepper noise, and the PGD adversarial perturbations.
The proposed method is built on lightweight algorithms and proved highly effective based on comprehensive evaluations.
arXiv Detail & Related papers (2022-03-08T02:05:40Z) - DEALIO: Data-Efficient Adversarial Learning for Imitation from
Observation [57.358212277226315]
In imitation learning from observation IfO, a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm
arXiv Detail & Related papers (2021-03-31T23:46:32Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.