Information Maximizing Curriculum: A Curriculum-Based Approach for
Imitating Diverse Skills
- URL: http://arxiv.org/abs/2303.15349v2
- Date: Tue, 31 Oct 2023 14:21:19 GMT
- Title: Information Maximizing Curriculum: A Curriculum-Based Approach for
Imitating Diverse Skills
- Authors: Denis Blessing, Onur Celik, Xiaogang Jia, Moritz Reuss, Maximilian
Xiling Li, Rudolf Lioutikov, Gerhard Neumann
- Abstract summary: We propose a curriculum-based approach that assigns a weight to each data point and encourages the model to specialize in the data it can represent.
To cover all modes and thus enable diverse behavior, we extend our approach to a mixture of experts (MoE) policy, where each mixture component selects its own subset of the training data for learning.
A novel, maximum entropy-based objective is proposed to achieve full coverage of the dataset, thereby enabling the policy to encompass all modes within the data distribution.
- Score: 14.685043874797742
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning uses data for training policies to solve complex tasks.
However, when the training data is collected from human demonstrators, it often
leads to multimodal distributions because of the variability in human actions.
Most imitation learning methods rely on a maximum likelihood (ML) objective to
learn a parameterized policy, but this can result in suboptimal or unsafe
behavior due to the mode-averaging property of the ML objective. In this work,
we propose Information Maximizing Curriculum, a curriculum-based approach that
assigns a weight to each data point and encourages the model to specialize in
the data it can represent, effectively mitigating the mode-averaging problem by
allowing the model to ignore data from modes it cannot represent. To cover all
modes and thus enable diverse behavior, we extend our approach to a mixture of
experts (MoE) policy, where each mixture component selects its own subset of
the training data for learning. A novel, maximum entropy-based objective is
proposed to achieve full coverage of the dataset, thereby enabling the policy
to encompass all modes within the data distribution. We demonstrate the
effectiveness of our approach on complex simulated control tasks using diverse
human demonstrations, achieving superior performance compared to
state-of-the-art methods.
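The abstract describes a closed-form idea: each data point receives a curriculum weight, the model specializes on the points it can represent, and a maximum-entropy term keeps the weighting from collapsing. A minimal sketch of that per-sample weighting is below; the function name, the temperature `eta`, and the single-Gaussian toy policy are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def curriculum_weights(log_liks, eta=1.0):
    """Max-entropy curriculum weights: p_i proportional to exp(log_lik_i / eta).

    Larger eta pushes the weights toward uniform (full data coverage);
    smaller eta lets the model specialize on data it already fits well.
    NOTE: illustrative sketch, not the paper's exact update.
    """
    z = log_liks / eta
    z -= z.max()                  # numerical stability before exponentiating
    p = np.exp(z)
    return p / p.sum()

# Toy demo: bimodal demonstrations, where a plain ML fit of a single
# Gaussian would average the two modes into an unseen middle value.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.3, 50), rng.normal(2.0, 0.3, 50)])

mu, sigma = 1.0, 1.0              # initialise closer to the +2 mode
for _ in range(50):
    ll = -0.5 * ((data - mu) / sigma) ** 2 - np.log(sigma)
    w = curriculum_weights(ll, eta=0.5)
    mu = np.sum(w * data)         # weighted ML update for the mean
    sigma = np.sqrt(np.sum(w * (data - mu) ** 2) + 1e-6)
```

With curriculum weights the single component locks onto one mode instead of averaging both; in the paper's MoE extension, each expert runs its own curriculum over a disjoint-in-effect subset of the data so that all modes end up covered.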
Related papers
- IMLE Policy: Fast and Sample Efficient Visuomotor Policy Learning via Implicit Maximum Likelihood Estimation [3.7584322469996896]
IMLE Policy is a novel behaviour cloning approach based on Implicit Maximum Likelihood Estimation (IMLE)
It excels in low-data regimes, effectively learning from minimal demonstrations and requiring 38% less data on average to match the performance of baseline methods in learning complex multi-modal behaviours.
We validate our approach across diverse manipulation tasks in simulated and real-world environments, showcasing its ability to capture complex behaviours under data constraints.
arXiv Detail & Related papers (2025-02-17T23:22:49Z) - Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning [9.38848713730931]
Offline reinforcement learning seeks to learn optimal policies from static datasets without interacting with the environment.
Existing methods often assume unimodal behaviour policies, leading to suboptimal performance when this assumption is violated.
We propose Weighted Imitation Learning on One Mode (LOM), a novel approach that focuses on learning from a single, promising mode of the behaviour policy.
arXiv Detail & Related papers (2024-12-04T11:57:36Z) - Data-CUBE: Data Curriculum for Instruction-based Sentence Representation
Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the orders of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z) - Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Curriculum-Based Imitation of Versatile Skills [15.97723808124603]
Learning skills by imitation is a promising concept for the intuitive teaching of robots.
A common way to learn such skills is to learn a parametric model by maximizing the likelihood given the demonstrations.
Yet, human demonstrations are often multi-modal, i.e., the same task is solved in multiple ways.
arXiv Detail & Related papers (2023-04-11T12:10:41Z) - Evaluating model-based planning and planner amortization for continuous
control [79.49319308600228]
We take a hybrid approach, combining model predictive control (MPC) with a learned model and model-free policy learning.
We find that well-tuned model-free agents are strong baselines even for high DoF control problems.
We show that it is possible to distil a model-based planner into a policy that amortizes the planning without any loss of performance.
arXiv Detail & Related papers (2021-10-07T12:00:40Z) - Maximum Likelihood Estimation for Multimodal Learning with Missing
Modality [10.91899856969822]
We propose an efficient approach based on maximum likelihood estimation to incorporate the knowledge contained in modality-missing data.
Our results demonstrate the effectiveness of the proposed approach, even when 95% of the training data has missing modality.
arXiv Detail & Related papers (2021-08-24T03:50:54Z) - Double Meta-Learning for Data Efficient Policy Optimization in
Non-Stationary Environments [12.45281856559346]
We are interested in learning models of non-stationary environments, which can be framed as a multi-task learning problem.
Model-free reinforcement learning algorithms can achieve good performance in multi-task learning at a cost of extensive sampling.
While model-based approaches are among the most data efficient learning algorithms, they still struggle with complex tasks and model uncertainties.
arXiv Detail & Related papers (2020-11-21T03:19:35Z) - Meta-Reinforcement Learning Robust to Distributional Shift via Model
Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z) - Task-Feature Collaborative Learning with Application to Personalized
Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL)
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.