Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations
- URL: http://arxiv.org/abs/2205.02959v2
- Date: Mon, 9 May 2022 02:53:14 GMT
- Title: Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations
- Authors: Sangwon Seo and Vaibhav V. Unhelkar
- Abstract summary: We present an imitation learning algorithm to model behavior of teams performing sequential tasks in Markovian domains.
In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members.
- Score: 3.5179584114197286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Bayesian Team Imitation Learner (BTIL), an imitation learning
algorithm to model behavior of teams performing sequential tasks in Markovian
domains. In contrast to existing multi-agent imitation learning techniques,
BTIL explicitly models and infers the time-varying mental states of team
members, thereby enabling learning of decentralized team policies from
demonstrations of suboptimal teamwork. Further, to allow for sample- and
label-efficient policy learning from small datasets, BTIL employs a Bayesian
perspective and is capable of learning from semi-supervised demonstrations. We
demonstrate and benchmark the performance of BTIL on synthetic multi-agent
tasks as well as a novel dataset of human-agent teamwork. Our experiments show
that BTIL can successfully learn team policies from demonstrations despite the
influence of team members' (time-varying and potentially misaligned) mental
states on their behavior.
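To make the mechanism concrete, below is a minimal, hypothetical sketch of the core idea: treat the mental state as a discrete latent variable, infer it on unlabeled demonstrations with forward-backward message passing, and let labeled demonstrations contribute exact counts. The single-agent simplification, the fixed latent transition matrix, and all names (`forward_backward`, `btil_like_em`, `prior`) are illustrative assumptions, not the authors' implementation.

```python
# A toy, single-agent sketch in the spirit of BTIL (NOT the paper's code):
# pi[x, s, a] = P(a | s, x) is a policy conditioned on a latent mental state x.
import numpy as np

def forward_backward(traj, pi, Tx, px0):
    """Posterior marginals over the latent state x_t for one unlabeled demo."""
    n_x, T = px0.shape[0], len(traj)
    alpha, beta = np.zeros((T, n_x)), np.ones((T, n_x))
    s0, a0 = traj[0]
    alpha[0] = px0 * pi[:, s0, a0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):                        # forward pass
        s, a = traj[t]
        alpha[t] = (alpha[t - 1] @ Tx) * pi[:, s, a]
        alpha[t] /= alpha[t].sum()
    for t in range(T - 2, -1, -1):               # backward pass
        s, a = traj[t + 1]
        beta[t] = Tx @ (pi[:, s, a] * beta[t + 1])
        beta[t] /= beta[t].sum()
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

def btil_like_em(labeled, unlabeled, n_x, n_s, n_a, n_iters=50, prior=1.1):
    """Semi-supervised EM with Dirichlet-MAP policy updates."""
    rng = np.random.default_rng(0)
    pi = rng.dirichlet(np.ones(n_a), size=(n_x, n_s))
    Tx = np.full((n_x, n_x), 1.0 / n_x)          # latent dynamics, fixed here
    px0 = np.full(n_x, 1.0 / n_x)
    for _ in range(n_iters):
        counts = np.full((n_x, n_s, n_a), prior - 1.0)  # Dirichlet pseudo-counts
        for traj, xs in labeled:                 # labeled demos: x_t observed
            for (s, a), x in zip(traj, xs):
                counts[x, s, a] += 1.0
        for traj in unlabeled:                   # unlabeled demos: E-step
            gamma = forward_backward(traj, pi, Tx, px0)
            for t, (s, a) in enumerate(traj):
                counts[:, s, a] += gamma[t]
        pi = counts / counts.sum(axis=2, keepdims=True)  # M-step (MAP)
    return pi
```

In the full team setting one such policy would be maintained per team member; the labeled trajectories anchor the meaning of each latent state, which is what makes the semi-supervised setup label-efficient.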
Related papers
- Hierarchical Imitation Learning of Team Behavior from Heterogeneous Demonstrations [2.07180164747172]
We introduce DTIL: a hierarchical multi-agent imitation learning (MAIL) algorithm designed to learn multimodal team behaviors in complex sequential tasks.
By employing a distribution-matching approach, DTIL mitigates compounding errors and scales effectively to long horizons and continuous state representations.
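For readers unfamiliar with distribution matching in imitation learning, the sketch below shows the standard GAIL-style building block it rests on: a discriminator trained to separate expert from policy state-action pairs, whose output supplies a surrogate reward for policy optimization. This is a generic illustration with assumed names (`disc`, `expert_sa`, `policy_sa`), not DTIL's hierarchical multi-agent formulation.

```python
# Generic distribution-matching (GAIL-style) building block, for illustration.
import torch
import torch.nn as nn

def discriminator_step(disc, opt, expert_sa, policy_sa):
    """One update pushing D(expert) -> 1 and D(policy) -> 0."""
    bce = nn.functional.binary_cross_entropy_with_logits
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + \
           bce(logits_p, torch.zeros_like(logits_p))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def imitation_reward(disc, policy_sa):
    """Surrogate reward -log(1 - D): high where D mistakes policy for expert."""
    with torch.no_grad():
        return -nn.functional.logsigmoid(-disc(policy_sa))

# toy usage with random stand-in data (state dim 4 + action dim 2 = 6)
disc = nn.Sequential(nn.Linear(6, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
discriminator_step(disc, opt, torch.randn(128, 6), torch.randn(128, 6))
rewards = imitation_reward(disc, torch.randn(128, 6))  # feed to any RL optimizer
```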
arXiv Detail & Related papers (2025-02-24T20:05:59Z)
- Dynamic Non-Prehensile Object Transport via Model-Predictive Reinforcement Learning [24.079032278280447]
We propose an approach that combines batch reinforcement learning (RL) with model-predictive control (MPC).
We validate the proposed approach through extensive simulated and real-world experiments on a Franka Panda robot performing the robot waiter task.
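One common way to combine the two is sketched below under assumed interfaces (`dynamics`, `reward`, and `value` are stand-in callables, not the paper's API): sampling-based MPC scores short action sequences and uses the batch-RL value function as the terminal cost-to-go.

```python
# Random-shooting MPC with a learned terminal value; an illustrative sketch.
import numpy as np

def mpc_action(state, dynamics, reward, value, horizon=10, n_samples=256,
               act_dim=2, seed=0):
    rng = np.random.default_rng(seed)
    best_ret, best_a0 = -np.inf, None
    for _ in range(n_samples):
        seq = rng.uniform(-1.0, 1.0, size=(horizon, act_dim))  # candidate plan
        s, ret = state, 0.0
        for a in seq:
            ret += reward(s, a)
            s = dynamics(s, a)
        ret += value(s)          # batch-RL value estimates the return beyond H
        if ret > best_ret:
            best_ret, best_a0 = ret, seq[0]
    return best_a0               # execute the first action, then replan
```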
arXiv Detail & Related papers (2024-11-27T03:33:42Z)
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents.
AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
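A rough sketch of such an active expansion loop is shown below; the acquisition rule used here (collect demonstrations for the lowest-success tasks) and the injected `evaluate`, `collect_demos`, and `train` callables are assumptions for illustration, not necessarily AdaDemo's criterion.

```python
# Hypothetical active demonstration-expansion loop, for illustration only.
def expand_demos(tasks, dataset, policy, evaluate, collect_demos, train,
                 rounds=5, per_round=10, k_worst=3):
    for _ in range(rounds):
        success = {t: evaluate(policy, t) for t in tasks}     # per-task success
        worst = sorted(tasks, key=lambda t: success[t])[:k_worst]
        for t in worst:                                       # spend the demo
            dataset[t].extend(collect_demos(t, n=per_round))  # budget where the
        policy = train(policy, dataset)                       # policy is weakest
    return policy, dataset
```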
arXiv Detail & Related papers (2024-04-11T01:59:29Z)
- Zero-shot Imitation Policy via Search in Demonstration Dataset [0.16817021284806563]
Behavioral cloning uses a dataset of demonstrations to learn a policy.
We propose to use latent spaces of pre-trained foundation models to index a demonstration dataset.
Our approach effectively recovers meaningful demonstrations and exhibits human-like agent behavior in the Minecraft environment.
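The retrieval idea can be sketched in a few lines; `encoder` below stands in for a pretrained foundation-model encoder, and indexing each demonstration by its first observation is a simplifying assumption.

```python
# Nearest-neighbor retrieval of demonstrations in a latent space; a sketch.
import numpy as np

def build_index(demos, encoder):
    """Index each trajectory by the latent code of its first observation."""
    keys = np.stack([encoder(traj[0]) for traj in demos])
    return keys / np.linalg.norm(keys, axis=1, keepdims=True)

def retrieve(obs, keys, demos, encoder):
    q = encoder(obs)
    q = q / np.linalg.norm(q)
    scores = keys @ q                      # cosine similarity in latent space
    return demos[int(np.argmax(scores))]   # replay/copy the closest demo
```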
arXiv Detail & Related papers (2024-01-29T18:38:29Z)
- Mastering Robot Manipulation with Multimodal Prompts through Pretraining and Multi-task Fine-tuning [49.92517970237088]
We tackle the problem of training a robot to understand multimodal prompts.
This type of task poses a major challenge to robots' capability to understand the interconnection and complementarity between vision and language signals.
We introduce an effective framework that learns a policy to perform robot manipulation with multimodal prompts.
arXiv Detail & Related papers (2023-10-14T22:24:58Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates demonstrations at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Unified Demonstration Retriever for In-Context Learning [56.06473069923567]
Unified Demonstration Retriever (UDR) is a single model to retrieve demonstrations for a wide range of tasks.
We propose a multi-task list-wise ranking training framework, with an iterative mining strategy to find high-quality candidates.
Experiments on 30+ tasks across 13 task families and multiple data domains show that UDR significantly outperforms baselines.
arXiv Detail & Related papers (2023-05-07T16:07:11Z)
- Bottom-Up Skill Discovery from Unsegmented Demonstrations for Long-Horizon Robot Manipulation [55.31301153979621]
We tackle real-world long-horizon robot manipulation tasks through skill discovery.
We present a bottom-up approach to learning a library of reusable skills from unsegmented demonstrations.
Our method has shown superior performance over state-of-the-art imitation learning methods in multi-stage manipulation tasks.
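A toy version of the bottom-up recipe appears below; fixed-length windows and k-means stand in for the paper's learned segmentation and skill models, so treat it only as a sketch of the idea.

```python
# Segment unsegmented demos and cluster segments into a skill library; a sketch.
import numpy as np
from sklearn.cluster import KMeans

def discover_skills(trajectories, window=20, n_skills=8, seed=0):
    segments = []
    for traj in trajectories:                    # traj: (T, feat_dim) array
        for t in range(0, len(traj) - window + 1, window):
            segments.append(traj[t:t + window].reshape(-1))
    segments = np.stack(segments)
    km = KMeans(n_clusters=n_skills, random_state=seed, n_init=10).fit(segments)
    # each cluster is one reusable "skill"; fit a sub-policy to each downstream
    return [segments[km.labels_ == k] for k in range(n_skills)]
```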
arXiv Detail & Related papers (2021-09-28T16:18:54Z)
- Demonstration-Guided Reinforcement Learning with Learned Skills [23.376115889936628]
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors.
In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL.
We propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations.
arXiv Detail & Related papers (2021-07-21T17:59:34Z)
- Learning Adaptable Policy via Meta-Adversarial Inverse Reinforcement Learning for Decision-making Tasks [2.1485350418225244]
We build an adaptable imitation learning model based on the integration of Meta-learning and Adversarial Inverse Reinforcement Learning.
We exploit the adversarial learning and inverse reinforcement learning mechanisms to learn policies and reward functions simultaneously from available training tasks.
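The key building block, sketched below, is an AIRL-style discriminator whose logit is a learned reward term f(s, a) minus the policy's log-probability, so training the discriminator recovers a reward function while the policy trains against it. This is a generic single-task illustration with assumed names, not the paper's meta-learning extension.

```python
# AIRL-style discriminator step: logit(D) = f(s,a) - log pi(a|s); a sketch.
import torch
import torch.nn as nn

def airl_disc_step(f_net, opt, expert_sa, expert_logp, policy_sa, policy_logp):
    bce = nn.functional.binary_cross_entropy_with_logits
    logit_e = f_net(expert_sa).squeeze(-1) - expert_logp   # f - log pi on expert
    logit_p = f_net(policy_sa).squeeze(-1) - policy_logp   # ... and on policy
    loss = bce(logit_e, torch.ones_like(logit_e)) + \
           bce(logit_p, torch.zeros_like(logit_p))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

The policy side would then maximize the recovered reward r = f(s, a) - log pi(a|s) with any entropy-regularized RL algorithm.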
arXiv Detail & Related papers (2021-03-23T17:16:38Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called ΨΦ-learning.
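The successor-feature machinery underlying ITD can be shown in tabular form: ψ accumulates expected discounted state features φ, so that Q(s, a) = ψ(s, a) · w for any reward-weight vector w. The shapes and the toy usage below are illustrative assumptions; ITD's contribution, inferring each demonstrator's w from demonstrations, is only hinted at in the comments.

```python
# Tabular successor features: a sketch of the machinery behind ITD.
import numpy as np

def sf_td_update(psi, phi, s, a, s2, a2, alpha=0.1, gamma=0.95):
    """SARSA-style TD step on psi: (n_s, n_a, d), with phi: (n_s, d)."""
    target = phi[s] + gamma * psi[s2, a2]      # discounted feature occupancy
    psi[s, a] += alpha * (target - psi[s, a])
    return psi

# toy usage: once psi is learned, Q-values for any reward weights w are free
n_s, n_a, d = 4, 2, 3
psi = np.zeros((n_s, n_a, d))
phi = np.eye(n_s, d)                           # hypothetical state features
psi = sf_td_update(psi, phi, s=0, a=1, s2=2, a2=0)
w = np.array([1.0, 0.0, -1.0])                 # hypothetical reward weights;
q_values = psi @ w                             # ITD would fit w per demonstrator
```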
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis [11.368438990334397]
We develop a self-supervised learning strategy to acquire independent unimodal supervisions.
We conduct extensive experiments on three public multimodal baseline datasets.
Our method achieves performance comparable to that of human-annotated unimodal labels.
arXiv Detail & Related papers (2021-02-09T14:05:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.