Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative
Adversarial Nets
- URL: http://arxiv.org/abs/2005.10622v2
- Date: Fri, 22 May 2020 01:05:30 GMT
- Title: Triple-GAIL: A Multi-Modal Imitation Learning Framework with Generative
Adversarial Nets
- Authors: Cong Fei, Bin Wang, Yuzheng Zhuang, Zongzhang Zhang, Jianye Hao,
Hongbo Zhang, Xuewu Ji and Wulong Liu
- Abstract summary: Triple-GAIL learns skill selection and imitation jointly from both expert demonstrations and continuously generated experiences, which serve data augmentation purposes.
Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL better fits multi-modal behaviors close to those of the demonstrators.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative adversarial imitation learning (GAIL) has shown promising results
by taking advantage of generative adversarial nets, especially in the field of
robot learning. However, the requirement of isolated single modal
demonstrations limits the scalability of the approach to real world scenarios
such as autonomous vehicles' demand for a proper understanding of human
drivers' behavior. In this paper, we propose a novel multi-modal GAIL
framework, named Triple-GAIL, that is able to learn skill selection and
imitation jointly from both expert demonstrations and continuously generated
experiences, introducing an auxiliary skill selector for data augmentation
purposes. We provide theoretical guarantees on convergence to the optima for
both the generator and the selector. Experiments on real driver trajectories
and real-time strategy game datasets demonstrate that Triple-GAIL better fits
multi-modal behaviors close to those of the demonstrators and outperforms
state-of-the-art methods.
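The abstract describes three interacting components: a skill-conditioned generator (policy), a discriminator that separates expert from generated state-action pairs, and an auxiliary skill selector trained on both expert and generated data. As a rough, hypothetical sketch of how those three losses fit together (linear stand-ins on a 1-D toy batch, not the paper's actual architecture or objective), one adversarial round might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical toy batch: 1-D states/actions, K=2 skills (behavior modes).
K, B = 2, 8
s_exp, a_exp = rng.normal(size=B), rng.normal(size=B)
k_exp = rng.integers(K, size=B)            # skill labels of expert demos
s_gen, a_gen = rng.normal(size=B), rng.normal(size=B)
k_gen = rng.integers(K, size=B)            # skills the selector assigned

# Placeholder linear weights standing in for the networks.
theta_d = rng.normal(size=3)               # discriminator D(s, a, k)
theta_c = rng.normal(size=(2, K))          # skill selector C(k | s, a)

def D(s, a, k):
    # Probability that (s, a) under skill k comes from the expert.
    return sigmoid(theta_d @ np.stack([s, a, k.astype(float)]))

def C(s, a):
    # Selector's distribution over skills given a state-action pair.
    return softmax(np.stack([s, a], axis=-1) @ theta_c)

# Discriminator loss: expert pairs classified as real, generated as fake.
loss_d = -(np.log(D(s_exp, a_exp, k_exp)).mean()
           + np.log(1.0 - D(s_gen, a_gen, k_gen)).mean())

# Selector loss: cross-entropy for recovering the skill, trained on expert
# demos and on generated experiences (the data-augmentation role).
p_exp, p_gen = C(s_exp, a_exp), C(s_gen, a_gen)
loss_c = -(np.log(p_exp[np.arange(B), k_exp]).mean()
           + np.log(p_gen[np.arange(B), k_gen]).mean())

# Generator's adversarial reward: fool D under the selected skill.
reward_g = np.log(D(s_gen, a_gen, k_gen))

print(loss_d, loss_c, reward_g.shape)
```

In an actual training loop the discriminator and selector would descend their losses while the policy ascends the adversarial reward (e.g. via a policy-gradient step); the sketch only shows how skill labels thread through all three objectives.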
Related papers
- Conditional Neural Expert Processes for Learning Movement Primitives from Demonstration [1.9336815376402723]
Conditional Neural Expert Processes (CNEP) learns to assign demonstrations from different modes to distinct expert networks.
CNEP does not require supervision on which mode the trajectories belong to.
Our system is capable of on-the-fly adaptation to environmental changes via an online conditioning mechanism.
arXiv Detail & Related papers (2024-02-13T12:52:02Z)
- Beyond One Model Fits All: Ensemble Deep Learning for Autonomous Vehicles [16.398646583844286]
This study introduces three distinct neural network models corresponding to Mediated Perception, Behavior Reflex, and Direct Perception approaches.
Our architecture fuses information from the base, future latent vector prediction, and auxiliary task networks, using global routing commands to select appropriate action sub-networks.
arXiv Detail & Related papers (2023-12-10T04:40:02Z)
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models [114.69732301904419]
We present an approach to apply end-to-end open-set (any environment/scene) autonomous driving that is capable of providing driving decisions from representations queryable by image and text.
Our approach demonstrates unparalleled results in diverse tests while achieving significantly greater robustness in out-of-distribution situations.
arXiv Detail & Related papers (2023-10-26T17:56:35Z)
- Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z)
- Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level.
Going beyond reinforcement learning is necessary to model a wide range of human playstyles.
This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z)
- A Two-stage Fine-tuning Strategy for Generalizable Manipulation Skill of Embodied AI [15.480968464853769]
We propose a novel two-stage fine-tuning strategy to enhance the generalization capability of our model based on the Maniskill2 benchmark.
Our findings highlight the potential of our method to improve the generalization abilities of Embodied AI models and pave the way for their practical applications in real-world scenarios.
arXiv Detail & Related papers (2023-07-21T04:15:36Z)
- Generalized Multimodal ELBO [11.602089225841631]
Multiple data types naturally co-occur when describing real-world phenomena and learning from them is a long-standing goal in machine learning research.
Existing self-supervised generative models approximating an ELBO are not able to fulfill all desired requirements of multimodal models.
We propose a new, generalized ELBO formulation for multimodal data that overcomes these limitations.
arXiv Detail & Related papers (2021-05-06T07:05:00Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture to fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.