GAN-MPC: Training Model Predictive Controllers with Parameterized Cost
Functions using Demonstrations from Non-identical Experts
- URL: http://arxiv.org/abs/2305.19111v2
- Date: Wed, 7 Jun 2023 13:14:23 GMT
- Title: GAN-MPC: Training Model Predictive Controllers with Parameterized Cost
Functions using Demonstrations from Non-identical Experts
- Authors: Returaj Burnwal, Anirban Santara, Nirav P. Bhatt, Balaraman Ravindran,
Gaurav Aggarwal
- Abstract summary: We propose an approach that uses a generative adversarial network (GAN) to minimize the Jensen-Shannon divergence between the state-trajectory distributions of the demonstrator and the imitator.
We evaluate our approach on a variety of simulated robotics tasks from the DeepMind Control Suite.
- Score: 14.291720751625585
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model predictive control (MPC) is a popular approach for trajectory
optimization in practical robotics applications. MPC policies can optimize
trajectory parameters under kinodynamic and safety constraints and provide
guarantees on safety, optimality, generalizability, interpretability, and
explainability. However, some behaviors are complex and it is difficult to
hand-craft an MPC objective function. A special class of MPC policies called
Learnable-MPC addresses this difficulty using imitation learning from expert
demonstrations. However, they require the demonstrator and the imitator agents
to be identical, which is hard to satisfy in many real-world robotics
applications. In this paper, we address the practical problem of training
Learnable-MPC policies when the demonstrator and the imitator do not share the
same dynamics and their state spaces may have a partial overlap. We propose a
novel approach that uses a generative adversarial network (GAN) to minimize the
Jensen-Shannon divergence between the state-trajectory distributions of the
demonstrator and the imitator. We evaluate our approach on a variety of
simulated robotics tasks from the DeepMind Control Suite and demonstrate the efficacy
of our approach at learning the demonstrator's behavior without having to copy
their actions.
Related papers
- Guided Reinforcement Learning for Robust Multi-Contact Loco-Manipulation [12.377289165111028]
Reinforcement learning (RL) often necessitates a meticulous Markov Decision Process (MDP) design tailored to each task.
This work proposes a systematic approach to behavior synthesis and control for multi-contact loco-manipulation tasks.
We define a task-independent MDP to train RL policies using only a single demonstration per task generated from a model-based trajectory.
arXiv Detail & Related papers (2024-10-17T17:46:27Z) - Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable
Environments [45.213059639254475]
We propose a new topic called imitator learning (ItorL).
It aims to derive an imitator module that can reconstruct imitation policies from very limited expert demonstrations.
For autonomous imitation-policy building, we design a demonstration-based attention architecture for the imitator policy.
arXiv Detail & Related papers (2023-10-09T13:35:28Z) - Leveraging Sequentiality in Reinforcement Learning from a Single
Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve some challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z) - Active Predicting Coding: Brain-Inspired Reinforcement Learning for
Sparse Reward Robotic Control Problems [79.07468367923619]
We propose a backpropagation-free approach to robotic control through the neuro-cognitive computational framework of neural generative coding (NGC).
We design an agent built completely from powerful predictive coding/processing circuits that facilitate dynamic, online learning from sparse rewards.
We show that our proposed ActPC agent performs well in the face of sparse (extrinsic) reward signals and is competitive with or outperforms several powerful backprop-based RL approaches.
arXiv Detail & Related papers (2022-09-19T16:49:32Z) - Spatiotemporal Costmap Inference for MPC via Deep Inverse Reinforcement
Learning [27.243603228431564]
We propose a new IRL algorithm that learns a goal-conditioned spatiotemporal reward function.
The resulting costmap is used by Model Predictive Controllers (MPCs) to perform a task.
arXiv Detail & Related papers (2022-01-17T17:36:29Z) - Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z) - Demonstration-Efficient Guided Policy Search via Imitation of Robust
Tube MPC [36.3065978427856]
We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL).
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
arXiv Detail & Related papers (2021-09-21T01:50:19Z) - Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z) - Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z) - Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z)