Variational Inference MPC using Normalizing Flows and
Out-of-Distribution Projection
- URL: http://arxiv.org/abs/2205.04667v1
- Date: Tue, 10 May 2022 04:43:15 GMT
- Title: Variational Inference MPC using Normalizing Flows and
Out-of-Distribution Projection
- Authors: Thomas Power and Dmitry Berenson
- Abstract summary: We propose a Model Predictive Control (MPC) method for collision-free navigation.
We learn a distribution that accounts for both the dynamics of the robot and complex obstacle geometries.
We show that FlowMPPI with projection outperforms state-of-the-art MPC baselines on both in-distribution and OOD environments.
- Score: 7.195824023358536
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a Model Predictive Control (MPC) method for collision-free
navigation that uses amortized variational inference to approximate the
distribution of optimal control sequences by training a normalizing flow
conditioned on the start, goal and environment. This representation allows us
to learn a distribution that accounts for both the dynamics of the robot and
complex obstacle geometries. We can then sample from this distribution to
produce control sequences which are likely to be both goal-directed and
collision-free as part of our proposed FlowMPPI sampling-based MPC method.
However, when deploying this method, the robot may encounter an
out-of-distribution (OOD) environment, i.e. one which is radically different
from those used in training. In such cases, the learned flow cannot be trusted
to produce low-cost control sequences. To generalize our method to OOD
environments we also present an approach that performs projection on the
representation of the environment as part of the MPC process. This projection
changes the environment representation to be more in-distribution while also
optimizing trajectory quality in the true environment. Our simulation results
on a 2D double-integrator and a 3D 12DoF underactuated quadrotor suggest that
FlowMPPI with projection outperforms state-of-the-art MPC baselines on both
in-distribution and OOD environments, including OOD environments generated from
real-world data.
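For intuition, the following is a minimal sketch of the FlowMPPI-style sampling loop described in the abstract, written in NumPy. The conditional normalizing flow is abstracted as a hypothetical `flow_sample` function mapping latent noise and a context vector (start, goal, environment encoding) to control sequences; the cost model, horizon, and temperature are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def flow_sample(z, context):
    """Hypothetical conditional normalizing flow: maps latent noise z and a
    context vector (start, goal, environment encoding) to a control sequence.
    Stand-in only -- a trained flow would replace this affine placeholder."""
    return 0.1 * z + 0.01 * context.sum()

def flowmppi_step(cost_fn, context, K=256, T=20, m=2, temperature=1.0):
    """One MPPI-style update using flow-proposed control sequences.
    cost_fn maps a (T, m) control sequence to a scalar trajectory cost."""
    z = np.random.randn(K, T, m)                          # latent samples
    U = np.stack([flow_sample(zk, context) for zk in z])  # proposed controls
    costs = np.array([cost_fn(u) for u in U])
    w = np.exp(-(costs - costs.min()) / temperature)      # MPPI exponential weights
    w /= w.sum()
    return np.tensordot(w, U, axes=1)                     # weighted control sequence
```

Because the flow is conditioned on the environment, the proposals concentrate on goal-directed, collision-free sequences; the OOD projection step then only has to adjust the environment encoding passed in as `context`.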
Related papers
- Steering Masked Discrete Diffusion Models via Discrete Denoising Posterior Prediction [88.65168366064061]
We introduce Discrete Denoising Posterior Prediction (DDPP), a novel framework that casts the task of steering pre-trained MDMs as a problem of probabilistic inference.
Our framework leads to a family of three novel objectives that are all simulation-free, and thus scalable.
We substantiate our designs via wet-lab validation, where we observe transient expression of reward-optimized protein sequences.
arXiv Detail & Related papers (2024-10-10T17:18:30Z)
- R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models [50.19174067263255]
We introduce prior preference learning techniques and self-revision schedules to help the agent excel in sparse-reward, continuous action, goal-based robotic control POMDP environments.
We show that our agents offer improved performance over state-of-the-art models in terms of cumulative rewards, relative stability, and success rate.
arXiv Detail & Related papers (2024-09-21T18:32:44Z)
- Sampling for Model Predictive Trajectory Planning in Autonomous Driving using Normalizing Flows [1.2972104025246092]
This paper investigates several sampling approaches for trajectory generation; among them, normalizing flows originating from the field of variational inference are considered.
Learning-based normalizing flow models are trained for more efficient exploration of the input domain.
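As an illustration of the flow machinery such methods build on (not this paper's specific architecture), here is a single RealNVP-style affine coupling layer in NumPy, with the learned scale/shift networks reduced to fixed linear maps for brevity:

```python
import numpy as np

def coupling_forward(x, W_s, W_t):
    """One affine coupling layer: the first half of x parameterizes an
    invertible scale-and-shift transform applied to the second half."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    s, t = np.tanh(x1 @ W_s), x1 @ W_t   # stand-ins for learned networks
    y2 = x2 * np.exp(s) + t              # elementwise, trivially invertible
    log_det = s.sum(axis=-1)             # log|det Jacobian|, needed for densities
    return np.concatenate([x1, y2], axis=-1), log_det
```

Stacking such layers yields an expressive, invertible sampler whose density is cheap to evaluate, which is what makes flows attractive for proposal distributions in sampling-based MPC.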
arXiv Detail & Related papers (2024-04-15T10:45:12Z)
- Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU).
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
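One common way to realize such online tuning, sketched below as an assumption rather than the paper's exact architecture, is to feed the tunable parameters to the policy network as an extra input, so that retuning means changing an input vector rather than retraining weights:

```python
import numpy as np

def ampc_controller(x, params, W1, b1, W2, b2):
    """Approximate-MPC policy network that takes tunable cost/controller
    parameters as inputs, so they can be changed online without retraining."""
    h = np.tanh(np.concatenate([x, params]) @ W1 + b1)
    return h @ W2 + b2  # control command

# Retuning on the MCU at runtime = swapping the params vector,
# e.g. a new state-cost weight, with the network weights untouched.
```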
arXiv Detail & Related papers (2024-04-08T20:02:19Z)
- Sample Complexity Characterization for Linear Contextual MDPs [67.79455646673762]
Contextual Markov decision processes (CMDPs) describe a class of reinforcement learning problems in which the transition kernels and reward functions can change over time, with different MDPs indexed by a context variable.
CMDPs serve as an important framework to model many real-world applications with time-varying environments.
We study CMDPs under two linear function approximation models: Model I with context-varying representations and common linear weights for all contexts; and Model II with common representations for all contexts and context-varying linear weights.
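In generic notation (symbols here are assumptions rather than the paper's exact ones), the two models can be sketched as linear parameterizations of the context-dependent value:

```latex
% Model I: context-varying representation, common weights
Q_c(s,a) \;\approx\; \phi_c(s,a)^{\top}\theta
% Model II: common representation, context-varying weights
Q_c(s,a) \;\approx\; \phi(s,a)^{\top}\theta_c
```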
arXiv Detail & Related papers (2024-02-05T03:25:04Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
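For context, here is a minimal consensus-ADMM iteration for distributed least squares, i.e. the optimization skeleton that such a distributed scheme builds on; this is not the paper's sampling algorithm itself:

```python
import numpy as np

def admm_consensus_lstsq(As, bs, rho=1.0, iters=100):
    """Consensus ADMM: each agent i solves a local least-squares subproblem
    min ||A_i x_i - b_i||^2 while all x_i are driven to a common z."""
    n = As[0].shape[1]
    xs = [np.zeros(n) for _ in As]           # local primal variables
    us = [np.zeros(n) for _ in As]           # scaled dual variables
    z = np.zeros(n)                          # consensus variable
    for _ in range(iters):
        for i, (A, b) in enumerate(zip(As, bs)):
            # x-update: ridge-regularized local solve pulled toward z - u
            xs[i] = np.linalg.solve(A.T @ A + rho * np.eye(n),
                                    A.T @ b + rho * (z - us[i]))
        z = np.mean([x + u for x, u in zip(xs, us)], axis=0)  # z-update
        us = [u + x - z for x, u in zip(xs, us)]              # dual ascent
    return z
```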
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- CoVO-MPC: Theoretical Analysis of Sampling-based MPC and Optimal Covariance Design [8.943418808959494]
We characterize the convergence property of a widely used sampling-based Model Predictive Path Integral Control (MPPI) method.
We show that MPPI enjoys at least linear convergence rates when the optimization is quadratic, which covers time-varying LQR systems.
Our theoretical analysis directly leads to a novel sampling-based MPC algorithm, CoVO-MPC.
Empirically, CoVO-MPC significantly outperforms standard MPPI by 43-54% in both simulations and real-world quadrotor agile control tasks.
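The knob CoVO-MPC turns can be sketched as follows: standard MPPI perturbs the nominal control sequence with isotropic Gaussian noise, whereas an optimized, generally non-diagonal covariance shapes where samples land. Names and shapes below are illustrative, not the paper's implementation:

```python
import numpy as np

def mppi_step(cost_fn, u_nom, Sigma, K=512, temperature=1.0):
    """MPPI update with a general sampling covariance Sigma over the
    flattened horizon; CoVO-MPC's contribution is choosing Sigma optimally."""
    n = u_nom.size
    L = np.linalg.cholesky(Sigma)
    eps = np.random.randn(K, n) @ L.T        # correlated perturbations
    U = u_nom.ravel() + eps                  # candidate control sequences
    costs = np.array([cost_fn(u.reshape(u_nom.shape)) for u in U])
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    return (w @ U).reshape(u_nom.shape)      # weighted update
```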
arXiv Detail & Related papers (2024-01-14T21:10:59Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
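Schematically, the fused objective can be sketched in the following generic form (the notation and the exact trade-off term are assumptions, not the paper's precise formulation): a single maximization trades off the planning value of a hypothesis f against its estimation fit, with one coefficient eta balancing the two.

```latex
\max_{f \in \mathcal{F}} \;
\Big\{ \underbrace{V_{\pi_f}}_{\text{planning}}
\;-\; \eta \, \underbrace{\mathcal{L}_{\mathrm{est}}(f;\mathcal{D})}_{\text{estimation}} \Big\}
```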
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- Learning Sampling Distributions for Model Predictive Control [36.82905770866734]
Sampling-based approaches have become a cornerstone of contemporary Model Predictive Control (MPC).
We propose to carry out all operations in the latent space, allowing us to take full advantage of the learned distribution.
Specifically, we frame the learning problem as bi-level optimization and show how to train the controller with backpropagation-through-time.
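A rough sketch of the latent-space idea: noise is drawn around a latent mean, decoded into control sequences, scored, and the latent mean (not the controls) is updated. The decoder here is a hypothetical stand-in for the learned generative model, and the weighting scheme is an illustrative choice:

```python
import numpy as np

def latent_mpc_step(cost_fn, decode, z_mean, K=256, sigma=0.5, temperature=1.0):
    """Score decoded control sequences and update the latent mean by a
    cost-weighted average, keeping all MPC operations in latent space."""
    Z = z_mean + sigma * np.random.randn(K, z_mean.size)  # latent candidates
    costs = np.array([cost_fn(decode(z)) for z in Z])
    w = np.exp(-(costs - costs.min()) / temperature)
    w /= w.sum()
    return w @ Z  # new latent mean; decode(z_mean) yields the control sequence
```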
arXiv Detail & Related papers (2022-12-05T20:35:36Z)
- Demonstration-Efficient Guided Policy Search via Imitation of Robust Tube MPC [36.3065978427856]
We propose a strategy to compress a computationally expensive Model Predictive Controller (MPC) into a more computationally efficient representation based on a deep neural network and Imitation Learning (IL).
By generating a Robust Tube variant (RTMPC) of the MPC and leveraging properties from the tube, we introduce a data augmentation method that enables high demonstration-efficiency.
Our method outperforms strategies commonly employed in IL, such as DAgger and Domain Randomization, in terms of demonstration-efficiency and robustness to perturbations unseen during training.
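The data-augmentation idea can be sketched as follows: states sampled inside the robust tube around a demonstrated trajectory are labeled with the tube's ancillary feedback law, multiplying each demonstration into many (state, action) pairs. The feedback gain, tube radius, and sampling scheme below are illustrative assumptions:

```python
import numpy as np

def tube_augment(x_nom, u_nom, K_fb, radius, samples_per_step=16):
    """For each nominal (x, u) pair from an RTMPC demonstration, sample
    states inside the tube and label them with u = u_nom + K_fb (x - x_nom)."""
    data = []
    for x, u in zip(x_nom, u_nom):
        for _ in range(samples_per_step):
            dx = np.random.uniform(-radius, radius, size=x.shape)  # tube sample
            data.append((x + dx, u + K_fb @ dx))  # ancillary-controller label
    return data
```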
arXiv Detail & Related papers (2021-09-21T01:50:19Z)
- Parallelised Diffeomorphic Sampling-based Motion Planning [30.310891362316863]
We propose Parallelised Diffeomorphic Sampling-based Motion Planning (PDMP).
PDMP transforms the sampling distributions of sampling-based motion planners in a manner akin to normalising flows.
PDMP leverages cost-gradient information to inject specifications into the sampling distribution, in a manner similar to optimisation-based motion planning methods.
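The core idea can be sketched as warping samples from a simple base distribution through repeated small cost-gradient steps, which act as a (locally) invertible map pushing samples toward low-cost regions. The step size, number of layers, and gradient model are illustrative, not PDMP's exact construction:

```python
import numpy as np

def diffeomorphic_warp(samples, grad_cost, eta=0.05, steps=5):
    """Push samples from a base sampling distribution toward low-cost regions
    via repeated small gradient steps, akin to composing simple bijections."""
    x = samples.copy()
    for _ in range(steps):
        x = x - eta * np.apply_along_axis(grad_cost, -1, x)  # one warp layer
    return x

# Usage sketch: base samples ~ N(0, I) warped by the gradient of an
# (assumed differentiable) obstacle-cost field before collision checking.
base = np.random.randn(100, 2)
warped = diffeomorphic_warp(base, grad_cost=lambda p: 2 * p)  # toy quadratic cost
```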
arXiv Detail & Related papers (2021-08-26T13:15:11Z)