Model-Predictive Control via Cross-Entropy and Gradient-Based
Optimization
- URL: http://arxiv.org/abs/2004.08763v1
- Date: Sun, 19 Apr 2020 03:54:50 GMT
- Title: Model-Predictive Control via Cross-Entropy and Gradient-Based
Optimization
- Authors: Homanga Bharadhwaj, Kevin Xie, Florian Shkurti
- Abstract summary: Cross-Entropy Method (CEM) is a population-based optimization method for planning a sequence of actions.
We propose a method to solve this problem by interleaving CEM and gradient descent steps in optimizing the action sequence.
Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces.
- Score: 26.497575737219794
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works in high-dimensional model-predictive control and model-based
reinforcement learning with learned dynamics and reward models have resorted to
population-based optimization methods, such as the Cross-Entropy Method (CEM),
for planning a sequence of actions. To decide on an action to take, CEM
conducts a search for the action sequence with the highest return according to
the dynamics model and reward. Action sequences are typically randomly sampled
from an unconditional Gaussian distribution and evaluated on the environment.
This distribution is iteratively updated towards action sequences with higher
returns. However, this planning method can be very inefficient, especially for
high-dimensional action spaces. An alternative line of approaches optimizes
action sequences directly via gradient descent, but is prone to local optima.
We propose a method to solve this planning problem by interleaving CEM and
gradient descent steps in optimizing the action sequence. Our experiments show
faster convergence of the proposed hybrid approach, even for high-dimensional
action spaces, avoidance of local minima, and better or equal performance to
CEM. Code accompanying the paper is available at
https://github.com/homangab/gradcem.
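To make the interleaving concrete, below is a minimal sketch of such a hybrid planner in PyTorch. The point-mass dynamics, reward, and hyperparameters are illustrative assumptions, not the authors' implementation; see the repository above for the real code.

```python
# Sketch: interleaving CEM with gradient steps on sampled action sequences.
# `dynamics`, `reward`, and all hyperparameters are toy assumptions.
import torch

HORIZON, ACT_DIM, POP, ELITES, ITERS, LR = 12, 2, 64, 8, 10, 0.1

def dynamics(state, action):           # toy differentiable point-mass model
    return state + 0.1 * action

def reward(state):                     # negative distance to the origin
    return -state.pow(2).sum(-1)

def returns(state0, actions):          # actions: (POP, HORIZON, ACT_DIM)
    state, total = state0.expand(actions.shape[0], -1), 0.0
    for t in range(HORIZON):
        state = dynamics(state, actions[:, t])
        total = total + reward(state)
    return total

def plan(state0):
    mean = torch.zeros(HORIZON, ACT_DIM)
    std = torch.ones(HORIZON, ACT_DIM)
    for _ in range(ITERS):
        # CEM step: sample a population of action sequences from the Gaussian.
        actions = (mean + std * torch.randn(POP, HORIZON, ACT_DIM)).requires_grad_(True)
        # Gradient step: nudge every sampled sequence uphill on the return
        # before refitting the distribution (the interleaving idea).
        returns(state0, actions).sum().backward()
        with torch.no_grad():
            improved = actions + LR * actions.grad
            elite_idx = returns(state0, improved).topk(ELITES).indices
            elites = improved[elite_idx]
            mean, std = elites.mean(0), elites.std(0) + 1e-6
    return mean[0]                     # execute the first action (MPC style)

print(plan(torch.tensor([1.0, -1.0])))
```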
Related papers
- Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation [60.41803046775034]
We show how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users.
Experiments on both numerical test problems and target-guided 3D-molecule generation tasks show the superior performance of our method in achieving better target scores.
arXiv Detail & Related papers (2024-06-02T17:26:27Z)
- Improving sample efficiency of high dimensional Bayesian optimization with MCMC [7.241485121318798]
We propose a new method based on Markov Chain Monte Carlo to efficiently sample from an approximated posterior.
We show experimentally that both the Metropolis-Hastings and the Langevin Dynamics versions of our algorithm outperform state-of-the-art methods in high-dimensional sequential optimization and reinforcement learning benchmarks.
arXiv Detail & Related papers (2024-01-05T05:56:42Z)
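For the MCMC entry above, a generic random-walk Metropolis-Hastings sketch is shown below; it only illustrates sampling from an unnormalized (approximate) posterior log-density and is not the paper's algorithm.

```python
# Generic random-walk Metropolis-Hastings; `log_prob` stands in for an
# approximated posterior log-density (an assumption, not the paper's method).
import numpy as np

def metropolis_hastings(log_prob, x0, steps=1000, scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_prob(x)
    samples = []
    for _ in range(steps):
        proposal = x + scale * rng.standard_normal(x.shape)
        lp_new = log_prob(proposal)
        if np.log(rng.random()) < lp_new - lp:  # accept with prob min(1, ratio)
            x, lp = proposal, lp_new
        samples.append(x.copy())
    return np.array(samples)

# Example: draws from a 10-dimensional standard Gaussian "posterior".
draws = metropolis_hastings(lambda z: -0.5 * np.sum(z ** 2), x0=np.zeros(10))
print(draws.mean(axis=0))
```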
- Stochastic Gradient Descent for Gaussian Processes Done Right [86.83678041846971]
We show that when done right -- by which we mean using specific insights from the optimisation and kernel communities -- gradient descent is highly effective.
We introduce a stochastic dual descent algorithm, explain its design in an intuitive manner, and illustrate the design choices.
Our method places Gaussian process regression on par with state-of-the-art graph neural networks for molecular binding affinity prediction.
arXiv Detail & Related papers (2023-10-31T16:15:13Z)
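The dual-descent idea above can be sketched as stochastic gradient steps on the GP regression dual objective L(a) = 0.5 a^T (K + s^2 I) a - a^T y, whose minimizer yields the usual posterior mean K a*. The kernel, batch size, and step size below are assumptions, not the paper's tuned recipe.

```python
# Stochastic (block-coordinate) gradient descent on the GP regression dual.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

K, noise = rbf(X, X), 0.1**2
alpha = np.zeros(len(X))
for step in range(2000):
    idx = rng.choice(len(X), size=32, replace=False)  # random rows = stochasticity
    grad_rows = K[idx] @ alpha + noise * alpha[idx] - y[idx]
    alpha[idx] -= 0.01 * grad_rows                    # update sampled coordinates
pred = rbf(X[:5], X) @ alpha                          # posterior mean at 5 points
print(pred, y[:5])
```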
- Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms [88.74308282658133]
Reparameterization (RP) Policy Gradient Methods (PGMs) have been widely adopted for continuous control tasks in robotics and computer graphics.
Recent studies have revealed that, when applied to long-term reinforcement learning problems, model-based RP PGMs may experience chaotic and non-smooth optimization landscapes.
We propose a spectral normalization method to mitigate the exploding variance issue caused by long model unrolls.
arXiv Detail & Related papers (2023-10-30T18:43:21Z)
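The spectral normalization fix named above can be sketched with PyTorch's built-in parametrization: constraining each layer of a learned dynamics model to (approximately) unit spectral norm keeps gradients bounded over long unrolls. The two-layer model below is an assumed stand-in, not the paper's architecture.

```python
# Spectral-norm-constrained dynamics model unrolled for many steps.
import torch
from torch import nn
from torch.nn.utils.parametrizations import spectral_norm

dynamics = nn.Sequential(
    spectral_norm(nn.Linear(4, 64)), nn.Tanh(),
    spectral_norm(nn.Linear(64, 4)),   # each map is (approx.) 1-Lipschitz
)

state = torch.randn(1, 4, requires_grad=True)
s = state
for _ in range(100):                   # a long model unroll
    s = dynamics(s)
s.sum().backward()
print(state.grad.norm())               # stays bounded thanks to the constraint
```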
- Planning with Sequence Models through Iterative Energy Minimization [22.594413287842574]
We suggest an approach towards integrating planning with sequence models based on the idea of iterative energy minimization.
We train a masked language model to capture an implicit energy function over trajectories of actions, and formulate planning as finding a trajectory of actions with minimum energy.
We illustrate how this procedure enables improved performance over recent approaches across BabyAI and Atari environments.
arXiv Detail & Related papers (2023-03-28T17:53:22Z)
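A schematic sketch of the energy-minimization planning loop described above: repeatedly re-propose one "masked" action and keep the change when the trajectory energy drops. The quadratic energy stands in for the paper's learned masked-language-model energy; all names here are assumptions.

```python
# Iterative energy minimization over an action trajectory (schematic only).
import torch

HORIZON, ACT_DIM = 10, 2
goal = torch.tensor([2.0, 1.0])

def energy(actions):                     # low energy <=> trajectory reaches goal
    return (actions.sum(0) - goal).pow(2).sum() + 0.1 * actions.pow(2).sum()

plan = torch.randn(HORIZON, ACT_DIM)
for step in range(500):
    t = torch.randint(HORIZON, ())       # "mask" one position, as in a masked LM
    candidate = plan.clone()
    candidate[t] = torch.randn(ACT_DIM)  # re-propose the masked action
    if energy(candidate) < energy(plan): # greedy accept on lower energy
        plan = candidate
print(energy(plan))
```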
- A Particle-based Sparse Gaussian Process Optimizer [5.672919245950197]
We present a new swarm-based framework utilizing the underlying dynamical process of gradient descent.
The biggest advantage of this approach is greater exploration around the current state before deciding on a descent direction.
arXiv Detail & Related papers (2022-11-26T09:06:15Z)
- Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes [39.411957858548355]
We show how to achieve smoother model predictive control using online sequential inference.
We evaluate this approach on several robot control tasks, matching the performance of prior sampling-based methods while also ensuring smoothness.
arXiv Detail & Related papers (2022-10-07T12:56:31Z)
- CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning [41.233656743112185]
We propose a novel planner that combines first-order methods with the Cross-Entropy Method (CEM).
We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples.
arXiv Detail & Related papers (2021-12-14T21:11:27Z)
- Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box Optimization Framework [100.36569795440889]
This work is on zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z)
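The ZO idea above can be sketched with a two-point finite-difference gradient estimator whose coordinates are queried according to an (assumed) importance weighting; this illustrates ZO estimation only, not the paper's hybrid framework.

```python
# Two-point zeroth-order gradient estimate with coordinate importance sampling.
import numpy as np

def zo_gradient(f, x, weights, n_queries=16, eps=1e-4, rng=np.random.default_rng(0)):
    p = weights / weights.sum()              # coordinate sampling distribution
    grad = np.zeros_like(x)
    for i in rng.choice(len(x), size=n_queries, p=p):
        e = np.zeros_like(x)
        e[i] = 1.0
        # finite-difference estimate of the i-th partial derivative
        grad[i] = (f(x + eps * e) - f(x - eps * e)) / (2 * eps)
    return grad

f = lambda z: np.sum((z - 1.0) ** 2)         # illustrative black-box objective
x = np.zeros(8)
for _ in range(200):
    x -= 0.1 * zo_gradient(f, x, weights=np.ones(8))
print(x)                                     # approaches the optimum at all-ones
```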
- Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering [53.523517926927894]
We explore the use of exact per-sample Hessian-vector products and gradients to construct self-tuning quadratics.
We prove that our model-based procedure converges in the noisy gradient setting.
This is an interesting step toward constructing self-tuning quadratics.
arXiv Detail & Related papers (2020-11-09T22:07:30Z)
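The exact Hessian-vector products mentioned above can be computed by double backpropagation without ever materializing the Hessian; a minimal sketch with an assumed toy loss:

```python
# Exact Hessian-vector product H v via double backprop (toy loss assumed).
import torch

w = torch.randn(5, requires_grad=True)
loss = (w ** 2).sum() + w.prod()           # stand-in for a real objective
grad = torch.autograd.grad(loss, w, create_graph=True)[0]
v = torch.randn(5)
hvp = torch.autograd.grad(grad @ v, w)[0]  # H v without forming H explicitly
print(hvp)
```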
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.