A Simulation-Free Deep Learning Approach to Stochastic Optimal Control
- URL: http://arxiv.org/abs/2410.05163v2
- Date: Tue, 8 Oct 2024 17:24:07 GMT
- Title: A Simulation-Free Deep Learning Approach to Stochastic Optimal Control
- Authors: Mengjian Hua, Matthieu Laurière, Eric Vanden-Eijnden,
- Abstract summary: We propose a simulation-free algorithm for the solution of generic problems in optimal control (SOC)
Unlike existing methods, our approach does not require the solution of an adjoint problem.
- Score: 12.699529713351287
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a simulation-free algorithm for the solution of generic problems in stochastic optimal control (SOC). Unlike existing methods, our approach does not require the solution of an adjoint problem, but rather leverages Girsanov theorem to directly calculate the gradient of the SOC objective on-policy. This allows us to speed up the optimization of control policies parameterized by neural networks since it completely avoids the expensive back-propagation step through stochastic differential equations (SDEs) used in the Neural SDE framework. In particular, it enables us to solve SOC problems in high dimension and on long time horizons. We demonstrate the efficiency of our approach in various domains of applications, including standard stochastic optimal control problems, sampling from unnormalized distributions via construction of a Schr\"odinger-F\"ollmer process, and fine-tuning of pre-trained diffusion models. In all cases our method is shown to outperform the existing methods in both the computing time and memory efficiency.
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR)
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process [11.01014302314467]
We focus on optimizing deep neural networks parameterized differential equations (SDEs)
We propose constraining the SDE to be consistent with its associated perturbation process.
Our framework offers a versatile selection of policy gradient methods to effectively and efficiently train SDEs.
arXiv Detail & Related papers (2024-03-07T02:24:45Z) - Smoothing Methods for Automatic Differentiation Across Conditional
Branches [0.0]
Smooth interpretation (SI) approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner.
We combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs.
We propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling.
arXiv Detail & Related papers (2023-10-05T15:08:37Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Multistage Stochastic Optimization via Kernels [3.7565501074323224]
We develop a non-parametric, data-driven, tractable approach for solving multistage optimization problems.
We show that the proposed method produces decision rules with near-optimal average performance.
arXiv Detail & Related papers (2023-03-11T23:19:32Z) - A deep learning method for solving stochastic optimal control problems driven by fully-coupled FBSDEs [1.0703175070560689]
We first transform the problem into a Stackelberg differential game problem (leader-follower problem)
We compute two examples of the investment-consumption problem solved through utility models.
The results of both examples demonstrate the effectiveness of our proposed algorithm.
arXiv Detail & Related papers (2022-04-12T13:31:19Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Learning Stochastic Optimal Policies via Gradient Descent [17.9807134122734]
We systematically develop a learning-based treatment of optimal control (SOC)
We propose a derivation of adjoint sensitivity results for differential equations through direct application of variational calculus.
We verify the performance of the proposed approach on a continuous-time, finite horizon portfolio optimization with proportional transaction costs.
arXiv Detail & Related papers (2021-06-07T16:43:07Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP)
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal
Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space.
Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
arXiv Detail & Related papers (2020-02-17T20:21:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.