An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control
- URL: http://arxiv.org/abs/2410.05163v3
- Date: Tue, 13 May 2025 03:30:44 GMT
- Title: An Efficient On-Policy Deep Learning Framework for Stochastic Optimal Control
- Authors: Mengjian Hua, Mathieu Laurière, Eric Vanden-Eijnden
- Abstract summary: We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or adjoint problem solutions. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
- Score: 14.832859803172846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel on-policy algorithm for solving stochastic optimal control (SOC) problems. By leveraging the Girsanov theorem, our method directly computes on-policy gradients of the SOC objective without expensive backpropagation through stochastic differential equations or adjoint problem solutions. This approach significantly accelerates the optimization of neural network control policies while scaling efficiently to high-dimensional problems and long time horizons. We evaluate our method on classical SOC benchmarks as well as applications to sampling from unnormalized distributions via Schrödinger-Föllmer processes and fine-tuning pre-trained diffusion models. Experimental results demonstrate substantial improvements in both computational speed and memory efficiency compared to existing approaches.
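For context, a standard SOC formulation of the kind this method targets is (our generic notation, not necessarily the paper's):

```latex
dX_t = b(X_t, t)\,dt + \sigma(X_t, t)\bigl(u_\theta(X_t, t)\,dt + dW_t\bigr), \quad X_0 \sim \mu_0,
\qquad
J(\theta) = \mathbb{E}\left[\int_0^T \tfrac{1}{2}\,\lVert u_\theta(X_t, t)\rVert^2\,dt + g(X_T)\right].
```

Girsanov's theorem relates the controlled and uncontrolled path measures through an explicit likelihood ratio, which is what allows gradients of J(θ) to be estimated from forward rollouts alone, without differentiating through the SDE solver or solving a backward adjoint equation.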
Related papers
- Preference Optimization for Combinatorial Optimization Problems [54.87466279363487]
Reinforcement Learning (RL) has emerged as a powerful tool for neural optimization, enabling models to learn to solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast action spaces. We propose Preference Optimization, a novel method that transforms quantitative reward signals into qualitative preference signals via statistical comparison modeling.
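The summary does not specify the comparison model; one standard way of turning scalar rewards r into pairwise preference probabilities (the Bradley-Terry model, shown purely for illustration) is:

```latex
P(a \succ b) = \frac{\exp(r(a))}{\exp(r(a)) + \exp(r(b))} = \sigma\bigl(r(a) - r(b)\bigr).
```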
arXiv Detail & Related papers (2025-05-13T16:47:00Z) - Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We study optimization methods that leverage the linear minimization oracle (LMO) over a norm-ball.
We propose a new family of algorithms that uses the LMO to adapt to the geometry of the problem and, perhaps surprisingly, show that they can be applied to unconstrained problems.
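As a concrete example of what such an oracle computes (standard definition, not specific to this paper), the LMO over a norm ball returns the ball element most anti-aligned with a gradient g; for the Euclidean ball of radius r it has a closed form:

```latex
\mathrm{lmo}_{\mathcal{B}}(g) = \operatorname*{arg\,min}_{x \in \mathcal{B}} \langle g, x \rangle,
\qquad
\mathrm{lmo}_{\{\lVert x \rVert_2 \le r\}}(g) = -\,r\,\frac{g}{\lVert g \rVert_2}.
```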
arXiv Detail & Related papers (2025-02-11T13:10:34Z) - Go With the Flow: Fast Diffusion for Gaussian Mixture Models [13.03355083378673]
Schr"odinger Bridges (SB) are diffusion processes that steer, in finite time, a given initial distribution to another final one while minimizing a suitable cost functional.
We propose latentmetrization of a set of SB policies for steering a system from one distribution to another.
We showcase the potential this approach in low-to-dimensional problems such as image-to-image translation in the space of an autoencoder.
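In its standard dynamic form (generic notation, given here only for context), the SB problem seeks the minimum-energy drift transporting one marginal to the other in finite time:

```latex
\min_{u}\; \mathbb{E}\left[\int_0^T \tfrac{1}{2}\,\lVert u_t \rVert^2\,dt\right]
\quad \text{s.t.} \quad
dX_t = u_t\,dt + \sqrt{\varepsilon}\,dW_t, \qquad X_0 \sim \mu_0,\; X_T \sim \mu_1.
```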
arXiv Detail & Related papers (2024-12-12T08:40:22Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU applications are naturally modeled as Multistage Stochastic Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-DDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Fast Two-Time-Scale Stochastic Gradient Method with Applications in Reinforcement Learning [5.325297567945828]
We propose a new method for two-time-scale optimization that achieves significantly faster convergence than prior art. We show how the proposed algorithm specializes to novel online sample-based methods that surpass or match the performance of the existing state of the art.
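The snippet leaves the update rules unspecified; the generic two-time-scale template such methods build on couples a fast and a slow iterate (our sketch, not the paper's exact scheme):

```latex
x_{k+1} = x_k - \alpha_k\,F(x_k, y_k), \qquad
y_{k+1} = y_k - \beta_k\,G(x_k, y_k), \qquad \beta_k / \alpha_k \to 0,
```

so the fast iterate x_k approximately equilibrates for each value of the slow iterate y_k.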
arXiv Detail & Related papers (2024-05-15T19:03:08Z) - Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process [11.01014302314467]
We focus on optimizing deep neural network-parameterized stochastic differential equations (SDEs).
We propose constraining the SDE to be consistent with its associated perturbation process.
Our framework offers a versatile selection of policy gradient methods to effectively and efficiently train SDEs.
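To make the object concrete, below is a minimal Euler-Maruyama sketch of simulating a neural-network-parameterized SDE; `drift` is a hypothetical stand-in network, and this is illustration only, not the paper's training procedure:

```python
import torch

def euler_maruyama(drift, sigma, x0, T=1.0, n_steps=100):
    """Simulate dX = drift(X, t) dt + sigma dW with the Euler-Maruyama scheme."""
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        x = x + drift(x, t) * dt + sigma * (dt ** 0.5) * torch.randn_like(x)
    return x

# Hypothetical drift network: maps (state, time) -> drift vector.
net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
drift = lambda x, t: net(torch.cat([x, torch.full_like(x[..., :1], t)], dim=-1))
x_T = euler_maruyama(drift, sigma=0.1, x0=torch.zeros(16, 2))  # batch of 16 terminal states
```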
arXiv Detail & Related papers (2024-03-07T02:24:45Z) - Smoothing Methods for Automatic Differentiation Across Conditional
Branches [0.0]
Smooth interpretation (SI) approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner.
We combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs.
We propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling.
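A minimal sketch of the underlying idea, Gaussian smoothing of a branching program estimated by Monte Carlo (the textbook construction, not the paper's specific estimator):

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed(f, x, sigma=0.1, n_samples=10_000):
    """Monte Carlo estimate of the Gaussian-smoothed program E_z[f(x + sigma * z)]."""
    zs = rng.standard_normal(n_samples)
    return np.mean([f(x + sigma * z) for z in zs])

# A conditional branch makes f discontinuous at x = 0 ...
f = lambda x: 1.0 if x > 0 else 0.0
# ... but its smoothed version varies smoothly in x and is ~0.5 at the branch point.
print(smoothed(f, 0.0))
```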
arXiv Detail & Related papers (2023-10-05T15:08:37Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes conditional stochastic optimization algorithms for the distributed federated learning setting.
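For reference, the standard nested form of a conditional stochastic optimization objective (generic notation) is:

```latex
\min_{x}\; F(x) = \mathbb{E}_{\xi}\Bigl[ f_{\xi}\bigl( \mathbb{E}_{\eta \mid \xi}\bigl[ g_{\eta}(x, \xi) \bigr] \bigr) \Bigr],
```

where the inner conditional expectation sits inside a nonlinear outer function, which is what makes unbiased gradient estimation nontrivial.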
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - An Optimization-based Deep Equilibrium Model for Hyperspectral Image
Deconvolution with Convergence Guarantees [71.57324258813675]
We propose a novel methodology for addressing the hyperspectral image deconvolution problem.
A new optimization problem is formulated, leveraging a learnable regularizer in the form of a neural network.
The derived iterative solver is then expressed as a fixed-point calculation problem within the Deep Equilibrium framework.
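The fixed-point computation at the heart of a Deep Equilibrium layer can be sketched in a few lines (naive iteration shown; practical DEQ implementations use accelerated root-finders and implicit differentiation):

```python
import numpy as np

def fixed_point(f, z0, tol=1e-9, max_iter=1000):
    """Iterate z <- f(z) until convergence to the equilibrium z* = f(z*)."""
    z = z0
    for _ in range(max_iter):
        z_next = f(z)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Contractive toy map: f(z) = 0.5 z + 1 has the unique fixed point z* = 2.
print(fixed_point(lambda z: 0.5 * z + 1.0, np.zeros(1)))
```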
arXiv Detail & Related papers (2023-06-10T08:25:16Z) - Multistage Stochastic Optimization via Kernels [3.7565501074323224]
We develop a non-parametric, data-driven, tractable approach for solving multistage stochastic optimization problems.
We show that the proposed method produces decision rules with near-optimal average performance.
arXiv Detail & Related papers (2023-03-11T23:19:32Z) - A deep learning method for solving stochastic optimal control problems driven by fully-coupled FBSDEs [1.0703175070560689]
We first transform the problem into a Stackelberg differential game problem (leader-follower problem).
We then solve two examples of the investment-consumption problem using utility models.
The results of both examples demonstrate the effectiveness of our proposed algorithm.
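For context, a fully-coupled FBSDE pairs a forward and a backward equation whose coefficients depend on each other (standard form, not the paper's specific system):

```latex
dX_t = b(t, X_t, Y_t, Z_t)\,dt + \sigma(t, X_t, Y_t, Z_t)\,dW_t, \qquad X_0 = x_0,
\qquad
dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dW_t, \qquad Y_T = g(X_T).
```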
arXiv Detail & Related papers (2022-04-12T13:31:19Z) - Optimization on manifolds: A symplectic approach [127.54402681305629]
We propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems.
Our class of (accelerated) algorithms are not only simple and efficient but also applicable to a broad range of contexts.
arXiv Detail & Related papers (2021-07-23T13:43:34Z) - Learning Stochastic Optimal Policies via Gradient Descent [17.9807134122734]
We systematically develop a learning-based treatment of stochastic optimal control (SOC).
We propose a derivation of adjoint sensitivity results for stochastic differential equations through direct application of variational calculus.
We verify the performance of the proposed approach on a continuous-time, finite-horizon portfolio optimization problem with proportional transaction costs.
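For intuition, the deterministic-ODE version of the adjoint sensitivity identity reads as follows (the stochastic setting treated in the paper generalizes this; generic notation assumed):

```latex
\dot{x} = f(x, \theta), \qquad
\dot{a} = -\Bigl(\frac{\partial f}{\partial x}\Bigr)^{\!\top} a, \quad a(T) = \nabla_{x(T)} L, \qquad
\frac{dL}{d\theta} = \int_0^T a^{\top} \frac{\partial f}{\partial \theta}\,dt.
```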
arXiv Detail & Related papers (2021-06-07T16:43:07Z) - Gaussian Process-based Min-norm Stabilizing Controller for
Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that the resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
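The baseline this builds on is the standard min-norm CLF controller for control-affine dynamics dx/dt = f(x) + g(x)u (generic form; the paper's SOCP additionally accounts for uncertainty in the learned dynamics):

```latex
u^{*}(x) = \operatorname*{arg\,min}_{u}\; \lVert u \rVert^{2}
\quad \text{s.t.} \quad
L_f V(x) + L_g V(x)\,u + \gamma\,V(x) \le 0,
```

where V is a control Lyapunov function and L_fV, L_gV denote its Lie derivatives.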
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Variance-Reduced Off-Policy Memory-Efficient Policy Search [61.23789485979057]
Off-policy policy optimization is a challenging problem in reinforcement learning.
The proposed algorithms are memory-efficient and capable of learning from off-policy samples.
arXiv Detail & Related papers (2020-09-14T16:22:46Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of Automatic Primary Response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z) - IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method [64.15649345392822]
We introduce a framework for designing primal methods under the decentralized optimization setting where local functions are smooth and strongly convex.
Our approach consists of approximately solving a sequence of sub-problems induced by the accelerated augmented Lagrangian method.
When coupled with accelerated gradient descent, our framework yields a novel primal algorithm whose convergence rate is optimal and matched by recently derived lower bounds.
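For reference, the (centralized) augmented Lagrangian from which such sub-problems are built, in generic notation for a linear constraint Ax = b:

```latex
\mathcal{L}_{\rho}(x, \lambda) = f(x) + \langle \lambda,\, Ax - b \rangle + \frac{\rho}{2}\,\lVert Ax - b \rVert^{2}.
```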
arXiv Detail & Related papers (2020-06-11T18:49:06Z) - GACEM: Generalized Autoregressive Cross Entropy Method for Multi-Modal
Black Box Constraint Satisfaction [69.94831587339539]
We present a modified Cross-Entropy Method (CEM) that uses a masked auto-regressive neural network for modeling uniform distributions over the solution space.
Our algorithm is able to express complicated solution spaces, thus allowing it to track a variety of different solution regions.
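For contrast, a plain Gaussian CEM sketch is given below; the paper replaces the Gaussian sampler with a masked autoregressive network, and the names and defaults here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def cem_minimize(cost, dim, n_samples=200, elite_frac=0.1, n_iters=50):
    """Cross-Entropy Method with a diagonal Gaussian sampling distribution."""
    mu, std = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(n_samples * elite_frac))
    for _ in range(n_iters):
        xs = mu + std * rng.standard_normal((n_samples, dim))
        costs = np.array([cost(x) for x in xs])
        elites = xs[np.argsort(costs)[:n_elite]]  # keep the lowest-cost samples
        mu = elites.mean(axis=0)                  # refit the sampler to the elites
        std = elites.std(axis=0) + 1e-6
    return mu

print(cem_minimize(lambda x: np.sum((x - 3.0) ** 2), dim=2))  # converges near [3., 3.]
```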
arXiv Detail & Related papers (2020-02-17T20:21:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.