Learning Stochastic Optimal Policies via Gradient Descent
- URL: http://arxiv.org/abs/2106.03780v1
- Date: Mon, 7 Jun 2021 16:43:07 GMT
- Title: Learning Stochastic Optimal Policies via Gradient Descent
- Authors: Stefano Massaroli, Michael Poli, Stefano Peluchetti, Jinkyoo Park,
Atsushi Yamashita and Hajime Asama
- Abstract summary: We systematically develop a learning-based treatment of stochastic optimal control (SOC).
We propose a derivation of adjoint sensitivity results for stochastic differential equations through direct application of variational calculus.
We verify the performance of the proposed approach on a continuous-time, finite horizon portfolio optimization with proportional transaction costs.
- Score: 17.9807134122734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We systematically develop a learning-based treatment of stochastic optimal
control (SOC), relying on direct optimization of parametric control policies.
We propose a derivation of adjoint sensitivity results for stochastic
differential equations through direct application of variational calculus.
Then, given an objective function for a predetermined task specifying the
desiderata for the controller, we optimize its parameters via iterative
gradient descent methods. In doing so, we extend the range of applicability of
classical SOC techniques, which often require strict assumptions on the functional
form of system and control. We verify the performance of the proposed approach
on a continuous-time, finite horizon portfolio optimization with proportional
transaction costs.
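The approach described in the abstract reduces to a simple recipe: simulate the controlled stochastic differential equation, evaluate the task objective along the trajectory, and update the policy parameters by gradient descent. The following is a minimal sketch of that recipe, not the authors' implementation: it backpropagates through an Euler-Maruyama discretization of a toy linear system with quadratic costs instead of using the continuous-time adjoint sensitivities derived in the paper, and the dynamics, cost, and hyperparameters are illustrative assumptions.
```python
# Minimal sketch (not the authors' code): direct gradient-based optimization of
# a parametric control policy for a controlled SDE. Gradients are obtained by
# backpropagating through an Euler-Maruyama rollout; dynamics, cost, and
# hyperparameters are illustrative assumptions.
import torch

torch.manual_seed(0)

# Parametric state-feedback policy u_theta(t, x): a small neural network.
policy = torch.nn.Sequential(
    torch.nn.Linear(2, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)

T, n_steps, batch = 1.0, 50, 256
dt = T / n_steps
sigma = 0.2  # assumed constant diffusion coefficient

for iteration in range(200):
    x = torch.randn(batch, 1)               # sampled initial states
    cost = torch.zeros(batch, 1)
    for k in range(n_steps):
        t = torch.full((batch, 1), k * dt)
        u = policy(torch.cat([t, x], dim=-1))
        drift = -0.5 * x + u                 # toy controlled linear drift
        noise = sigma * dt ** 0.5 * torch.randn_like(x)
        x = x + drift * dt + noise           # Euler-Maruyama step
        cost = cost + (x ** 2 + 0.1 * u ** 2) * dt   # accumulate running cost
    loss = (cost + x ** 2).mean()            # add terminal cost, average paths
    optimizer.zero_grad()
    loss.backward()                          # differentiate through the rollout
    optimizer.step()

print("estimated expected cost:", loss.item())
```
In principle, the same loop structure carries over to problems such as the paper's portfolio example once the toy dynamics and cost are replaced by problem-specific ones.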
Related papers
- A Simulation-Free Deep Learning Approach to Stochastic Optimal Control [12.699529713351287]
We propose a simulation-free algorithm for the solution of generic problems in stochastic optimal control (SOC).
Unlike existing methods, our approach does not require the solution of an adjoint problem.
arXiv Detail & Related papers (2024-10-07T16:16:53Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty (a minimal OWA sketch is given after this list).
arXiv Detail & Related papers (2024-02-12T16:33:35Z)
- Locally Optimal Descent for Dynamic Stepsize Scheduling [45.6809308002043]
We introduce a novel dynamic learning-rate scheduling scheme, grounded in theory, with the goal of simplifying the manual and time-consuming tuning of schedules in practice.
Our approach is based on estimating a locally-optimal stepsize in the direction of the stochastic gradient.
Our findings indicate that our method needs minimal tuning when compared to existing approaches.
arXiv Detail & Related papers (2023-11-23T09:57:35Z)
- Acceleration in Policy Optimization [50.323182853069184]
We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.
We define optimism as predictive modelling of the future behavior of a policy, and adaptivity as taking immediate and anticipatory corrective actions to mitigate errors from overshooting predictions or delayed responses to change.
We design an optimistic policy gradient algorithm, adaptive via meta-gradient learning, and empirically highlight several design choices pertaining to acceleration, in an illustrative task.
arXiv Detail & Related papers (2023-06-18T15:50:57Z)
- Improper Learning with Gradient-based Policy Optimization [62.50997487685586]
We consider an improper reinforcement learning setting where the learner is given M base controllers for an unknown Markov Decision Process.
We propose a gradient-based approach that operates over a class of improper mixtures of the controllers.
arXiv Detail & Related papers (2021-02-16T14:53:55Z)
- Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z)
- Iterative Amortized Policy Optimization [147.63129234446197]
Policy networks are a central feature of deep reinforcement learning (RL) algorithms for continuous control.
From the variational inference perspective, policy networks are a form of amortized optimization, optimizing network parameters rather than the policy distributions directly.
We demonstrate that iterative amortized policy optimization yields performance improvements over direct amortization on benchmark continuous control tasks.
arXiv Detail & Related papers (2020-10-20T23:25:42Z)
- Adaptive Stochastic Optimization [1.7945141391585486]
Adaptive optimization methods have the potential to offer significant computational savings when training large-scale systems.
Modern approaches based on the gradient method are non-adaptive in the sense that their implementation employs prescribed parameter values that need to be tuned for each application.
arXiv Detail & Related papers (2020-01-18T16:30:19Z)
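As referenced in the End-to-End Learning for Fair Multiobjective Optimization entry above, Ordered Weighted Averaging aggregates a vector of outcomes by sorting it and taking a weighted sum. Below is a minimal sketch, assuming the fairness-oriented convention of nonincreasing weights applied to outcomes sorted in ascending order (conventions on sort order vary); the values and weights are illustrative.
```python
# Minimal OWA sketch (illustrative values; fairness-oriented convention assumed).
import numpy as np

def owa(outcomes, weights):
    """Ordered Weighted Average: sort outcomes in ascending order and take the
    dot product with nonincreasing weights, so the worst outcomes receive the
    largest weights."""
    return float(np.dot(np.sort(outcomes), weights))

y = np.array([3.0, 1.0, 2.0])   # outcomes for three parties
w = np.array([0.5, 0.3, 0.2])   # nonincreasing weights summing to 1
print(owa(y, w))                # 0.5*1 + 0.3*2 + 0.2*3 = 1.7
```
The sort is what makes the objective piecewise linear and nondifferentiable wherever outcomes tie, which is the difficulty that entry refers to.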
This list is automatically generated from the titles and abstracts of the papers in this site.