Optimal Control of Agent-Based Dynamics under Deep Galerkin Feedback Laws
- URL: http://arxiv.org/abs/2406.09141v1
- Date: Thu, 13 Jun 2024 14:10:57 GMT
- Title: Optimal Control of Agent-Based Dynamics under Deep Galerkin Feedback Laws
- Authors: Frederik Kelbel,
- Abstract summary: This paper specifically investigates the sampling issues the Deep Galerkin Method is subjected to.
It proposes a drift relaxation-based sampling approach to alleviate the symptoms of high-variance policy approximations.
The resulting policies induce a significant cost reduction over manually optimised control functions and show improvements on the Linear-Quadratic Regulator problem.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ever since the concepts of dynamic programming were introduced, one of the most difficult challenges has been to adequately address high-dimensional control problems. With growing dimensionality, the utilisation of Deep Neural Networks promises to circumvent the issue of an otherwise exponentially increasing complexity. The paper specifically investigates the sampling issues the Deep Galerkin Method is subjected to. It proposes a drift relaxation-based sampling approach to alleviate the symptoms of high-variance policy approximations. This is validated on mean-field control problems; namely, the variations of the opinion dynamics presented by the Sznajd and the Hegselmann-Krause model. The resulting policies induce a significant cost reduction over manually optimised control functions and show improvements on the Linear-Quadratic Regulator problem over the Deep FBSDE approach.
Related papers
- Data-driven rules for multidimensional reflection problems [1.0742675209112622]
We study a multivariate singular control problem for reversible diffusions with controls of reflection type.
For given diffusion dynamics, assuming the optimal domain to be strongly star-shaped, we propose a gradient descent algorithm based on polytope approximations to numerically determine a cost-minimizing domain.
Finally, we investigate data-driven solutions when the diffusion dynamics are unknown to the controller.
arXiv Detail & Related papers (2023-11-11T18:36:17Z) - An Unsupervised Deep Learning Approach for the Wave Equation Inverse
Problem [12.676629870617337]
Full-waveform inversion (FWI) is a powerful geophysical imaging technique that infers high-resolution subsurface physical parameters.
Due to limitations in observation, limited shots or receivers, and random noise, conventional inversion methods are confronted with numerous challenges.
We provide an unsupervised learning approach aimed at accurately reconstructing physical velocity parameters.
arXiv Detail & Related papers (2023-11-08T08:39:33Z) - Optimizing Solution-Samplers for Combinatorial Problems: The Landscape
of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods.
Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem.
As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z) - Reimagining Demand-Side Management with Mean Field Learning [0.0]
We propose a new method for DSM, in particular the problem of controlling a large population of electrical devices to follow a desired consumption signal.
We develop a new algorithm, MD-MFC, which provides theoretical guarantees for convex and Lipschitz objective functions.
arXiv Detail & Related papers (2023-02-16T10:15:08Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Learning Robust Policy against Disturbance in Transition Dynamics via
State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z) - Adversarially Regularized Policy Learning Guided by Trajectory
Optimization [31.122262331980153]
We propose adVErsarially Regularized pOlicy learNIng guided by trajeCtory optimizAtion (VERONICA) for learning smooth control policies.
Our proposed approach improves the sample efficiency of neural policy learning and enhances the robustness of the policy against various types of disturbances.
arXiv Detail & Related papers (2021-09-16T00:02:11Z) - Regret-optimal Estimation and Control [52.28457815067461]
We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form.
We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended KalmanFilter (EKF) for systems with nonlinear dynamics.
arXiv Detail & Related papers (2021-06-22T23:14:21Z) - Deep Policy Dynamic Programming for Vehicle Routing Problems [89.96386273895985]
We propose Deep Policy Dynamic Programming (D PDP) to combine the strengths of learned neurals with those of dynamic programming algorithms.
D PDP prioritizes and restricts the DP state space using a policy derived from a deep neural network, which is trained to predict edges from example solutions.
We evaluate our framework on the travelling salesman problem (TSP) and the vehicle routing problem (VRP) and show that the neural policy improves the performance of (restricted) DP algorithms.
arXiv Detail & Related papers (2021-02-23T15:33:57Z) - Escaping from Zero Gradient: Revisiting Action-Constrained Reinforcement
Learning via Frank-Wolfe Policy Optimization [5.072893872296332]
Action-constrained reinforcement learning (RL) is a widely-used approach in various real-world applications.
We propose a learning algorithm that decouples the action constraints from the policy parameter update.
We show that the proposed algorithm significantly outperforms the benchmark methods on a variety of control tasks.
arXiv Detail & Related papers (2021-02-22T14:28:03Z) - Adaptive Control and Regret Minimization in Linear Quadratic Gaussian
(LQG) Setting [91.43582419264763]
We propose LqgOpt, a novel reinforcement learning algorithm based on the principle of optimism in the face of uncertainty.
LqgOpt efficiently explores the system dynamics, estimates the model parameters up to their confidence interval, and deploys the controller of the most optimistic model.
arXiv Detail & Related papers (2020-03-12T19:56:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.