Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach
- URL: http://arxiv.org/abs/2502.03725v1
- Date: Thu, 06 Feb 2025 02:34:36 GMT
- Title: Optimal Control of Fluid Restless Multi-armed Bandits: A Machine Learning Approach
- Authors: Dimitris Bertsimas, Cheol Woo Kim, José Niño-Mora
- Abstract summary: We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs). By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
- Score: 5.22980614912553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a machine learning approach to the optimal control of fluid restless multi-armed bandits (FRMABs) with state equations that are either affine or quadratic in the state variables. By deriving fundamental properties of FRMAB problems, we design an efficient machine learning based algorithm. Using this algorithm, we solve multiple instances with varying initial states to generate a comprehensive training set. We then learn a state feedback policy using Optimal Classification Trees with hyperplane splits (OCT-H). We test our approach on machine maintenance, epidemic control and fisheries control problems. Our method yields high-quality state feedback policies and achieves a speed-up of up to 26 million times compared to a direct numerical algorithm for fluid problems.
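To make the pipeline in the abstract concrete — solve many fluid instances from varied initial states, then distill the optimal actions into an interpretable state feedback policy — here is a minimal sketch. It is not the authors' code: `solve_fluid_instance` is a hypothetical placeholder for their direct numerical algorithm, and scikit-learn's axis-aligned `DecisionTreeClassifier` stands in for OCT-H, which uses hyperplane splits.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def solve_fluid_instance(x0, horizon=20):
    """Hypothetical placeholder for the paper's direct numerical solver.
    It activates the arm with the largest state component under toy affine
    dynamics, just so the script runs end to end; the real solver
    integrates the fluid state equations and optimizes the control."""
    xs, us, x = [], [], x0.copy()
    for _ in range(horizon):
        u = int(np.argmax(x))            # placeholder "optimal" action
        xs.append(x.copy()); us.append(u)
        x = 0.9 * x                      # toy affine state equation
        x[u] *= 0.5                      # activating an arm damps its state
    return np.array(xs), np.array(us)

# 1. Solve many instances from varied initial states to build a training set.
rng = np.random.default_rng(0)
states, actions = [], []
for _ in range(1000):                    # instance count is illustrative
    xs, us = solve_fluid_instance(rng.uniform(0.0, 1.0, size=4))
    states.append(xs); actions.append(us)
X, y = np.vstack(states), np.concatenate(actions)

# 2. Learn a state feedback policy from (state, optimal action) pairs.
#    The paper uses OCT-H; an axis-aligned CART tree is a stand-in here.
policy = DecisionTreeClassifier(max_depth=6).fit(X, y)

# 3. Online, the tree maps a state to an action with a handful of
#    comparisons -- the source of the reported speed-up over re-solving.
print(policy.predict([[0.3, 0.7, 0.1, 0.5]]))
```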
Related papers
- Stochastic Optimal Control Matching [53.156277491861985]
Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control.
The control is learned via a least squares problem, by fitting a matching vector field.
Experimentally, our algorithm achieves lower error than all the existing IDO techniques for optimal control.
arXiv Detail & Related papers (2023-12-04T16:49:43Z)
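The mechanism in the SOCM summary above — learn a control by least-squares regression onto a matching vector field — can be illustrated with a deliberately simplified sketch. The real method learns a neural-network control of a stochastic process via an importance-weighted loss; here the target field `w` is synthetic and the control is linear in hand-picked features, purely to show the least-squares structure.

```python
import numpy as np

# Synthetic stand-in for the "matching vector field" w(x) that the control
# should fit (in SOCM this target comes from the method's reparameterized
# loss; here it is just made-up data over sampled states).
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))                  # sampled states
w = -X + 0.1 * rng.normal(size=X.shape)        # noisy target field

# Parameterize the control as u_theta(x) = phi(x) @ theta and solve
#   min_theta  sum_i || phi(x_i) @ theta - w(x_i) ||^2
# in closed form with ordinary least squares.
def phi(X):
    X = np.atleast_2d(X)
    return np.hstack([X, X**2, np.ones((len(X), 1))])

theta, *_ = np.linalg.lstsq(phi(X), w, rcond=None)
u = lambda x: phi(x) @ theta                   # learned feedback control
print(u(np.array([0.5, -0.2])))
```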
- Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders [9.432068833600884]
The DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning.
Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow.
arXiv Detail & Related papers (2022-10-03T21:08:39Z)
- Comparative analysis of machine learning methods for active flow control [60.53767050487434]
Genetic Programming (GP) and Reinforcement Learning (RL) are gaining popularity in flow control.
This work presents a comparative analysis of the two, benchmarking some of their most representative algorithms against global optimization techniques.
arXiv Detail & Related papers (2022-02-23T18:11:19Z)
- Fast Block Linear System Solver Using Q-Learning Scheduling for Unified Dynamic Power System Simulations [2.1509980377118767]
This solver uses a novel Q-learning-based method for task scheduling.
Simulations on several large power systems show that our solver is 2-6 times faster than KLU.
arXiv Detail & Related papers (2021-10-12T09:10:27Z)
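As a loose illustration of Q-learning for task scheduling (our toy construction, unrelated to the paper's actual block-solver scheduler): the state is the set of tasks already dispatched, the action is which task to run next, and the reward is the negative completion time, so the learned greedy policy recovers the classic shortest-task-first ordering.

```python
import numpy as np

rng = np.random.default_rng(4)
cost = np.array([3.0, 1.0, 2.0, 4.0])        # hypothetical task runtimes
N = len(cost)
Q = np.zeros((1 << N, N))                    # Q[dispatched-bitmask, next task]
alpha, eps = 0.1, 0.2

for _ in range(5000):
    mask = 0
    while mask != (1 << N) - 1:
        acts = [a for a in range(N) if not mask & (1 << a)]
        if rng.random() < eps:               # epsilon-greedy exploration
            a = int(rng.choice(acts))
        else:
            a = acts[int(np.argmax(Q[mask, acts]))]
        done = [i for i in range(N) if mask & (1 << i)]
        r = -(cost[done].sum() + cost[a])    # completion time of task a
        nxt = mask | (1 << a)
        rem = [b for b in range(N) if not nxt & (1 << b)]
        target = r + (Q[nxt, rem].max() if rem else 0.0)
        Q[mask, a] += alpha * (target - Q[mask, a])
        mask = nxt
# The greedy rollout now dispatches tasks in increasing runtime order.
```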
- Continuous-Time Fitted Value Iteration for Robust Policies [93.25997466553929]
Solving the Hamilton-Jacobi-Bellman equation is important in many domains, including control, robotics and economics.
We propose continuous fitted value iteration (cFVI) and robust fitted value iteration (rFVI).
These algorithms leverage the non-linear control-affine dynamics and separable state and action reward of many continuous control problems.
arXiv Detail & Related papers (2021-10-05T11:33:37Z)
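The two structural assumptions named above do real work: for control-affine dynamics and a reward that separates state utility from a strictly convex action cost, the inner maximization of the Hamilton-Jacobi-Bellman equation has a closed-form solution, so value iteration never needs a numerical optimizer over actions. In outline (our notation, not the papers' exact statement, with $\tilde{g}$ the convex conjugate of the action cost $g$):

```latex
% Control-affine dynamics and separable reward:
\dot{x} = a(x) + B(x)\,u, \qquad r(x, u) = q(x) - g(u)
% The HJB maximization over u then has a closed form:
u^*(x) = \arg\max_u \Big[ r(x, u) + \nabla_x V(x)^{\top}\big(a(x) + B(x)\,u\big) \Big]
       = \nabla \tilde{g}\big(B(x)^{\top} \nabla_x V(x)\big)
```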
- Machine Learning for Online Algorithm Selection under Censored Feedback [71.6879432974126]
In online algorithm selection (OAS), instances of an algorithmic problem class are presented to an agent one after another, and the agent has to quickly select a presumably best algorithm from a fixed set of candidate algorithms.
For decision problems such as satisfiability (SAT), quality typically refers to the algorithm's runtime.
In this work, we revisit multi-armed bandit algorithms for OAS and discuss their capability of dealing with the problem.
We adapt them towards runtime-oriented losses, allowing for partially censored data while keeping a space- and time-complexity independent of the time horizon.
arXiv Detail & Related papers (2021-09-13T18:10:52Z)
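A minimal sketch of the censoring idea described above (our construction, not the paper's algorithm): a lower-confidence-bound bandit over candidate algorithms where every run is killed at a cutoff tau and its runtime recorded as tau, so the loss is bounded and the memory footprint — one count and one loss sum per arm — is independent of the time horizon.

```python
import numpy as np

def run_algorithm(k, rng):
    """Hypothetical runtime of candidate algorithm k on a random instance."""
    return rng.exponential(scale=(2.0, 5.0, 3.0)[k])

tau, K, rng = 4.0, 3, np.random.default_rng(2)
counts, loss_sums = np.zeros(K), np.zeros(K)
for t in range(1, 2001):
    if counts.min() == 0:                        # play each arm once first
        k = int(t % K)
    else:                                        # optimism toward low runtime
        lcb = loss_sums / counts - tau * np.sqrt(2 * np.log(t) / counts)
        k = int(np.argmin(lcb))
    runtime = min(run_algorithm(k, rng), tau)    # censored observation
    counts[k] += 1
    loss_sums[k] += runtime                      # runtime loss in [0, tau]
print("selection frequencies:", counts / counts.sum())
```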
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
- Assessment of machine learning methods for state-to-state approaches [0.0]
We investigate the possibilities offered by the use of machine learning methods for state-to-state approaches.
Deep neural networks also appear to be a viable technology for these tasks.
arXiv Detail & Related papers (2021-04-02T13:27:23Z)
- Efficient Automatic CASH via Rising Bandits [37.09843193057032]
We propose an alternating optimization framework for the CASH (combined algorithm selection and hyperparameter optimization) problem.
We also introduce Rising Bandits, a CASH-oriented multi-armed bandit (MAB) variant, to model algorithm selection in CASH.
This framework combines the advantages of Bayesian optimization (BO) in solving the HPO problem with those of MABs in accelerating algorithm selection.
arXiv Detail & Related papers (2020-12-08T11:29:57Z)
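To illustrate the alternating structure described above (a toy of our own, not the paper's algorithm): an outer bandit decides which candidate algorithm to refine next, an inner optimizer — random search standing in for Bayesian optimization — proposes new hyperparameters for it, and the "rising" property is that each arm's best-so-far score can only improve as it receives budget.

```python
import numpy as np

def evaluate(algo, hp, rng):
    """Hypothetical validation score of algorithm `algo` at hyperparam hp."""
    peak = (0.80, 0.90, 0.85)[algo]
    return peak - (hp - 0.5) ** 2 + rng.normal(scale=0.01)

K, rng = 3, np.random.default_rng(3)
best, pulls = np.zeros(K), np.zeros(K)           # best-so-far only rises
for t in range(1, 201):
    if pulls.min() == 0:                         # try each algorithm once
        k = int(t % K)
    else:                                        # UCB over rising rewards
        k = int(np.argmax(best + np.sqrt(2 * np.log(t) / pulls)))
    hp = rng.uniform(0.0, 1.0)                   # random search as BO stand-in
    best[k] = max(best[k], evaluate(k, hp, rng))
    pulls[k] += 1
print("selected algorithm:", int(np.argmax(best)), "score:", round(best.max(), 3))
```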
- Combining Deep Learning and Optimization for Security-Constrained Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of automatic primary response (APR) within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.