Learning to Cover: Online Learning and Optimization with Irreversible Decisions
- URL: http://arxiv.org/abs/2406.14777v1
- Date: Thu, 20 Jun 2024 23:00:25 GMT
- Title: Learning to Cover: Online Learning and Optimization with Irreversible Decisions
- Authors: Alexandre Jacquillat, Michael Lingzhi Li,
- Abstract summary: We find that regret grows sub-linearly at a rate $Thetaleft(mfrac12cdotfrac11-2-Tright)$, thus converging exponentially fast to $Theta(sqrtm)$.
These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
- Score: 50.5775508521174
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We define an online learning and optimization problem with irreversible decisions contributing toward a coverage target. At each period, a decision-maker selects facilities to open, receives information on the success of each one, and updates a machine learning model to guide future decisions. The goal is to minimize costs across a finite horizon under a chance constraint reflecting the coverage target. We derive an optimal algorithm and a tight lower bound in an asymptotic regime characterized by a large target number of facilities $m\to\infty$ but a finite horizon $T\in\mathbb{Z}_+$. We find that the regret grows sub-linearly at a rate $\Theta\left(m^{\frac{1}{2}\cdot\frac{1}{1-2^{-T}}}\right)$, thus converging exponentially fast to $\Theta(\sqrt{m})$. We establish the robustness of this result to the learning environment; we also extend it to a more complicated facility location setting in a bipartite facility-customer graph with a target on customer coverage. Throughout, constructive proofs identify a policy featuring limited exploration initially for learning purposes, and fast exploitation later on for optimization purposes once uncertainty gets mitigated. These findings underscore the benefits of limited online learning and optimization, in that even a few rounds can provide significant benefits as compared to a no-learning baseline.
Related papers
- Learning to Optimize for Mixed-Integer Non-linear Programming [20.469394148261838]
Mixed-integer non-NLP programs (MINLPs) arise in various domains, such as energy systems and transportation, but are notoriously difficult to solve.
Recent advances in machine learning have led to remarkable successes in optimization, area broadly known as learning to optimize.
We propose two differentiable correction layers that generate integer outputs while preserving gradient.
arXiv Detail & Related papers (2024-10-14T20:14:39Z) - Can Learned Optimization Make Reinforcement Learning Less Difficult? [70.5036361852812]
We consider whether learned optimization can help overcome reinforcement learning difficulties.
Our method, Learned Optimization for Plasticity, Exploration and Non-stationarity (OPEN), meta-learns an update rule whose input features and output structure are informed by previously proposed to these difficulties.
arXiv Detail & Related papers (2024-07-09T17:55:23Z) - Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning
with Linear Function Approximation [16.871660060209674]
We study the problem of deployment efficient reinforcement learning (RL) with linear function approximation under the emphreward-free exploration setting.
We propose a new algorithm that collects at most $widetildeO(fracd2H5epsilon2)$ trajectories within $H$ deployments to identify $epsilon$-optimal policy for any (possibly data-dependent) choice of reward functions.
arXiv Detail & Related papers (2022-10-03T03:48:26Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-world type algorithm for this general class problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Offline Stochastic Shortest Path: Learning, Evaluation and Towards
Optimality [57.91411772725183]
In this paper, we consider the offline shortest path problem when the state space and the action space are finite.
We design the simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
arXiv Detail & Related papers (2022-06-10T07:44:56Z) - Online Learning with Knapsacks: the Best of Both Worlds [54.28273783164608]
We casting online learning problems in which a decision maker wants to maximize their expected reward without violating a finite set of $m$m resource constraints.
Our framework allows the decision maker to handle its evidence flexibility and costoretic functions.
arXiv Detail & Related papers (2022-02-28T12:10:48Z) - Adaptive Multi-Goal Exploration [118.40427257364729]
We show how AdaGoal can be used to tackle the objective of learning an $epsilon$-optimal goal-conditioned policy.
AdaGoal is anchored in the high-level algorithmic structure of existing methods for goal-conditioned deep reinforcement learning.
arXiv Detail & Related papers (2021-11-23T17:59:50Z) - Contextual Inverse Optimization: Offline and Online Learning [3.6739949215165164]
We study the problems of offline and online contextual optimization with feedback information.
We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle.
arXiv Detail & Related papers (2021-06-26T13:09:52Z) - Refined approachability algorithms and application to regret
minimization with global costs [0.38073142980732994]
Blackwell's approachability is a framework where two players, the Decision Maker and the Environment, play a repeated game with vector-valued payoffs.
We construct and analyze a class of Follow the Regularized Leader algorithms (FTRL) for Blackwell's approachability.
This flexibility enables us to apply these algorithms to closely minimize the quantity of interest in various online learning problems.
arXiv Detail & Related papers (2020-09-08T15:54:08Z) - Unsupervised Deep Learning for Optimizing Wireless Systems with
Instantaneous and Statistic Constraints [29.823814915538463]
We establish a unified framework of using unsupervised deep learning to solve both kinds of problems with both instantaneous and statistic constraints.
We show that unsupervised learning outperforms supervised learning in terms of violation probability and approximation accuracy of the optimal policy.
arXiv Detail & Related papers (2020-05-30T13:37:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.