Online Linear Programming with Batching
- URL: http://arxiv.org/abs/2408.00310v1
- Date: Thu, 1 Aug 2024 06:13:24 GMT
- Title: Online Linear Programming with Batching
- Authors: Haoran Xu, Peter W. Glynn, Yinyu Ye
- Abstract summary: We study Online Linear Programming (OLP) with batching.
We show that the ability to delay decisions brings better operational performance, as measured by regret.
All the proposed algorithms delay decisions only on customers arriving in the first and the last batch.
- Score: 18.989352151219336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Online Linear Programming (OLP) with batching. The planning horizon is cut into $K$ batches, and the decisions on customers arriving within a batch can be delayed to the end of their associated batch. Compared with OLP without batching, the ability to delay decisions brings better operational performance, as measured by regret. Two research questions of interest are: (1) What is a lower bound of the regret as a function of $K$? (2) What algorithms can achieve the regret lower bound? These questions have been analyzed in the literature when the distribution of the reward and the resource consumption of the customers have finite support. By contrast, this paper analyzes these questions when the conditional distribution of the reward given the resource consumption is continuous, and we show the answers are different under this setting. When there is only a single type of resource and the decision maker knows the total number of customers, we propose an algorithm with an $O(\log K)$ regret upper bound and provide an $\Omega(\log K)$ regret lower bound. We also propose algorithms with an $O(\log K)$ regret upper bound for the setting in which there are multiple types of resources and the setting in which customers arrive following a Poisson process. All these regret upper and lower bounds are independent of the length of the planning horizon, and all the proposed algorithms delay decisions only on customers arriving in the first and the last batch. We also take customer impatience into consideration and establish a way of selecting an appropriate batch size.
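To make the batching setup concrete, the following is a minimal single-resource simulation sketch, not the authors' algorithm: the horizon of $n$ customers is split into $K$ batches, each batch's decisions are delayed to the end of that batch, and a simple illustrative policy thresholds rewards against a dual price fitted by an LP on the customers observed so far; regret is measured against the hindsight LP optimum. The function names (`hindsight_optimum`, `batched_dual_price_policy`) and the uniform reward model are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import linprog

def hindsight_optimum(r, a, capacity):
    """Offline benchmark: maximize r @ x subject to a @ x <= capacity, 0 <= x <= 1."""
    res = linprog(-r, A_ub=a.reshape(1, -1), b_ub=[capacity],
                  bounds=[(0.0, 1.0)] * len(r), method="highs")
    return -res.fun

def batched_dual_price_policy(r, a, capacity, K):
    """Toy batched policy (illustrative, not the paper's algorithm): decisions
    for each batch are delayed to the batch's end, then made by thresholding
    rewards against a dual price fitted on the customers seen so far."""
    n = len(r)
    edges = np.linspace(0, n, K + 1).astype(int)
    remaining, total = capacity, 0.0
    for k in range(K):
        lo, hi = edges[k], edges[k + 1]
        # Fit a dual price from the LP on all customers observed so far,
        # with the budget scaled to the fraction of the horizon elapsed.
        res = linprog(-r[:hi], A_ub=a[:hi].reshape(1, -1),
                      b_ub=[capacity * hi / n],
                      bounds=[(0.0, 1.0)] * hi, method="highs")
        price = -float(res.ineqlin.marginals[0])  # shadow price of the resource
        for j in range(lo, hi):  # serve the pending batch with the price rule
            if r[j] >= price * a[j] and a[j] <= remaining:
                remaining -= a[j]
                total += r[j]
    return total

rng = np.random.default_rng(0)
n, K, capacity = 2000, 10, 500.0
a = rng.uniform(0.1, 1.0, size=n)        # resource consumption per customer
r = a * rng.uniform(0.0, 1.0, size=n)    # continuous reward given consumption
print("empirical regret:", hindsight_optimum(r, a, capacity)
      - batched_dual_price_policy(r, a, capacity, K))
```

Rerunning such a sketch for different values of $K$ gives an empirical feel for how delaying decisions within larger batches reduces regret, which is the trade-off the paper quantifies.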
Related papers
- Infrequent Resolving Algorithm for Online Linear Programming [3.247415064064425]
Existing online linear programming (OLP) algorithms fall into two categories: LP-based algorithms and LP-free algorithms.
We propose a well-performing algorithm that solves LPs at a few selected time points and performs first-order computations at the other time points (a minimal illustrative sketch of this hybrid idea appears after the list below).
Our work highlights the value of resolving both at the beginning and the end of the selling horizon, and provides a novel framework to prove the performance guarantee.
arXiv Detail & Related papers (2024-08-01T11:09:01Z) - Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $\widetilde{\mathcal{O}}\left(\frac{(k+1)^{3/2}\sqrt{nT}}{(1-\Gamma)^2}\right)$.
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - Contextual Decision-Making with Knapsacks Beyond the Worst Case [5.65888994172721]
We study the framework of a dynamic decision-making scenario with resource constraints.
In this framework, an agent selects an action in each round upon observing a random request.
We prove that our algorithm maintains a near-optimal $\widetilde{O}(\sqrt{T})$ regret even in the worst cases.
arXiv Detail & Related papers (2022-11-25T08:21:50Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-worlds type algorithm for this general class of problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Efficient Algorithms for Planning with Participation Constraints [74.74967476995572]
We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022].
In this problem, a principal chooses actions in a decision process, resulting in separate utilities for the principal and the agent.
We provide the first exact algorithm for this problem in finite-horizon settings, where previously only an additive $\varepsilon$-approximation was known.
arXiv Detail & Related papers (2022-05-16T15:47:41Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability [59.81339109121384]
We study the $K$-armed contextual dueling bandit problem, a sequential decision-making setting in which the learner uses contextual information to make two decisions, but only observes preference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as quickly as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Adapting to Delays and Data in Adversarial Multi-Armed Bandits [7.310043452300736]
We analyze variants of the Exp3 algorithm that tune their step-size using only information available at the time of the decisions.
We obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses.
arXiv Detail & Related papers (2020-10-12T20:53:52Z) - An Efficient Algorithm for Cooperative Semi-Bandits [0.0]
We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm.
We show that the expected regret of our algorithm after $T$ time steps is of order $QT\log(k)\,(k\alpha^{1/Q} + m)$, where $Q$ is the total activation probability mass.
arXiv Detail & Related papers (2020-10-05T07:08:26Z) - Continuous-Time Multi-Armed Bandits with Controlled Restarts [32.63624728528415]
We investigate the bandit problem with controlled restarts for time-constrained decision processes.
In particular, we consider a bandit setting where each decision takes a random completion time, and yields a random and correlated reward at the end.
We develop efficient online learning algorithms with $O(\log(\tau))$ and $O(\sqrt{\tau\log(\tau)})$ regret in a finite and continuous action space of restart strategies.
arXiv Detail & Related papers (2020-06-30T19:50:39Z)
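As referenced in the Infrequent Resolving entry above, here is a minimal sketch of the hybrid idea it describes: solve an LP for the dual price only at a few selected time points and use cheap first-order (subgradient-style) price updates everywhere else. The function name `infrequent_resolving_sketch`, the per-period budget `rho`, the step size, and the resolving schedule are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def infrequent_resolving_sketch(r, a, capacity, resolve_points):
    """Hybrid sketch: LP resolves at a few selected times, first-order price
    updates at all other times (illustrative, not the paper's algorithm)."""
    n = len(r)
    rho = capacity / n                     # per-period resource budget
    price, remaining, total = 0.0, capacity, 0.0
    for t in range(n):
        if t in resolve_points and t > 0:
            # Expensive step: re-solve the scaled LP on the data seen so far
            # and read off the dual price of the resource constraint.
            res = linprog(-r[:t], A_ub=a[:t].reshape(1, -1),
                          b_ub=[capacity * t / n],
                          bounds=[(0.0, 1.0)] * t, method="highs")
            price = -float(res.ineqlin.marginals[0])
        accept = r[t] >= price * a[t] and a[t] <= remaining
        if accept:
            remaining -= a[t]
            total += r[t]
        # Cheap step: subgradient-style update nudging the price so that
        # average consumption tracks the per-period budget rho.
        price = max(0.0, price - (rho - (a[t] if accept else 0.0)) / np.sqrt(t + 1))
    return total

# Example run with resolves near the start and near the end of the horizon.
rng = np.random.default_rng(1)
n, capacity = 2000, 500.0
a = rng.uniform(0.1, 1.0, size=n)
r = a * rng.uniform(0.0, 1.0, size=n)
print(infrequent_resolving_sketch(r, a, capacity, resolve_points={50, n - 200}))
```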