Online Linear Programming with Batching
- URL: http://arxiv.org/abs/2408.00310v1
- Date: Thu, 1 Aug 2024 06:13:24 GMT
- Title: Online Linear Programming with Batching
- Authors: Haoran Xu, Peter W. Glynn, Yinyu Ye,
- Abstract summary: We study Online Linear Programming (OLP) withOmega.
We show that the ability to delay decisions brings better operational performance, as measured by regret.
All the proposed algorithms delay decisions on customers arriving in only the first and the last batch.
- Score: 18.989352151219336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Online Linear Programming (OLP) with batching. The planning horizon is cut into $K$ batches, and the decisions on customers arriving within a batch can be delayed to the end of their associated batch. Compared with OLP without batching, the ability to delay decisions brings better operational performance, as measured by regret. Two research questions of interest are: (1) What is a lower bound of the regret as a function of $K$? (2) What algorithms can achieve the regret lower bound? These questions have been analyzed in the literature when the distribution of the reward and the resource consumption of the customers have finite support. By contrast, this paper analyzes these questions when the conditional distribution of the reward given the resource consumption is continuous, and we show the answers are different under this setting. When there is only a single type of resource and the decision maker knows the total number of customers, we propose an algorithm with a $O(\log K)$ regret upper bound and provide a $\Omega(\log K)$ regret lower bound. We also propose algorithms with $O(\log K)$ regret upper bound for the setting in which there are multiple types of resource and the setting in which customers arrive following a Poisson process. All these regret upper and lower bounds are independent of the length of the planning horizon, and all the proposed algorithms delay decisions on customers arriving in only the first and the last batch. We also take customer impatience into consideration and establish a way of selecting an appropriate batch size.
Related papers
- Autoregressive Bandits [58.46584210388307]
We propose a novel online learning setting, Autoregressive Bandits, in which the observed reward is governed by an autoregressive process of order $k$.
We show that, under mild assumptions on the reward process, the optimal policy can be conveniently computed.
We then devise a new optimistic regret minimization algorithm, namely, AutoRegressive Upper Confidence Bound (AR-UCB), that suffers sublinear regret of order $widetildemathcalO left( frac(k+1)3/2sqrtnT (1-G
arXiv Detail & Related papers (2022-12-12T21:37:36Z) - A Unifying Framework for Online Optimization with Long-Term Constraints [62.35194099438855]
We study online learning problems in which a decision maker has to take a sequence of decisions subject to $m$ long-term constraints.
The goal is to maximize their total reward, while at the same time achieving small cumulative violation across the $T$ rounds.
We present the first best-of-both-world type algorithm for this general class problems, with no-regret guarantees both in the case in which rewards and constraints are selected according to an unknown model, and in the case in which they are selected at each round by an adversary.
arXiv Detail & Related papers (2022-09-15T16:59:19Z) - Learning to Order for Inventory Systems with Lost Sales and Uncertain
Supplies [21.690446677016247]
We consider a lost-sales inventory control system with a lead time $L$ over a planning horizon $T$. Supply is uncertain, and is a function of the order quantity.
We show that our algorithm achieves a regret (i.e. the performance gap between the cost of our algorithm and that of an optimal policy over $T$ periods) of $O(L+sqrtT)$ when $Lgeqlog(T)$.
arXiv Detail & Related papers (2022-07-10T22:11:32Z) - Efficient Algorithms for Planning with Participation Constraints [74.74967476995572]
We consider the problem of planning with participation constraints introduced in [Zhang et al., 2022]
In this problem, a principal chooses actions in a decision process, resulting in separate utilities for the principal and the agent.
We provide the first-time exact algorithm for this problem for finite-horizon settings, where previously only an additive $varepsilon$-approximation was known.
arXiv Detail & Related papers (2022-05-16T15:47:41Z) - Online Allocation with Two-sided Resource Constraints [44.5635910908944]
We consider an online allocation problem subject to lower and upper resource constraints, where the requests arrive sequentially.
We propose a new algorithm that obtains $1-O(fracepsilonalpha-epsilon)$ -competitive ratio for the offline problems that know the entire requests ahead of time.
arXiv Detail & Related papers (2021-12-28T02:21:06Z) - Efficient and Optimal Algorithms for Contextual Dueling Bandits under
Realizability [59.81339109121384]
We study the $K$ contextual dueling bandit problem, a sequential decision making setting in which the learner uses contextual information to make two decisions, but only observes emphpreference-based feedback suggesting that one decision was better than the other.
We provide a new algorithm that achieves the optimal regret rate for a new notion of best response regret, which is a strictly stronger performance measure than those considered in prior works.
arXiv Detail & Related papers (2021-11-24T07:14:57Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an em early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
Agent sequentially selects actions and, from the resulting system trajectory, aims at the best as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Adapting to Delays and Data in Adversarial Multi-Armed Bandits [7.310043452300736]
We analyze variants of the Exp3 algorithm that tune their step-size using only information available at the time of the decisions.
We obtain regret guarantees that adapt to the observed (rather than the worst-case) sequences of delays and/or losses.
arXiv Detail & Related papers (2020-10-12T20:53:52Z) - An Efficient Algorithm for Cooperative Semi-Bandits [0.0]
We introduce Coop-FTPL, a cooperative version of the well-known Follow The Perturbed Leader algorithm.
We show that the expected regret of our algorithm after T time steps is of order QT log(k)(k$alpha$ 1 /Q + m), where Q is the total activation probability mass.
arXiv Detail & Related papers (2020-10-05T07:08:26Z) - Continuous-Time Multi-Armed Bandits with Controlled Restarts [32.63624728528415]
We investigate the bandit problem with controlled restarts for time-constrained decision processes.
In particular, we consider a bandit setting where each decision takes a random completion time, and yields a random and correlated reward at the end.
We develop efficient online learning algorithms with $O(log(tau))$ and $O(sqrttaulog(tau))$ regret in a finite and continuous action space of restart strategies.
arXiv Detail & Related papers (2020-06-30T19:50:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.