Related papers: Online Algorithms for Repeated Optimal Stopping: Achieving Both Competitive Ratio and Regret Bounds

Online Algorithms for Repeated Optimal Stopping: Achieving Both Competitive Ratio and Regret Bounds

URL: http://arxiv.org/abs/2511.04484v1
Date: Thu, 06 Nov 2025 16:04:56 GMT
Title: Online Algorithms for Repeated Optimal Stopping: Achieving Both Competitive Ratio and Regret Bounds
Authors: Tsubasa Harada, Yasushi Kawase, Hanna Sumita,
Abstract summary: We study the repeated optimal stopping problem, which generalizes the classical optimal stopping problem with an unknown distribution.<n>We aim to design algorithms that guarantee a competitive ratio in each round while also achieving sublinear regret across all rounds.<n>We demonstrate the broad applicability of our framework to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models.
Score: 7.858819231575403
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the repeated optimal stopping problem, which generalizes the classical optimal stopping problem with an unknown distribution to a setting where the same problem is solved repeatedly over $T$ rounds. In this framework, we aim to design algorithms that guarantee a competitive ratio in each round while also achieving sublinear regret across all rounds. Our primary contribution is a general algorithmic framework that achieves these objectives simultaneously for a wide array of repeated optimal stopping problems. The core idea is to dynamically select an algorithm for each round, choosing between two candidates: (1) an empirically optimal algorithm derived from the history of observations, and (2) a sample-based algorithm with a proven competitive ratio guarantee. Based on this approach, we design an algorithm that performs no worse than the baseline sample-based algorithm in every round, while ensuring that the total regret is bounded by $\tilde{O}(\sqrt{T})$. We demonstrate the broad applicability of our framework to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, for the repeated prophet inequality problem, our method achieves a $1/2$-competitive ratio from the second round on and an $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $\Omega(\sqrt{T})$ even in the i.i.d. model, confirming that our algorithm's performance is almost optimal with respect to the number of rounds.

Related papers

On Universally Optimal Algorithms for A/B Testing [49.429419538826444]
We study the problem of best-arm identification with fixed budget in multi-armed bandits with Bernoulli rewards. For the problem with two arms, also known as the A/B testing problem, we prove that there is no algorithm that performs as well as the algorithm sampling each arm equally.
arXiv Detail & Related papers (2023-08-23T08:38:53Z)
Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
General tools for designing efficient private estimation algorithms. First efficient $(epsilon, delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z)
Contextual Decision-Making with Knapsacks Beyond the Worst Case [5.65888994172721]
We study the framework of a dynamic decision-making scenario with resource constraints.<n>In this framework, an agent selects an action in each round upon observing a random request.<n>We prove that our algorithm maintains a near-optimal $widetildeO(sqrtT)$ regret even in the worst cases.
arXiv Detail & Related papers (2022-11-25T08:21:50Z)
Nearly Optimal Algorithms for Linear Contextual Bandits with Adversarial Corruptions [98.75618795470524]
We study the linear contextual bandit problem in the presence of adversarial corruption. We propose a new algorithm based on the principle of optimism in the face of uncertainty.
arXiv Detail & Related papers (2022-05-13T17:58:58Z)
Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits [27.279654173896372]
We study the problem of emphdynamic regret minimization in $K$-armed Dueling Bandits under non-stationary or time varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary win-loss' feedback for this pair.
arXiv Detail & Related papers (2021-11-06T16:46:55Z)
A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure. We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem. We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z)
Corralling Stochastic Bandit Algorithms [54.10645564702416]
We show that the regret of the corralling algorithms is no worse than that of the best algorithm containing the arm with the highest reward. We show that the gap between the highest reward and other rewards depends on the gap between the highest reward and other rewards.
arXiv Detail & Related papers (2020-06-16T15:33:12Z)
Rate-adaptive model selection over a collection of black-box contextual bandit algorithms [0.966840768820136]
We consider the model selection task in the contextual bandit setting. Our proposal is the first one to be rate-adaptive for a collection of general black-box contextual bandit algorithms.
arXiv Detail & Related papers (2020-06-05T18:55:16Z)
Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms. We show through a lower bound that even when one of the base algorithms has $O(sqrtT)$ regret, in general it is impossible to get better than $Omega(sqrtT)$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
A New Randomized Primal-Dual Algorithm for Convex Optimization with Optimal Last Iterate Rates [16.54912614895861]
We develop a novel unified randomized block-coordinate primal-dual algorithm to solve a class of nonsmooth constrained convex optimization problems. We prove that our algorithm achieves optimal convergence rates in two cases: general convexity and strong convexity. Our results show that the proposed method has encouraging performance on different experiments.
arXiv Detail & Related papers (2020-03-03T03:59:26Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.