Bandits with Single-Peaked Preferences and Limited Resources
- URL: http://arxiv.org/abs/2510.09425v1
- Date: Fri, 10 Oct 2025 14:27:25 GMT
- Title: Bandits with Single-Peaked Preferences and Limited Resources
- Authors: Gur Keinan, Rotem Torkan, Omer Ben-Porat,
- Abstract summary: We study an online matching problem in which an algorithm sequentially matches $U$ users to $K$ arms.<n>Without structural assumptions, computing the optimal matching is NP-hard, making online learning computationally infeasible.<n>We devise an efficient algorithm for the offline budgeted matching problem, and leverage it into an efficient online algorithm with a regret of $tilde O(UKT2/3)$.
- Score: 6.205308371824035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study an online stochastic matching problem in which an algorithm sequentially matches $U$ users to $K$ arms, aiming to maximize cumulative reward over $T$ rounds under budget constraints. Without structural assumptions, computing the optimal matching is NP-hard, making online learning computationally infeasible. To overcome this barrier, we focus on \emph{single-peaked preferences} -- a well-established structure in social choice theory, where users' preferences are unimodal with respect to a common order over arms. We devise an efficient algorithm for the offline budgeted matching problem, and leverage it into an efficient online algorithm with a regret of $\tilde O(UKT^{2/3})$. Our approach relies on a novel PQ tree-based order approximation method. If the single-peaked structure is known, we develop an efficient UCB-like algorithm that achieves a regret bound of $\tilde O(U\sqrt{TK})$.
Related papers
- Online Algorithms with Unreliable Guidance [6.891896330885501]
This paper introduces a new model for ML-augmented online decision making, called online algorithms with unreliable guidance (OAG)<n>Formulated through the lens of request-answer games, an OAG algorithm receives, with each incoming request, a piece of guidance which is taken from the problem's answer space.<n>We describe a systematic method, called the drop or trust blindly (DTB) compiler, which transforms any online algorithm into a learning-augmented online algorithm in the OAG model.
arXiv Detail & Related papers (2026-02-24T09:11:56Z) - Batched Stochastic Matching Bandits [43.651070266360954]
We introduce a novel bandit framework for matching based on the Multi-nomial Logit (MNL) choice model.<n>In our setting, $N$ agents on one side are assigned to $K$ arms on the other side, where each armally selects an agent from its assigned pool according to an unknown preference.<n>The objective is to minimize regret by maximizing the cumulative revenue from successful matches across all agents.
arXiv Detail & Related papers (2025-09-04T13:16:32Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - A Framework for Adapting Offline Algorithms to Solve Combinatorial
Multi-Armed Bandit Problems with Bandit Feedback [27.192028744078282]
We provide a framework for adapting discrete offline approximation algorithms into sublinear $alpha$-regret methods.
The proposed framework is applied to diverse applications in submodular horizon.
arXiv Detail & Related papers (2023-01-30T23:18:06Z) - Private Online Prediction from Experts: Separations and Faster Rates [74.52487417350221]
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints.
We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries.
arXiv Detail & Related papers (2022-10-24T18:40:19Z) - An Improved Algorithm For Online Min-Sum Set Cover [0.45687771576879593]
We study a model of online preference aggregation, where an algorithm maintains an ordered list of $n$ elements.
We construct a computationally efficient randomized algorithm whose competitive ratio (ALG-to-OPT cost ratio) is $O(r2)$.
This is the first algorithm whose ratio does not depend on $n$: the previously best algorithm for this problem was $O(r3/2 cdot sqrtn)$-competitive and $Omega(r)$ is a lower bound on the performance of any deterministic online algorithm.
arXiv Detail & Related papers (2022-09-11T14:03:51Z) - New Projection-free Algorithms for Online Convex Optimization with
Adaptive Regret Guarantees [21.30065439295409]
We present new efficient textitprojection-free algorithms for online convex optimization (OCO)
Our algorithms are based on the textitonline gradient descent algorithm with a novel and efficient approach to computing so-called textitinfeasible projections
We present algorithms which, using overall $O(T)$ calls to the separation oracle, guarantee $O(sqrtT)$ adaptive regret and $O(T3/4)$ adaptive expected regret.
arXiv Detail & Related papers (2022-02-09T20:56:16Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress having no limit cycles, and to find an appropriate number of gradient ascents.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment [7.4288915613206505]
We study differentially private online learning problems in a environment under both bandit and full information feedback.
For differentially private bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O left(sum_j: Delta_j>0 fracln(T)min leftDelta_j, epsilon right right)$ minimax lower bound.
For the same differentially private full information setting, we also present an $epsilon$-differentially
arXiv Detail & Related papers (2021-02-16T02:48:16Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$2$UCB algorithm has the optimal regret bound $tildeO(dsqrtkT + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z) - Upper Confidence Bounds for Combining Stochastic Bandits [52.10197476419621]
We provide a simple method to combine bandit algorithms.
Our approach is based on a "meta-UCB" procedure that treats each of $N$ individual bandit algorithms as arms in a higher-level $N$-armed bandit problem.
arXiv Detail & Related papers (2020-12-24T05:36:29Z) - Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(sqrtT)$ regret, in general it is impossible to get better than $Omega(sqrtT)$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.