Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions
- URL: http://arxiv.org/abs/2101.07957v2
- Date: Sat, 27 Feb 2021 11:22:30 GMT
- Title: Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions
- Authors: Kei Takemura, Shinji Ito, Daisuke Hatano, Hanna Sumita, Takuro
Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi
- Abstract summary: We show that the C${}^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C${}^2$UCB algorithm.
- Score: 53.77572276969548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The contextual combinatorial semi-bandit problem with linear payoff functions
is a decision-making problem in which, in each round, a learner chooses a set of arms,
each associated with a feature vector, under given constraints so as to maximize the sum
of the rewards of the chosen arms. Several existing algorithms have regret bounds that are
optimal with respect to the number of rounds $T$. However, there is a gap of
$\tilde{O}(\max(\sqrt{d}, \sqrt{k}))$ between the current best upper and lower
bounds, where $d$ is the dimension of the feature vectors, $k$ is the number of
arms chosen in a round, and $\tilde{O}(\cdot)$ ignores logarithmic
factors. The dependence on $k$ and $d$ is of practical importance because $k$
may be larger than $T$ in real-world applications such as recommender systems.
In this paper, we fill the gap by improving the upper and lower bounds. More
precisely, we show that the C${}^2$UCB algorithm proposed by Qin, Chen, and Zhu
(2014) has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the
partition matroid constraints. For general constraints, we propose an algorithm
that modifies the reward estimates of arms in the C${}^2$UCB algorithm and
demonstrate that it enjoys the optimal regret bound for a more general problem
that can take into account other objectives simultaneously. We also show that
our technique would be applicable to related problems. Numerical experiments
support our theoretical results and considerations.
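To make the algorithmic idea concrete, below is a minimal sketch of a C${}^2$UCB-style learner for the top-$k$ special case of partition matroid constraints. It is an illustration under assumptions, not the authors' implementation: the names (`C2UCBSketch`, `alpha`, `lam`) are invented here, the exploration weight is a fixed constant rather than the round-dependent weight used in the paper's analysis, and plain top-$k$ selection stands in for a general constraint oracle.

```python
# Minimal sketch of a C^2UCB-style learner for the top-k special case of
# partition matroid constraints. Illustrative only: class/parameter names
# are invented here, and alpha is a fixed constant rather than the
# round-dependent exploration weight used in the paper's analysis.
import numpy as np

class C2UCBSketch:
    def __init__(self, d, k, alpha=1.0, lam=1.0):
        self.d, self.k, self.alpha = d, k, alpha
        self.V = lam * np.eye(d)   # regularized Gram matrix of observed features
        self.b = np.zeros(d)       # sum of reward-weighted observed features

    def select(self, X):
        """X: (n_arms, d) feature matrix for this round.
        Returns indices of the k arms with the highest UCB scores."""
        V_inv = np.linalg.inv(self.V)
        theta_hat = V_inv @ self.b                        # ridge estimate of theta
        width = np.sqrt(np.einsum('ij,jk,ik->i', X, V_inv, X))
        ucb = X @ theta_hat + self.alpha * width          # optimistic arm scores
        return np.argsort(ucb)[-self.k:]                  # top-k oracle

    def update(self, X, chosen, rewards):
        """Semi-bandit feedback: a reward is observed for every chosen arm."""
        for i, r in zip(chosen, rewards):
            self.V += np.outer(X[i], X[i])
            self.b += r * X[i]

# Toy usage on synthetic linear rewards.
rng = np.random.default_rng(0)
d, k, n_arms = 5, 3, 20
theta = rng.normal(size=d)
agent = C2UCBSketch(d, k)
for t in range(100):
    X = rng.normal(size=(n_arms, d))
    S = agent.select(X)
    agent.update(X, S, X[S] @ theta + 0.1 * rng.normal(size=k))
```

For general constraints, the UCB scores would instead be passed to a problem-specific (approximation) oracle, and the paper's proposed modification of the reward estimates would replace the plain UCB scores above.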
Related papers
- LC-Tsallis-INF: Generalized Best-of-Both-Worlds Linear Contextual Bandits [38.41164102066483]
This study considers the linear contextual bandit problem with independent and identically distributed contexts.
Our proposed algorithm is based on Follow-The-Regularized-Leader with the Tsallis entropy and is referred to as the $\alpha$-Linear-Contextual (LC)-Tsallis-INF.
arXiv Detail & Related papers (2024-03-05T18:59:47Z) - Combinatorial Bandits for Maximum Value Reward Function under Max
Value-Index Feedback [9.771002043127728]
We consider a multi-armed bandit problem for maximum value reward function under maximum value and index feedback.
We propose an algorithm and provide a regret bound for problem instances with arm outcomes according to arbitrary distributions with finite supports.
Our algorithm achieves an $O((k/\Delta)\log(T))$ distribution-dependent and a $\tilde{O}(\sqrt{T})$ distribution-independent regret.
arXiv Detail & Related papers (2023-05-25T14:02:12Z) - Tight Bounds for $\gamma$-Regret via the Decision-Estimation Coefficient [88.86699022151598]
We give a statistical characterization of the $\gamma$-regret for arbitrary structured bandit problems.
The $\gamma$-regret emerges in structured bandit problems over a function class $\mathcal{F}$ where finding an exact optimum of $f \in \mathcal{F}$ is intractable.
arXiv Detail & Related papers (2023-03-06T17:54:33Z) - Bandits with many optimal arms [68.17472536610859]
We write $p^*$ for the proportion of optimal arms and $\Delta$ for the minimal mean-gap between optimal and sub-optimal arms.
We characterize the optimal learning rates both in the cumulative regret setting, and in the best-arm identification setting.
arXiv Detail & Related papers (2021-03-23T11:02:31Z) - Combinatorial Bandits without Total Order for Arms [52.93972547896022]
We present a reward model that captures set-dependent reward distribution and assumes no total order for arms.
We develop a novel regret analysis and show an $O\left(\frac{k^2 n \log T}{\epsilon}\right)$ gap-dependent regret bound as well as an $O\left(k^2\sqrt{n T \log T}\right)$ gap-independent regret bound.
arXiv Detail & Related papers (2021-03-03T23:08:59Z) - An Efficient Pessimistic-Optimistic Algorithm for Constrained Linear
Bandits [16.103685478563072]
This paper considers linear bandits with general constraints.
The objective is to maximize the expected cumulative reward over horizon $T$ subject to a set of constraints in each round.
We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects.
arXiv Detail & Related papers (2021-02-10T07:30:37Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - Maximizing Determinants under Matroid Constraints [69.25768526213689]
We study the problem of finding a basis $S$ of a matroid $M$ such that $\det\left(\sum_{i \in S} v_i v_i^\top\right)$ is maximized.
This problem appears in a diverse set of areas such as experimental design, fair allocation of goods, network design, and machine learning.
arXiv Detail & Related papers (2020-04-16T19:16:38Z)
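As a toy illustration of the determinant objective in the last entry, the sketch below greedily picks $k$ vectors to maximize $\det(\sum_{i \in S} v_i v_i^\top)$. It is a heuristic under simplifying assumptions, not the cited paper's algorithm: a uniform matroid ($|S| \le k$) stands in for a general matroid, and `greedy_det_max` and `eps` are names invented here.

```python
# Toy greedy heuristic for the determinant objective above. Not the cited
# paper's algorithm: a uniform matroid (|S| <= k) stands in for a general
# matroid, and greedy selection here is a heuristic illustration only.
import numpy as np

def greedy_det_max(V, k, eps=1e-9):
    """V: (n, d) array whose rows are the vectors v_i; returns a size-k index set."""
    n, d = V.shape
    S, A = [], eps * np.eye(d)   # eps*I keeps early log-determinants finite
    for _ in range(k):
        # Pick the vector giving the largest log det(A + v v^T).
        logdets = [np.linalg.slogdet(A + np.outer(v, v))[1] if i not in S
                   else -np.inf for i, v in enumerate(V)]
        best = int(np.argmax(logdets))
        S.append(best)
        A += np.outer(V[best], V[best])
    return S
```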