Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
- URL: http://arxiv.org/abs/2410.08578v2
- Date: Wed, 12 Feb 2025 09:50:05 GMT
- Title: Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
- Authors: Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel,
- Abstract summary: We address the online un submodular problem (Online USM) in a setting with bandit feedback.
In this framework, a-maker receives noisy rewards from a monotone submodular function taking values in a known interval.
We introduce a problem-dependent notion of hardness characterizing the transition between logarithmic and regime for the upper bounds.
- Score: 12.096516329746292
- License:
- Abstract: We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy - Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between logarithmic and polynomial regime for the upper bounds.
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z) - Stochastic $k$-Submodular Bandits with Full Bandit Feedback [29.705337940879705]
We present the first sublinear $alpha$-regret bounds for online $k$-submodular optimization problems with full-bandit feedback.
A key contribution of our work is analyzing the robustness of the algorithms.
arXiv Detail & Related papers (2024-12-14T05:02:53Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit.
In this setting, agents subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in citelu2021low under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrain of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve the $tildeO(sqrt(d_1+d_2)3/2Mr3/2T)$ bound of regret while G-ESTS can achineve the $tildeO
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novelgreedy bandit (SGB) algorithm for multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $tin [T]$ is observed.
SGB adopts an optimized-explore-then-commit approach and is specifically designed for scenarios with a large set of base arms.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Bandit Multi-linear DR-Submodular Maximization and Its Applications on
Adversarial Submodular Bandits [21.54858035450694]
We give a sublinear regret algorithm for the submodular bandit with partition matroid constraint.
For the bandit sequential submodular, the existing work proves an $O(T2/3)$ regret with a suboptimal $1/2$ approximation ratio.
arXiv Detail & Related papers (2023-05-21T08:51:55Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $tildeO(d2/3 T2/3)$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback [12.914842850902456]
This paper revisits the online non-monotone continuous DR-submodular problem over a down-closed convex set.
We present the Meta-MFW algorithm achieving a $1/e$-regret bound of $O(sqrtT)$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T8/9)$.
arXiv Detail & Related papers (2022-08-16T09:32:37Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Submodular Bandit Problem Under Multiple Constraints [8.100450025624443]
We introduce a submodular bandit problem under the intersection of $l$ knapsacks and a $k$-system constraint.
To solve this problem, we propose a non-greedy algorithm that adaptively focuses on a standard or modified upper-confidence bound.
We provide a high-probability upper bound of an approximation regret, where the approximation ratio matches that of a fast algorithm.
arXiv Detail & Related papers (2020-06-01T01:28:44Z) - Online DR-Submodular Maximization with Stochastic Cumulative Constraints [17.660958043781154]
We consider online continuous DR-submodular with linear long-term constraints.
Online Lagrangian Frank-Wolfe (OLFW) algorithm to solve this class of online problems.
arXiv Detail & Related papers (2020-05-29T17:55:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.