Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
- URL: http://arxiv.org/abs/2410.08578v2
- Date: Wed, 12 Feb 2025 09:50:05 GMT
- Title: Logarithmic Regret for Unconstrained Submodular Maximization Stochastic Bandit
- Authors: Julien Zhou, Pierre Gaillard, Thibaud Rahier, Julyan Arbel
- Abstract summary: We address the online unconstrained submodular maximization problem (Online USM) in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. We introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes for the upper bounds.
- Score: 12.096516329746292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the online unconstrained submodular maximization problem (Online USM), in a setting with stochastic bandit feedback. In this framework, a decision-maker receives noisy rewards from a non-monotone submodular function taking values in a known bounded interval. This paper proposes Double-Greedy Explore-then-Commit (DG-ETC), adapting the Double-Greedy approach from the offline and online full-information settings. DG-ETC satisfies a $O(d\log(dT))$ problem-dependent upper bound for the $1/2$-approximate pseudo-regret, as well as a $O(dT^{2/3}\log(dT)^{1/3})$ problem-free one at the same time, outperforming existing approaches. In particular, we introduce a problem-dependent notion of hardness characterizing the transition between the logarithmic and polynomial regimes for the upper bounds.
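For intuition, here is a minimal sketch of how a Double-Greedy pass can be driven by noisy bandit estimates: each marginal gain is estimated from repeated samples before an add/drop decision is committed. This illustrates the underlying offline Double-Greedy technique combined with an explore-then-commit strategy, not the authors' exact DG-ETC; the value oracle `noisy_f`, the per-decision sampling `budget`, and the toy objective are assumptions, and the deterministic decision rule shown corresponds to the 1/3-approximate offline variant (the 1/2 guarantee relies on randomized decisions).

```python
# Illustrative sketch (not the paper's DG-ETC): a deterministic Double-Greedy
# pass over d items, where each marginal gain is estimated from repeated noisy
# oracle calls before committing to add or drop the item.
import random

def estimate(noisy_f, S, budget):
    """Empirical mean of the noisy value oracle on the set S."""
    return sum(noisy_f(S) for _ in range(budget)) / budget

def double_greedy_etc(noisy_f, d, budget=50):
    """Deterministic Double-Greedy on estimated marginal gains."""
    X, Y = set(), set(range(d))             # X only grows, Y only shrinks
    for i in range(d):
        # Estimated gain of adding i to X vs. removing i from Y.
        a = estimate(noisy_f, X | {i}, budget) - estimate(noisy_f, X, budget)
        b = estimate(noisy_f, Y - {i}, budget) - estimate(noisy_f, Y, budget)
        if a >= b:
            X.add(i)                         # commit: keep item i
        else:
            Y.discard(i)                     # commit: drop item i
    return X                                 # X == Y after the last item

if __name__ == "__main__":
    # Toy non-monotone submodular objective with Gaussian observation noise.
    values = [0.3, 0.8, 0.5, 0.2, 0.9]
    def noisy_f(S):
        return sum(values[i] for i in S) - 0.1 * len(S) ** 2 + random.gauss(0, 0.05)
    print(double_greedy_etc(noisy_f, d=5))
```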
Related papers
- Continuous K-Max Bandits [54.21533414838677]
We study the $K$-Max multi-armed bandits problem with continuous outcome distributions and weak value-index feedback.
This setting captures critical applications in recommendation systems, distributed computing, server scheduling, etc.
Our key contribution is the computationally efficient algorithm DCK-UCB, which combines adaptive discretization with bias-corrected confidence bounds.
arXiv Detail & Related papers (2025-02-19T06:37:37Z) - Stochastic $k$-Submodular Bandits with Full Bandit Feedback [29.705337940879705]
We present the first sublinear $\alpha$-regret bounds for online $k$-submodular optimization problems with full-bandit feedback.
A key contribution of our work is analyzing the robustness of the algorithms.
arXiv Detail & Related papers (2024-12-14T05:02:53Z) - Federated Combinatorial Multi-Agent Multi-Armed Bandits [79.1700188160944]
This paper introduces a federated learning framework tailored for online optimization with bandit feedback.
In this setting, agents select subsets of arms, observe noisy rewards for these subsets without accessing individual arm information, and can cooperate and share information at specific intervals.
arXiv Detail & Related papers (2024-05-09T17:40:09Z) - Bridging the Gap Between General and Down-Closed Convex Sets in
Submodular Maximization [8.225819874406238]
Mualem [mualem23re] showed that this approach cannot smoothly interpolate between down-closed and non-down-closed constraints.
In this work, we suggest novel offline and online algorithms based on a natural decomposition of the body into two distinct convex bodies.
We also empirically demonstrate the superiority of our proposed algorithms across three offline and two online applications.
arXiv Detail & Related papers (2024-01-17T14:56:42Z) - Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in [lu2021low] under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve a regret bound of $\tilde{O}(\dots)$
arXiv Detail & Related papers (2024-01-14T14:14:19Z) - Combinatorial Stochastic-Greedy Bandit [79.1700188160944]
We propose a novel Stochastic-Greedy Bandit (SGB) algorithm for combinatorial multi-armed bandit problems when no extra information other than the joint reward of the selected set of $n$ arms at each time $t \in [T]$ is observed.
SGB adopts an optimized explore-then-commit approach and is specifically designed for scenarios with a large set of base arms. A schematic explore-then-commit sketch in this spirit appears after this list.
arXiv Detail & Related papers (2023-12-13T11:08:25Z) - Sub-linear Regret in Adaptive Model Predictive Control [56.705978425244496]
We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes.
We analyze the regret of the algorithm, when compared to an algorithm initially aware of the system dynamics.
arXiv Detail & Related papers (2023-10-07T15:07:10Z) - Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z) - Bandit Multi-linear DR-Submodular Maximization and Its Applications on
Adversarial Submodular Bandits [21.54858035450694]
We give a sublinear regret algorithm for the submodular bandit with partition matroid constraint.
For the bandit sequential submodular maximization problem, the existing work proves an $O(T^{2/3})$ regret with a suboptimal $1/2$ approximation ratio.
arXiv Detail & Related papers (2023-05-21T08:51:55Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Randomized Greedy Learning for Non-monotone Stochastic Submodular
Maximization Under Full-bandit Feedback [98.29086113546045]
We investigate the problem of unconstrained multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization.
We show that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
arXiv Detail & Related papers (2023-02-02T18:52:14Z) - Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback [12.914842850902456]
This paper revisits the online non-monotone continuous DR-submodular problem over a down-closed convex set.
We present the Meta-MFW algorithm achieving a $1/e$-regret bound of $O(\sqrt{T})$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$.
arXiv Detail & Related papers (2022-08-16T09:32:37Z) - Supermodular $f$-divergences and bounds on lossy compression and
generalization error with mutual $f$-information [17.441807469515254]
We introduce supermodular $f$-divergences and provide three applications for them.
We provide a connection between the generalization error of algorithms with bounded input/output mutual $f$-information and a generalized rate-distortion problem.
Our bound is based on a new lower bound on the rate-distortion function that strictly improves over previously best-known bounds.
arXiv Detail & Related papers (2022-06-21T09:17:06Z) - Online Nonsubmodular Minimization with Delayed Costs: From Full
Information to Bandit Feedback [98.7678704343537]
We focus on a class of nonsubmodular functions with special structure, and prove regret guarantees for several variants of the online and approximate online bandit gradient descent algorithms.
We derive bounds for the agent's regret in the full information and bandit feedback setting, even if the delay between choosing a decision and receiving the incurred cost is unbounded.
arXiv Detail & Related papers (2022-05-15T08:27:12Z) - Continuous Submodular Function Maximization [91.17492610120324]
Continuous submodularity is a class of functions with a wide spectrum of applications.
We identify several applications of continuous submodular optimization, ranging from influence maximization and MAP inference for determinantal point processes to mean field inference.
arXiv Detail & Related papers (2020-06-24T04:37:31Z) - Batch greedy maximization of non-submodular functions: Guarantees and
applications to experimental design [0.0]
We analyze batch greedy heuristics for cardinality-constrained maximization of non-submodular non-decreasing set functions.
Our theoretical guarantees are characterized by the combination of submodularity and supermodularity ratios.
arXiv Detail & Related papers (2020-06-03T18:58:06Z) - Submodular Bandit Problem Under Multiple Constraints [8.100450025624443]
We introduce a submodular bandit problem under the intersection of $l$ knapsacks and a $k$-system constraint.
To solve this problem, we propose a non-greedy algorithm that adaptively focuses on a standard or modified upper-confidence bound.
We provide a high-probability upper bound of an approximation regret, where the approximation ratio matches that of a fast algorithm.
arXiv Detail & Related papers (2020-06-01T01:28:44Z) - Online DR-Submodular Maximization with Stochastic Cumulative Constraints [17.660958043781154]
We consider online continuous DR-submodular maximization with linear long-term constraints.
We propose the Online Lagrangian Frank-Wolfe (OLFW) algorithm to solve this class of online problems.
arXiv Detail & Related papers (2020-05-29T17:55:42Z) - Streaming Submodular Maximization under a $k$-Set System Constraint [42.31117997337689]
We propose a novel framework that converts streaming algorithms for monotone submodular maximization into streaming algorithms for non-monotone submodular maximization.
We also propose the first streaming algorithm for monotone submodular maximization subject to $k$-extendible and $k$-set system constraints.
arXiv Detail & Related papers (2020-02-09T12:32:14Z)
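Several of the entries above (e.g., Combinatorial Stochastic-Greedy Bandit and Randomized Greedy Learning) build on the same explore-then-commit pattern under full-bandit feedback: spend an initial phase estimating marginal gains from repeated noisy set evaluations, then commit to the resulting greedy set. The sketch below illustrates that generic pattern only, not any specific paper's algorithm; the oracle `noisy_f`, the cardinality budget `k`, and the per-estimate sample count `m` are assumptions.

```python
# Schematic explore-then-commit greedy for full-bandit set selection.
# Generic pattern only: `noisy_f` is the full-bandit reward oracle, `k` a
# cardinality budget, `m` the number of exploration samples per estimate.
def etc_greedy(noisy_f, d, k, m=20):
    def mean(S):
        return sum(noisy_f(S) for _ in range(m)) / m    # exploration samples
    chosen = set()
    for _ in range(k):                                   # greedy set construction
        base = mean(chosen)
        gains = {i: mean(chosen | {i}) - base for i in range(d) if i not in chosen}
        chosen.add(max(gains, key=gains.get))            # best estimated marginal gain
    return chosen                                        # committed set for the remaining rounds
```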
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.