Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback
- URL: http://arxiv.org/abs/2208.07632v1
- Date: Tue, 16 Aug 2022 09:32:37 GMT
- Title: Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback
- Authors: Qixin Zhang, Zengde Deng, Zaiyi Chen, Kuangqi Zhou, Haoyuan Hu, Yu
Yang
- Abstract summary: This paper revisits the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set.
We present the Meta-MFW algorithm achieving a $1/e$-regret bound of $O(\sqrt{T})$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$.
- Score: 12.914842850902456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit the online non-monotone continuous DR-submodular
maximization problem over a down-closed convex set, which finds wide real-world
applications in the domains of machine learning, economics, and operations
research. First, we present the Meta-MFW algorithm achieving a $1/e$-regret
of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per
round. As far as we know, Meta-MFW is the first algorithm to obtain
$1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous
DR-submodular maximization problem over a down-closed convex set. Furthermore,
in sharp contrast with the ODC algorithm \citep{thang2021online}, Meta-MFW relies
on the simple online linear oracle without discretization, lifting, or rounding
operations. Considering practical restrictions, we then propose the
Mono-MFW algorithm, which reduces the per-function stochastic gradient
evaluations from $T^{3/2}$ to 1 and achieves a $1/e$-regret bound of
$O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the
Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$. To the
best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret
algorithms to explore the one-shot and bandit settings, respectively, for the
online non-monotone continuous DR-submodular maximization problem over a
down-closed convex set. Finally, we conduct numerical experiments on both synthetic and
real-world datasets to verify the effectiveness of our methods.
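For concreteness, the $1/e$-regret above is the standard $\alpha$-regret with $\alpha = 1/e$: $\mathcal{R}_{1/e}(T) = \frac{1}{e}\max_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x) - \sum_{t=1}^{T} f_t(x_t)$. The sketch below is not the authors' Meta-MFW itself; it is a minimal Python rendering, under stated assumptions, of the classical non-monotone Frank-Wolfe subroutine that such meta-algorithms run with gradient estimates, specialized to the box $[0,1]^n$ (a down-closed convex set). Function names and the toy objective are illustrative.

```python
import numpy as np

def nonmonotone_fw_box(grad, n, K=100):
    """Hypothetical sketch: 1/e Frank-Wolfe for maximizing a non-monotone
    DR-submodular f over the box [0,1]^n. The linear oracle is shrunk to
    directions v <= 1 - x, so no coordinate saturates too quickly."""
    x = np.zeros(n)
    for _ in range(K):
        g = grad(x)
        # Shrunk linear maximization over {v : 0 <= v <= 1 - x}: push a
        # coordinate to its cap only where the gradient is positive.
        v = np.where(g > 0.0, 1.0 - x, 0.0)
        x = x + v / K  # step size 1/K keeps x inside [0,1]^n
    return x

# Toy usage: f(x) = b.x - 0.5 x^T H x with H entrywise nonnegative is a
# standard non-monotone DR-submodular example (all mixed partials <= 0).
rng = np.random.default_rng(0)
n = 5
H = rng.random((n, n)); H = (H + H.T) / 2
b = rng.random(n)
x_hat = nonmonotone_fw_box(lambda x: b - H @ x, n)
```

In the online algorithms above, the exact gradient call is what gets replaced by stochastic estimates fed to online linear oracles, which is where the per-round evaluation counts ($T^{3/2}$ for Meta-MFW, one for Mono-MFW) enter.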
Related papers
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in \citep{lu2021low} under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve the $\tilde{O}$...
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
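For reference, the projected OGD update that \textsf{AdaOGD} adaptively tunes is the standard
\[
x_{t+1} = \Pi_{\mathcal{K}}\bigl(x_t - \eta_t \nabla f_t(x_t)\bigr),
\]
where $\Pi_{\mathcal{K}}$ denotes Euclidean projection onto the feasible set; the adaptive step size $\eta_t$ is chosen without a priori knowledge of the strong-convexity or monotonicity parameters (our paraphrase of the abstract, not a quoted formula).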
- Improved Projection-free Online Continuous Submodular Maximization [35.324719857218014]
We investigate the problem of online learning with monotone and continuous DR-submodular reward functions.
Previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations.
We propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity.
arXiv Detail & Related papers (2023-05-29T02:54:31Z)
- Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits [21.54858035450694]
We give a sublinear regret algorithm for the submodular bandit with partition matroid constraint.
For the bandit sequential submodular maximization problem, the existing work proves an $O(T^{2/3})$ regret with a suboptimal $1/2$ approximation ratio.
arXiv Detail & Related papers (2023-05-21T08:51:55Z)
- Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback [98.29086113546045]
We investigate the problem of unconstrained multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization.
We show that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
arXiv Detail & Related papers (2023-02-02T18:52:14Z)
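For context, RGL-style methods can be read as bandit analogues of the classical randomized double greedy for unconstrained non-monotone submodular maximization. Below is a minimal full-information Python sketch; the function names are illustrative, and a bandit version would have to estimate the marginal gains from noisy reward feedback rather than query $f$ directly.

```python
import random

def randomized_double_greedy(f, ground_set, seed=None):
    """Minimal sketch of randomized double greedy (Buchbinder et al.) for
    unconstrained non-monotone submodular maximization; RGL-type bandit
    algorithms emulate this with estimated, rather than exact, marginal gains.
    f: a set function taking a Python set; ground_set: iterable of elements."""
    rng = random.Random(seed)
    X, Y = set(), set(ground_set)
    for e in ground_set:
        a = f(X | {e}) - f(X)   # marginal gain of adding e to the lower set
        b = f(Y - {e}) - f(Y)   # marginal gain of removing e from the upper set
        a, b = max(a, 0.0), max(b, 0.0)
        p = 1.0 if a + b == 0.0 else a / (a + b)  # tie-break in favor of adding
        if rng.random() < p:
            X.add(e)
        else:
            Y.discard(e)
    return X  # X == Y after the loop

# Toy usage: a graph cut function, a standard non-monotone submodular objective.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(randomized_double_greedy(cut, range(4), seed=0))
```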
- Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
- Resolving the Approximability of Offline and Online Non-monotone DR-Submodular Maximization over General Convex Sets [13.23676270963484]
DR-submodular continuous functions have many real-world applications in the domains of machine learning, communication systems, operations research, and economics.
We present a polynomial-time online algorithm matching the $\tfrac{1}{4}(1-m)$ approximation ratio of the state-of-the-art offline algorithm, where $m$ is the minimal $\ell_\infty$-norm of any feasible point.
We also show that our online algorithm and the offline algorithm of Du (2022) are both optimal in a strong sense.
arXiv Detail & Related papers (2022-10-12T07:04:24Z)
- Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization [11.889570525184801]
We present two decentralized online algorithms for the monotone continuous DR-submodular maximization problem.
The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$.
Next, inspired by the non-oblivious boosting function \citep{zhang2022}, we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm.
arXiv Detail & Related papers (2022-08-18T07:32:28Z)
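As background on the boosting idea cited above: in that line of work, a monotone DR-submodular objective $f$ is replaced by a non-oblivious auxiliary function of (up to normalization) the form
\[
F(x) = \int_{0}^{1} \frac{e^{z-1}}{z}\, f(zx)\, dz,
\]
whose stationary points upgrade the $1/2$ guarantee of plain gradient ascent to $(1-1/e)$ for $f$. This is our reading of the cited reference, not a restatement of the DOBGA algorithm itself.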
- Faster First-Order Algorithms for Monotone Strongly DR-Submodular Maximization [11.919431962501479]
Continuous DR-submodular functions satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions.
We propose a new algorithm that matches the provably optimal $1-\frac{c}{e}$ approximation ratio after only $\lceil \frac{L}{\mu} \rceil$ iterations.
arXiv Detail & Related papers (2021-11-15T18:53:14Z)
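To spell out the DR property used above: for differentiable $f$, DR-submodularity means the gradient is entrywise antitone,
\[
x \le y \;\Longrightarrow\; \nabla f(x) \ge \nabla f(y),
\]
and for twice-differentiable $f$ this is equivalent to $\frac{\partial^2 f}{\partial x_i \partial x_j} \le 0$ for all $i, j$; concavity along non-negative directions follows since $u^{\top} \nabla^2 f(x)\, u \le 0$ whenever $u \ge 0$.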
- The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal or state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
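For readability, the linear mixture model referenced above is the standard one in which the transition kernel is a linear combination of known basis features with an unknown parameter vector:
\[
P(s' \mid s, a) = \bigl\langle \phi(s' \mid s, a), \theta^{*} \bigr\rangle,
\]
with the feature map $\phi$ known and $\theta^{*} \in \mathbb{R}^{d}$ to be learned; this is background, not a formula quoted from the paper.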
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content above (including any of the information) and is not responsible for any consequences of its use.