Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback
- URL: http://arxiv.org/abs/2208.07632v1
- Date: Tue, 16 Aug 2022 09:32:37 GMT
- Title: Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback
- Authors: Qixin Zhang, Zengde Deng, Zaiyi Chen, Kuangqi Zhou, Haoyuan Hu, Yu
Yang
- Abstract summary: This paper revisits the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set.
We present the Meta-MFW algorithm achieving a $1/e$-regret bound of $O(\sqrt{T})$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$.
- Score: 12.914842850902456
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we revisit the online non-monotone continuous DR-submodular
maximization problem over a down-closed convex set, which finds wide real-world
applications in the domains of machine learning, economics, and operations
research. First, we present the Meta-MFW algorithm achieving a $1/e$-regret
of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per
round. As far as we know, Meta-MFW is the first algorithm to obtain
$1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous
DR-submodular maximization problem over a down-closed convex set. Furthermore,
in sharp contrast with the ODC algorithm \citep{thang2021online}, Meta-MFW relies
on the simple online linear oracle without discretization, lifting, or rounding
operations. Considering practical restrictions, we then propose the
Mono-MFW algorithm, which reduces the per-function stochastic gradient
evaluations from $T^{3/2}$ to 1 and achieves a $1/e$-regret bound of
$O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the
Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$. To the
best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret
algorithms to explore the one-shot and bandit settings, respectively, for the
online non-monotone continuous DR-submodular maximization problem over a
down-closed convex set. Finally, we conduct numerical experiments on both synthetic and
real-world datasets to verify the effectiveness of our methods.
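For concreteness, the $1/e$-regret above is the standard $\alpha$-regret with $\alpha = 1/e$: $\mathcal{R}_{1/e}(T) = \frac{1}{e}\max_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x) - \sum_{t=1}^{T} f_t(x_t)$. The sketch below is not the authors' Meta-MFW itself; it is a minimal Python rendering, under stated assumptions, of the classical non-monotone Frank-Wolfe subroutine that such meta-algorithms run with gradient estimates, specialized to the box $[0,1]^n$ (a down-closed convex set). Function names and the toy objective are illustrative.

```python
import numpy as np

def nonmonotone_fw_box(grad, n, K=100):
    """Hypothetical sketch: 1/e Frank-Wolfe for maximizing a non-monotone
    DR-submodular f over the box [0,1]^n. The linear oracle is shrunk to
    directions v <= 1 - x, so no coordinate saturates too quickly."""
    x = np.zeros(n)
    for _ in range(K):
        g = grad(x)
        # Shrunk linear maximization over {v : 0 <= v <= 1 - x}: push a
        # coordinate to its cap only where the gradient is positive.
        v = np.where(g > 0.0, 1.0 - x, 0.0)
        x = x + v / K  # step size 1/K keeps x inside [0,1]^n
    return x

# Toy usage: f(x) = b.x - 0.5 x^T H x with H entrywise nonnegative is a
# standard non-monotone DR-submodular example (all mixed partials <= 0).
rng = np.random.default_rng(0)
n = 5
H = rng.random((n, n)); H = (H + H.T) / 2
b = rng.random(n)
x_hat = nonmonotone_fw_box(lambda x: b - H @ x, n)
```

In the online algorithms above, the exact gradient call is what gets replaced by stochastic estimates fed to online linear oracles, which is where the per-round evaluation counts ($T^{3/2}$ for Meta-MFW, one for Mono-MFW) enter.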
Related papers
- Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems [61.85150061213987]
We study the generalized low-rank matrix bandit problem, proposed in \citep{lu2021low} under the Generalized Linear Model (GLM) framework.
To overcome the computational infeasibility and theoretical restrictions of existing algorithms, we first propose the G-ESTT framework.
We show that G-ESTT can achieve a regret bound of $\tilde{O}(\sqrt{(d_1+d_2)^{3/2} M r^{3/2} T})$, while G-ESTS can achieve the $\tilde{O}$...
arXiv Detail & Related papers (2024-01-14T14:14:19Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
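For reference, the projected OGD update that \textsf{AdaOGD} adaptively tunes is the standard
\[
x_{t+1} = \Pi_{\mathcal{K}}\bigl(x_t - \eta_t \nabla f_t(x_t)\bigr),
\]
where $\Pi_{\mathcal{K}}$ denotes Euclidean projection onto the feasible set; the adaptive step size $\eta_t$ is chosen without a priori knowledge of the strong-convexity or monotonicity parameters (our paraphrase of the abstract, not a quoted formula).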
- Improved Projection-free Online Continuous Submodular Maximization [35.324719857218014]
We investigate the problem of online learning with monotone and continuous DR-submodular reward functions.
Previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations.
We propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity.
arXiv Detail & Related papers (2023-05-29T02:54:31Z)
- Bandit Multi-linear DR-Submodular Maximization and Its Applications on Adversarial Submodular Bandits [21.54858035450694]
We give a sublinear regret algorithm for the submodular bandit with partition matroid constraint.
For the bandit sequential submodular maximization problem, the existing work proves an $O(T^{2/3})$ regret with a suboptimal $1/2$ approximation ratio.
arXiv Detail & Related papers (2023-05-21T08:51:55Z)
- Randomized Greedy Learning for Non-monotone Stochastic Submodular Maximization Under Full-bandit Feedback [98.29086113546045]
We investigate the problem of unconstrained multi-armed bandits with full-bandit feedback and stochastic rewards for submodular maximization.
We show that RGL empirically outperforms other full-bandit variants in submodular and non-submodular settings.
arXiv Detail & Related papers (2023-02-02T18:52:14Z)
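For context, RGL-style methods can be read as bandit analogues of the classical randomized double greedy for unconstrained non-monotone submodular maximization. Below is a minimal full-information Python sketch; the function names are illustrative, and a bandit version would have to estimate the marginal gains from noisy reward feedback rather than query $f$ directly.

```python
import random

def randomized_double_greedy(f, ground_set, seed=None):
    """Minimal sketch of randomized double greedy (Buchbinder et al.) for
    unconstrained non-monotone submodular maximization; RGL-type bandit
    algorithms emulate this with estimated, rather than exact, marginal gains.
    f: a set function taking a Python set; ground_set: iterable of elements."""
    rng = random.Random(seed)
    X, Y = set(), set(ground_set)
    for e in ground_set:
        a = f(X | {e}) - f(X)   # marginal gain of adding e to the lower set
        b = f(Y - {e}) - f(Y)   # marginal gain of removing e from the upper set
        a, b = max(a, 0.0), max(b, 0.0)
        p = 1.0 if a + b == 0.0 else a / (a + b)  # tie-break in favor of adding
        if rng.random() < p:
            X.add(e)
        else:
            Y.discard(e)
    return X  # X == Y after the loop

# Toy usage: a graph cut function, a standard non-monotone submodular objective.
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (1, 3)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
print(randomized_double_greedy(cut, range(4), seed=0))
```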
- Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z)
- Resolving the Approximability of Offline and Online Non-monotone DR-Submodular Maximization over General Convex Sets [13.23676270963484]
DR-submodular continuous functions have many real-world applications in the domains of machine learning, communication systems, operations research, and economics.
We present a polynomial-time online algorithm matching the $\tfrac{1}{4}(1-m)$ approximation ratio of the state-of-the-art offline algorithm, where $m$ is the minimal $\ell_\infty$-norm of any feasible point.
We also show that our online algorithm and the offline algorithm of Du (2022) are both optimal in a strong sense.
arXiv Detail & Related papers (2022-10-12T07:04:24Z)
- Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization [11.889570525184801]
We present two decentralized online algorithms for the monotone continuous DR-submodular maximization problem.
The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$.
Next, inspired by the non-oblivious boosting function \citep{zhang2022}, we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm.
arXiv Detail & Related papers (2022-08-18T07:32:28Z)
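As background on the boosting idea cited above: in that line of work, a monotone DR-submodular objective $f$ is replaced by a non-oblivious auxiliary function of (up to normalization) the form
\[
F(x) = \int_{0}^{1} \frac{e^{z-1}}{z}\, f(zx)\, dz,
\]
whose stationary points upgrade the $1/2$ guarantee of plain gradient ascent to $(1-1/e)$ for $f$. This is our reading of the cited reference, not a restatement of the DOBGA algorithm itself.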
- Faster First-Order Algorithms for Monotone Strongly DR-Submodular Maximization [11.919431962501479]
Continuous DR-submodular functions satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions.
We propose a new algorithm that matches the provably optimal $1-\frac{c}{e}$ approximation ratio after only $\lceil \frac{L}{\mu} \rceil$ iterations.
arXiv Detail & Related papers (2021-11-15T18:53:14Z)
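To spell out the DR property used above: for differentiable $f$, DR-submodularity means the gradient is entrywise antitone,
\[
x \le y \;\Longrightarrow\; \nabla f(x) \ge \nabla f(y),
\]
and for twice-differentiable $f$ this is equivalent to $\frac{\partial^2 f}{\partial x_i \partial x_j} \le 0$ for all $i, j$; concavity along non-negative directions follows since $u^{\top} \nabla^2 f(x)\, u \le 0$ whenever $u \ge 0$.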
- The Power of Subsampling in Submodular Maximization [51.629656762796564]
We show that this approach leads to optimal or state-of-the-art results despite being much simpler than existing methods.
We empirically demonstrate the effectiveness of our algorithms on video summarization, location summarization, and movie recommendation tasks.
arXiv Detail & Related papers (2021-04-06T20:25:57Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
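For readability, the linear mixture model referenced above is the standard one in which the transition kernel is a linear combination of known basis features with an unknown parameter vector:
\[
P(s' \mid s, a) = \bigl\langle \phi(s' \mid s, a), \theta^{*} \bigr\rangle,
\]
with the feature map $\phi$ known and $\theta^{*} \in \mathbb{R}^{d}$ to be learned; this is background, not a formula quoted from the paper.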
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content above (including any of the information) and is not responsible for any consequences of its use.