Improved Projection-free Online Continuous Submodular Maximization
- URL: http://arxiv.org/abs/2305.18442v1
- Date: Mon, 29 May 2023 02:54:31 GMT
- Title: Improved Projection-free Online Continuous Submodular Maximization
- Authors: Yucheng Liao and Yuanyu Wan and Chang Yao and Mingli Song
- Abstract summary: We investigate the problem of online learning with monotone and continuous DR-submodular reward functions.
Previous studies have proposed an efficient projection-free algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations.
We propose an improved projection-free algorithm, namely POBGA, which reduces the regret bound to $O(T^{3/4})$ while keeping the same computational complexity.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of online learning with monotone and continuous
DR-submodular reward functions, which has received great attention recently. To
efficiently handle this problem, especially in the case with complicated
decision sets, previous studies have proposed an efficient projection-free
algorithm called Mono-Frank-Wolfe (Mono-FW) using $O(T)$ gradient evaluations
and linear optimization steps in total. However, it only attains a
$(1-1/e)$-regret bound of $O(T^{4/5})$. In this paper, we propose an improved
projection-free algorithm, namely POBGA, which reduces the regret bound to
$O(T^{3/4})$ while keeping the same computational complexity as Mono-FW.
Instead of modifying Mono-FW, our key idea is to make a novel combination of a
projection-based algorithm called online boosting gradient ascent, an
infeasible projection technique, and a blocking technique. Furthermore, we
consider the decentralized setting and develop a variant of POBGA, which not
only reduces the current best regret bound of efficient projection-free
algorithms for this setting from $O(T^{4/5})$ to $O(T^{3/4})$, but also reduces
the total communication complexity from $O(T)$ to $O(\sqrt{T})$.
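To make the key idea concrete, here is a minimal Python sketch of the control flow that POBGA combines: a boosted gradient-ascent step, an approximate projection that may return a slightly infeasible point, and blocking so that only one update is made per block of rounds. The `grad_oracle` and `infeasible_proj` interfaces are hypothetical stand-ins; the constructions that actually yield the $O(T^{3/4})$ bound are in the paper.

```python
import numpy as np

def pobga_sketch(grad_oracle, infeasible_proj, x0, T, eta, block_size):
    """Illustrative control flow only, not the paper's exact algorithm.

    grad_oracle(x, t)  -- hypothetical: a (boosted) stochastic gradient
                          estimate of the round-t reward at x.
    infeasible_proj(y) -- hypothetical: a cheap approximate projection that
                          may return a point slightly outside the feasible
                          set (the "infeasible projection" technique).
    """
    x = np.asarray(x0, dtype=float)
    grad_sum = np.zeros_like(x)
    decisions = []
    for t in range(T):
        decisions.append(x.copy())            # play the same point all block
        grad_sum += grad_oracle(x, t)         # accumulate boosted gradients
        if (t + 1) % block_size == 0:         # blocking: one update per block
            y = x + eta * grad_sum / block_size   # gradient-ascent step
            x = infeasible_proj(y)            # approximate projection
            grad_sum[:] = 0.0
    return decisions
```

With a block size on the order of $\sqrt{T}$, the loop performs only $O(\sqrt{T})$ projection (and, in the decentralized variant, communication) steps while still querying one gradient per round, which is consistent with the stated complexities.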
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Settling the Sample Complexity of Online Reinforcement Learning [92.02082223856479]
We show how to achieve minimax-optimal regret without incurring any burn-in cost.
We extend our theory to unveil the influences of problem-dependent quantities like the optimal value/cost and certain variances.
arXiv Detail & Related papers (2023-07-25T15:42:11Z) - Projection-free Adaptive Regret with Membership Oracles [31.422532403048738]
Most iterative algorithms require the computation of projections onto convex sets, which can be computationally expensive.
Recent work by GK22 gave sublinear adaptive regret guarantees with projection-free algorithms based on the Frank-Wolfe approach.
We give projection-free algorithms that are based on a different technique, inspired by Mhammedi22, which replaces projections with set-membership computations.
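As a rough illustration of the set-membership idea, the sketch below pulls an infeasible point back toward a convex set $K$ using only membership queries, via a binary search along the ray from the origin. The `member` interface and the assumption that the origin lies in the interior of $K$ are illustrative; this is a simplification in the spirit of the technique, not the paper's exact construction.

```python
import numpy as np

def gauge_pullback(x, member, tol=1e-6, max_iter=100):
    """Pull x into a convex set K using only a membership oracle.

    member(z) -> bool is an assumed interface for "z in K"; we also assume
    the origin is feasible. Binary-search the largest c in [0, 1] with
    c * x in K and return c * x -- a cheap surrogate for projection.
    """
    x = np.asarray(x, dtype=float)
    if member(x):
        return x
    lo, hi = 0.0, 1.0                 # lo * x is always feasible (origin)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if member(mid * x):
            lo = mid                  # feasible: push outward
        else:
            hi = mid                  # infeasible: pull inward
        if hi - lo < tol:
            break
    return lo * x
```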
arXiv Detail & Related papers (2022-11-22T23:53:06Z) - Communication-Efficient Decentralized Online Continuous DR-Submodular
Maximization [11.889570525184801]
We present two decentralized online algorithms for the monotone continuous DR-submodular maximization problem.
The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$.
Next, inspired by the non-oblivious boosting function of Zhang et al. (2022), we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm.
arXiv Detail & Related papers (2022-08-18T07:32:28Z) - Online Learning for Non-monotone Submodular Maximization: From Full
Information to Bandit Feedback [12.914842850902456]
This paper revisits the online non-monotone continuous DR-submodular problem over a down-closed convex set.
We present the Meta-MFW algorithm achieving a $1/e$-regret bound of $O(\sqrt{T})$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$.
arXiv Detail & Related papers (2022-08-16T09:32:37Z) - Multi-block-Single-probe Variance Reduced Estimator for Coupled
Compositional Optimization [49.58290066287418]
We propose a novel estimator named Multi-block-Single-probe Variance Reduced (MSVR) to alleviate the complexity of coupled compositional problems.
Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter.
arXiv Detail & Related papers (2022-07-18T12:03:26Z) - New Projection-free Algorithms for Online Convex Optimization with
Adaptive Regret Guarantees [21.30065439295409]
We present new efficient projection-free algorithms for online convex optimization (OCO).
Our algorithms are based on the online gradient descent algorithm with a novel and efficient approach to computing so-called infeasible projections.
We present algorithms which, using overall $O(T)$ calls to the separation oracle, guarantee $O(\sqrt{T})$ adaptive regret and $O(T^{3/4})$ adaptive expected regret.
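One simple way such an infeasible projection can be realized from a separation oracle is sketched below: repeatedly project onto the violated halfspace the oracle returns. The `separation` interface is an assumed stand-in, and the output may remain slightly infeasible once the call budget is exhausted, which is exactly what the infeasible-projection viewpoint tolerates.

```python
import numpy as np

def infeasible_projection(y, separation, max_calls=50):
    """Move y toward a convex set K using a separation oracle.

    separation(z) (assumed interface) returns None if z is in K, otherwise
    a pair (a, b) with a @ z > b but a @ x <= b for all x in K. Each cut is
    handled with an exact Euclidean projection onto its halfspace.
    """
    z = np.asarray(y, dtype=float)
    for _ in range(max_calls):
        cut = separation(z)
        if cut is None:                        # z is already feasible
            return z
        a, b = cut
        z = z - ((a @ z - b) / (a @ a)) * a    # project onto {x : a@x <= b}
    return z                                   # possibly still infeasible
```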
arXiv Detail & Related papers (2022-02-09T20:56:16Z) - Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free
Reinforcement Learning [52.76230802067506]
A novel model-free algorithm is proposed to minimize regret in episodic reinforcement learning.
The proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences.
The design principle of our early-settled variance reduction method might be of independent interest to other RL settings.
arXiv Detail & Related papers (2021-10-09T21:13:48Z) - Reusing Combinatorial Structure: Faster Iterative Projections over
Submodular Base Polytopes [7.734726150561089]
We develop a toolkit to speed up the computation of projections using both discrete and continuous perspectives.
For the special case of cardinality-based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of $\Omega(n/\log(n))$.
arXiv Detail & Related papers (2021-06-22T17:29:24Z) - Revisiting Modified Greedy Algorithm for Monotone Submodular
Maximization with a Knapsack Constraint [75.85952446237599]
We show that a modified greedy algorithm can achieve an approximation factor of $0.305$.
We derive a data-dependent upper bound on the optimum.
It can also be used to significantly improve the efficiency of algorithms such as branch-and-bound.
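For reference, a hedged sketch of the kind of modified greedy analyzed in this line of work: greedily add the affordable element with the best marginal gain per unit cost, then return the better of that solution and the best feasible singleton. The set-function interface is illustrative, not the paper's exact formulation.

```python
def modified_greedy(f, items, cost, budget):
    """Density greedy vs. best singleton under a knapsack constraint.

    f       -- monotone submodular set function on frozensets (assumed API)
    cost    -- dict mapping each element to a positive cost
    budget  -- knapsack capacity
    """
    S, spent = frozenset(), 0.0
    remaining = set(items)
    while remaining:
        best, best_ratio = None, -1.0
        for e in remaining:                    # best affordable density
            if spent + cost[e] <= budget:
                ratio = (f(S | {e}) - f(S)) / cost[e]
                if ratio > best_ratio:
                    best, best_ratio = e, ratio
        if best is None:                       # nothing affordable remains
            break
        S, spent = S | {best}, spent + cost[best]
        remaining.discard(best)
    singletons = [frozenset({e}) for e in items if cost[e] <= budget]
    best_single = max(singletons, key=f, default=frozenset())
    return S if f(S) >= f(best_single) else best_single
```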
arXiv Detail & Related papers (2020-08-12T15:40:21Z)