Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives
- URL: http://arxiv.org/abs/2509.22596v1
- Date: Fri, 26 Sep 2025 17:16:34 GMT
- Title: Effective Policy Learning for Multi-Agent Online Coordination Beyond Submodular Objectives
- Authors: Qixin Zhang, Yan Sun, Can Jin, Xikun Zhang, Yao Shu, Puning Zhao, Li Shen, Dacheng Tao
- Abstract summary: We present two policy learning algorithms for the multi-agent online coordination (MA-OC) problem. The first one, \texttt{MA-SPL}, can achieve the optimal $(1-\frac{c}{e})$-approximation for the MA-OC problem. The second online algorithm, named \texttt{MA-MPL}, is parameter-free and can simultaneously maintain the same approximation ratio.
- Score: 64.16056378603875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present two effective policy learning algorithms for the multi-agent online coordination (MA-OC) problem. The first one, \texttt{MA-SPL}, not only achieves the optimal $(1-\frac{c}{e})$-approximation guarantee for the MA-OC problem with submodular objectives but also handles the unexplored $\alpha$-weakly DR-submodular and $(\gamma,\beta)$-weakly submodular scenarios, where $c$ is the curvature of the investigated submodular functions, $\alpha$ denotes the diminishing-return (DR) ratio and the tuple $(\gamma,\beta)$ represents the submodularity ratios. Subsequently, in order to reduce the reliance on the unknown parameters $\alpha,\gamma,\beta$ inherent in the \texttt{MA-SPL} algorithm, we further introduce the second online algorithm, named \texttt{MA-MPL}. This \texttt{MA-MPL} algorithm is entirely \emph{parameter-free} and can simultaneously maintain the same approximation ratio as the first \texttt{MA-SPL} algorithm. The core of our \texttt{MA-SPL} and \texttt{MA-MPL} algorithms is a novel continuous-relaxation technique termed \emph{policy-based continuous extension}. Compared with the well-established \emph{multi-linear extension}, a notable advantage of this new \emph{policy-based continuous extension} is its ability to provide a lossless rounding scheme for any set function, thereby enabling us to tackle the challenging weakly submodular objectives. Finally, extensive simulations are conducted to validate the effectiveness of our proposed algorithms.
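The abstract contrasts the paper's policy-based continuous extension with the well-established multi-linear extension $F(x) = \mathbb{E}_{S \sim x}[f(S)]$, in which each element $i$ joins a random set independently with probability $x_i$. As background, here is a minimal Monte Carlo estimator of the multi-linear extension for a toy coverage function (an illustration of the standard concept only, not the paper's policy-based extension; all names are hypothetical):

```python
import random

def coverage(S, sets):
    """Toy monotone submodular function: size of the union of the chosen sets."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)

def multilinear_extension(x, f, n_samples=5000, seed=0):
    """Monte Carlo estimate of F(x) = E_{S ~ x}[f(S)], where each element i
    enters the random set S independently with probability x[i]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        S = [i for i, p in enumerate(x) if rng.random() < p]
        total += f(S)
    return total / n_samples

sets = [{1, 2}, {2, 3}, {3, 4, 5}]
f = lambda S: coverage(S, sets)
# At an integral point the extension coincides with the set function itself:
print(multilinear_extension([1.0, 1.0, 0.0], f))  # → 3.0, i.e. f({0, 1})
```

Note that rounding a fractional point of the multi-linear extension back to a set generally needs schemes such as pipage or swap rounding; the lossless rounding claimed for the policy-based extension is precisely what these schemes do not provide for arbitrary set functions.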
Related papers
- Near-Optimal Online Learning for Multi-Agent Submodular Coordination: Tight Approximation and Communication Efficiency [52.60557300927007]
We present a $\textbf{MA-OSMA}$ algorithm to transfer the discrete submodular maximization problem into a continuous optimization problem. We also introduce a projection-free $\textbf{MA-OSEA}$ algorithm, which effectively utilizes the KL divergence by mixing in a uniform distribution. Our algorithms significantly improve the $(\frac{1}{1+c})$-approximation provided by the state-of-the-art OSG algorithm.
arXiv Detail & Related papers (2025-02-07T15:57:56Z) - On the Convergence of Single-Timescale Actor-Critic [49.19842488693726]
We analyze the global convergence of the single-timescale actor-critic (AC) algorithm for infinite-horizon discounted Markov Decision Processes (MDPs) with finite state spaces. We demonstrate that the step sizes for both the actor and the critic must decay as $O(k^{-\frac{1}{2}})$, diverging from the conventional rates commonly used in non-convex optimization.
arXiv Detail & Related papers (2024-10-11T14:46:29Z) - Practical Parallel Algorithms for Non-Monotone Submodular Maximization [15.187654042027111]
Submodular maximization has found extensive applications in various domains within the field of artificial intelligence. One measure of the parallelizability of a submodular maximization algorithm is its adaptive complexity, which indicates the number of sequential rounds in which a polynomial number of queries to the objective function can be executed in parallel. We propose the first algorithm with both a provable approximation ratio and sublinear adaptive complexity for the problem of non-monotone submodular maximization subject to a $k$-system constraint.
arXiv Detail & Related papers (2023-08-21T11:48:34Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve the regret to $\tilde{\mathcal{O}}(K^{2/3})$ in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - Communication-Efficient Decentralized Online Continuous DR-Submodular Maximization [11.889570525184801]
We present two decentralized online algorithms for the monotone continuous DR-submodular maximization problem.
The first one, One-shot Decentralized Meta-Frank-Wolfe (Mono-DMFW), achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$.
Next, inspired by the non-oblivious boosting function (Zhang et al., 2022), we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm.
arXiv Detail & Related papers (2022-08-18T07:32:28Z) - Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback [12.914842850902456]
This paper revisits the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set.
We present the Meta-MFW algorithm, achieving a $1/e$-regret bound of $O(\sqrt{T})$.
Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$.
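Several of the algorithms surveyed here (Meta-MFW, Mono-DMFW) are Frank-Wolfe variants for continuous DR-submodular maximization. As a minimal offline sketch (not any cited paper's algorithm), assume the toy quadratic objective $f(x) = a^\top x - \frac{1}{2} x^\top H x$ with $H \geq 0$ entrywise, which makes $f$ continuous DR-submodular; the continuous-greedy update takes $T$ steps of size $1/T$ toward a linear-maximization vertex:

```python
def grad_f(x, a, H):
    # ∇f for f(x) = aᵀx − ½ xᵀHx; with H ≥ 0 entrywise the gradient is
    # antitone in x, so f is continuous DR-submodular on the positive orthant.
    return [a[i] - sum(H[i][j] * x[j] for j in range(len(x)))
            for i in range(len(x))]

def frank_wolfe(a, H, T=100):
    """Continuous-greedy / Frank-Wolfe sketch for monotone DR-submodular
    maximization over the box [0,1]^n: T steps of size 1/T toward the
    vertex returned by the linear-maximization oracle."""
    n = len(a)
    x = [0.0] * n
    for _ in range(T):
        g = grad_f(x, a, H)
        v = [1.0 if gi > 0 else 0.0 for gi in g]   # LMO over the box
        x = [xi + vi / T for xi, vi in zip(x, v)]
    return x

a, H = [3.0, 3.0], [[1.0, 1.0], [1.0, 1.0]]
x = frank_wolfe(a, H)
f_val = sum(ai * xi for ai, xi in zip(a, x)) - 0.5 * sum(
    x[i] * H[i][j] * x[j] for i in range(2) for j in range(2))
print(x, f_val)  # x ≈ [1, 1], f ≈ 4 (the box maximum for this instance)
```

The online and decentralized variants above replace the exact gradient with stochastic or neighbor-averaged estimates, but the step structure is the same.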
arXiv Detail & Related papers (2022-08-16T09:32:37Z) - Faster First-Order Algorithms for Monotone Strongly DR-Submodular Maximization [11.919431962501479]
Continuous DR-submodular functions satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions.
We propose a new algorithm that matches the provably optimal $(1-\frac{c}{e})$ approximation ratio after only $\lceil \frac{L}{\mu} \rceil$ iterations.
arXiv Detail & Related papers (2021-11-15T18:53:14Z) - Uniform-PAC Bounds for Reinforcement Learning with Linear Function
Approximation [92.3161051419884]
We study reinforcement learning with linear function approximation.
Existing algorithms only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees.
We propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability.
arXiv Detail & Related papers (2021-06-22T08:48:56Z) - Submodular + Concave [53.208470310734825]
It has been well established that first order optimization methods can converge to the maximal objective value of concave functions.
In this work, we initiate the study of maximizing functions of the form $F(x) = G(x) + C(x)$ over a convex body, where $G$ is a smooth DR-submodular function and $C$ is a smooth concave function.
This class of functions is an extension of both concave and continuous DR-submodular functions for which no guarantee is known.
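A minimal sketch of the submodular-plus-concave setting, assuming a toy instance (my construction, not the paper's): $G(x) = a^\top x - \frac{1}{2} x^\top H x$ with $H \geq 0$ entrywise is smooth DR-submodular, $C(x) = -\frac{1}{2}\|x - b\|^2$ is smooth concave, and we run plain projected gradient ascent on $F = G + C$ over the box $[0,1]^n$:

```python
def projected_ga(a, H, b, step=0.1, iters=200):
    """Projected gradient ascent on F(x) = G(x) + C(x) over the box [0,1]^n,
    where G(x) = aᵀx − ½ xᵀHx is smooth DR-submodular (H ≥ 0 entrywise)
    and C(x) = −½‖x − b‖² is smooth concave.  Toy sketch only, without the
    approximation guarantees developed in the paper."""
    n = len(a)
    x = [0.0] * n
    for _ in range(iters):
        grad = [a[i] - sum(H[i][j] * x[j] for j in range(n)) - (x[i] - b[i])
                for i in range(n)]
        # Gradient step, then clip back onto the box (the projection).
        x = [min(1.0, max(0.0, xi + step * gi)) for xi, gi in zip(x, grad)]
    return x

x = projected_ga([2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]], [0.5, 0.5])
print(x)  # → both coordinates ≈ 0.8333, the stationary point of F inside the box
```

For this particular instance $F$ happens to be concave overall, so gradient ascent finds the global maximizer; the paper's contribution is precisely the case where $F$ is neither concave nor DR-submodular and such guarantees do not come for free.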
arXiv Detail & Related papers (2021-06-09T01:59:55Z) - Regularized Submodular Maximization at Scale [45.914693923126826]
Submodularity is inherently related to the notions of diversity, coverage, and representativeness.
We propose methods for maximizing a regularized submodular function $f = g - \ell$, expressed as the difference between a monotone submodular function $g$ and a modular function $\ell$.
arXiv Detail & Related papers (2020-02-10T02:37:18Z)
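For the regularized objective $f = g - \ell$ above, a known technique (not necessarily the cited paper's algorithm) is distorted greedy: at step $i$ of $k$, the marginal gain of $g$ is discounted by $(1 - 1/k)^{k-(i+1)}$ before subtracting the modular cost, and an element is added only if the distorted value is positive. A toy sketch with a coverage $g$ and hypothetical costs:

```python
def distorted_greedy(g_marginal, ell, ground, k):
    """Distorted-greedy-style sketch for maximizing f = g − ℓ under a
    cardinality constraint k, where g is monotone submodular and ℓ is a
    nonnegative modular cost."""
    S = set()
    for i in range(k):
        w = (1 - 1 / k) ** (k - (i + 1))     # distortion factor
        best, best_val = None, 0.0
        for e in ground - S:
            val = w * g_marginal(e, S) - ell[e]
            if val > best_val:               # add only if distorted gain > 0
                best, best_val = e, val
        if best is not None:
            S.add(best)
    return S

# Toy instance: g = coverage of the chosen sets, ℓ = per-element costs.
sets = {0: {1, 2, 3}, 1: {3, 4}, 2: {5}}
ell = {0: 0.5, 1: 0.5, 2: 2.0}

def g_marginal(e, S):
    covered = set().union(*(sets[i] for i in S)) if S else set()
    return len(covered | sets[e]) - len(covered)

print(distorted_greedy(g_marginal, ell, set(sets), 2))  # → {0, 1}
```

Plain greedy on $f = g - \ell$ directly has no constant-factor guarantee, which is why the distortion schedule matters; element 2 here is correctly rejected because its cost exceeds its discounted coverage gain.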
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.