ReMIX: Regret Minimization for Monotonic Value Function Factorization in
Multiagent Reinforcement Learning
- URL: http://arxiv.org/abs/2302.05593v1
- Date: Sat, 11 Feb 2023 03:52:51 GMT
- Title: ReMIX: Regret Minimization for Monotonic Value Function Factorization in
Multiagent Reinforcement Learning
- Authors: Yongsheng Mei, Hanhan Zhou, Tian Lan
- Abstract summary: We study the optimal projection of an unrestricted mixing function onto monotonic function classes.
We use the Lagrangian multiplier method to obtain the closed-form optimal projection weights.
Our results on Predator-Prey and StarCraft Multiagent Challenge environments demonstrate the effectiveness of our method.
- Score: 10.741140541225604
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value function factorization methods have become a dominant approach for
cooperative multiagent reinforcement learning under a centralized training and
decentralized execution paradigm. By factorizing the optimal joint action-value
function using a monotonic mixing function of agents' utilities, these
algorithms ensure the consistency between joint and local action selections for
decentralized decision-making. Nevertheless, the use of monotonic mixing
functions also induces representational limitations. Finding the optimal
projection of an unrestricted mixing function onto monotonic function classes
is still an open problem. To this end, we propose ReMIX, formulating this
optimal projection problem for value function factorization as a regret
minimization over the projection weights of different state-action values. Such
an optimization problem can be relaxed and solved using the Lagrangian
multiplier method to obtain the closed-form optimal projection weights. By
minimizing the resulting policy regret, we can narrow the gap between the
optimal and the restricted monotonic mixing functions, thus obtaining an
improved monotonic value function factorization. Our experimental results on
Predator-Prey and StarCraft Multiagent Challenge environments demonstrate the
effectiveness of our method, indicating improved capability in handling
environments with non-monotonic value functions.
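The projection idea in the abstract can be illustrated with a short sketch: project unrestricted joint action-value targets onto a restricted (monotonic) approximation by minimizing a weighted squared error. Note that the weighting scheme below is an illustrative stand-in, not ReMIX's regret-derived closed-form weights; all names here are hypothetical.

```python
import numpy as np

def projection_loss(q_mono, q_star, weights):
    """Weighted squared projection error between the restricted
    (monotonic) values q_mono and the unrestricted targets q_star."""
    return float(np.mean(weights * (q_mono - q_star) ** 2))

def example_weights(q_star, q_mono, temperature=1.0):
    """Illustrative weights that emphasize state-action pairs where the
    monotonic class under-represents the target. This is a stand-in for
    the regret-derived weights ReMIX obtains via Lagrangian relaxation;
    the paper's exact closed form differs."""
    gap = np.maximum(q_star - q_mono, 0.0)   # only under-estimation matters here
    w = np.exp(gap / temperature)
    return w / w.sum() * len(w)              # normalize to mean 1

# Toy example over 4 joint actions.
q_star = np.array([1.0, 3.0, -2.0, 0.5])    # unrestricted targets
q_mono = np.array([0.8, 2.0, -1.0, 0.4])    # monotonic approximation
w = example_weights(q_star, q_mono)
loss = projection_loss(q_mono, q_star, w)
```

The joint action with the largest under-estimation gap (index 1) receives the largest weight, so the projection is pulled hardest toward the targets the restricted class misses most.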
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a stochastic optimization, where the inner loss function becomes a smooth probability distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z)
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z)
- POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning [17.644279061872442]
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning.
We propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes potentially optimal joint actions and assigns higher weights to the corresponding losses during training.
Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
arXiv Detail & Related papers (2024-05-13T03:27:35Z)
- Moreau-Yoshida Variational Transport: A General Framework For Solving Regularized Distributional Optimization Problems [3.038642416291856]
We consider a general optimization problem of minimizing a composite objective functional defined over a class of probability distributions.
We propose a novel method, dubbed as Moreau-Yoshida Variational Transport (MYVT), for solving the regularized distributional optimization problem.
arXiv Detail & Related papers (2023-07-31T01:14:42Z)
- Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z)
- No-Regret Constrained Bayesian Optimization of Noisy and Expensive Hybrid Models using Differentiable Quantile Function Approximations [0.0]
Constrained Upper Quantile Bound (CUQB) is a conceptually simple, deterministic approach that avoids constraint approximations.
We show that CUQB significantly outperforms traditional Bayesian optimization in both constrained and unconstrained cases.
arXiv Detail & Related papers (2023-05-05T19:57:36Z)
- Covariance Matrix Adaptation Evolutionary Strategy with Worst-Case Ranking Approximation for Min--Max Optimization and its Application to Berthing Control Tasks [19.263468901608785]
We consider a continuous min--max optimization problem $\min_{x \in \mathbb{X}} \max_{y \in \mathbb{Y}} f(x,y)$ whose objective function is a black-box.
We propose a novel approach to minimize the worst-case objective function $F(x) = \max_{y} f(x,y)$ directly.
arXiv Detail & Related papers (2023-03-28T15:50:56Z)
- Adaptive LASSO estimation for functional hidden dynamic geostatistical model [69.10717733870575]
We propose a novel model selection algorithm based on a penalized maximum likelihood estimator (PMLE) for functional hidden dynamic geostatistical models (f-HDGM).
The algorithm is based on iterative optimisation and uses an adaptive least absolute shrinkage and selection operator (GMSOLAS) penalty function, wherein the weights are obtained from the unpenalised f-HDGM maximum-likelihood estimators.
arXiv Detail & Related papers (2022-08-10T19:17:45Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Information-theoretic Feature Selection via Tensor Decomposition and Submodularity [38.05393186002834]
We introduce a low-rank tensor model of the joint PMF of all variables and indirect targeting as a way of mitigating complexity and maximizing the classification performance for a given number of features.
By indirectly aiming to predict the latent variable of the naive Bayes model instead of the original target variable, it is possible to formulate the feature selection problem as the maximization of a monotone submodular function subject to a cardinality constraint.
arXiv Detail & Related papers (2020-10-30T10:36:46Z)
- A Multi-Agent Primal-Dual Strategy for Composite Optimization over Distributed Features [52.856801164425086]
We study multi-agent sharing optimization problems with the objective function being the sum of smooth local functions plus a convex (possibly non-smooth) coupling function.
arXiv Detail & Related papers (2020-06-15T19:40:24Z)
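The composite objective in the last paper above (smooth local losses plus a convex, possibly non-smooth coupling) can be illustrated with a minimal centralized proximal-gradient sketch. This is a generic forward-backward baseline, not the paper's multi-agent primal-dual method; all names and the toy problem are illustrative.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(grad_fs, prox_g, x0, step=0.1, iters=300):
    """Forward-backward splitting for min_x sum_i f_i(x) + g(x):
    gradient step on the smooth local losses f_i, then the prox of the
    convex (possibly non-smooth) coupling g."""
    x = x0.copy()
    for _ in range(iters):
        grad = sum(gf(x) for gf in grad_fs)  # aggregate local gradients
        x = prox_g(x - step * grad, step)    # backward (prox) step
    return x

# Toy problem: two agents with f_i(x) = 0.5 * ||x - a_i||^2,
# coupled by g(x) = 0.5 * ||x||_1.
a1, a2 = np.array([1.0, -2.0]), np.array([3.0, 0.0])
grads = [lambda x: x - a1, lambda x: x - a2]
x_star = proximal_gradient(grads, lambda v, t: prox_l1(v, 0.5 * t),
                           np.zeros(2), step=0.4, iters=300)
```

For this toy quadratic-plus-l1 problem the minimizer can be checked coordinate-wise in closed form, which makes the sketch easy to validate.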
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.