On Stateful Value Factorization in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2408.15381v2
- Date: Mon, 9 Sep 2024 22:49:17 GMT
- Title: On Stateful Value Factorization in Multi-Agent Reinforcement Learning
- Authors: Enrico Marchesini, Andrea Baisero, Rupali Bhati, Christopher Amato,
- Abstract summary: We introduce Duelmix, a factorization algorithm that learns distinct per-agent utility estimators to improve performance.
Experiments on StarCraft II micromanagement and Box Pushing tasks demonstrate the benefits of our intuitions.
- Score: 19.342676562701794
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Value factorization is a popular paradigm for designing scalable multi-agent reinforcement learning algorithms. However, current factorization methods make choices without full justification that may limit their performance. For example, the theory in prior work uses stateless (i.e., history) functions, while the practical implementations use state information -- making the motivating theory a mismatch for the implementation. Also, methods have built off of previous approaches, inheriting their architectures without exploring other, potentially better ones. To address these concerns, we formally analyze the theory of using the state instead of the history in current methods -- reconnecting theory and practice. We then introduce DuelMIX, a factorization algorithm that learns distinct per-agent utility estimators to improve performance and achieve full expressiveness. Experiments on StarCraft II micromanagement and Box Pushing tasks demonstrate the benefits of our intuitions.
Related papers
- BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning [78.63421517563056]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks.
We present a unified probabilistic framework that formalizes LLM reasoning through a novel graphical model.
We introduce the Bootstrapping Reinforced Thinking Process (BRiTE) algorithm, which works in two steps.
arXiv Detail & Related papers (2025-01-31T02:39:07Z) - Assisting Mathematical Formalization with A Learning-based Premise Retriever [29.06255449960557]
We introduce an innovative method for training a premise retriever to support the formalization of mathematics.
Our approach employs a BERT model to embed proof states and premises into a shared latent space.
To streamline the formalization process, we will release a search engine that enables users to query Mathlib theorems directly using proof states.
arXiv Detail & Related papers (2025-01-21T06:32:25Z) - Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs [76.43407125275202]
o1-like models can emulate human-like long-time thinking during inference.
This paper presents the first comprehensive study on the prevalent issue of overthinking in these models.
We propose strategies to mitigate overthinking, streamlining reasoning processes without compromising accuracy.
arXiv Detail & Related papers (2024-12-30T18:55:12Z) - Model Predictive Control with Self-supervised Representation Learning [13.225264876433528]
We propose the use of a reconstruction function within the TD-MPC framework, so that the agent can reconstruct the original observation.
Our proposed addition of another loss term leads to improved performance on both state- and image-based tasks.
arXiv Detail & Related papers (2023-04-14T16:02:04Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - MACE: An Efficient Model-Agnostic Framework for Counterfactual
Explanation [132.77005365032468]
We propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE)
In our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity.
Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.
arXiv Detail & Related papers (2022-05-31T04:57:06Z) - Revisiting Model-based Value Expansion [35.55280687116388]
Model-based value expansion methods promise to improve the quality of value function targets and the effectiveness of value function learning.
However, to date, these methods are being outperformed by Dyna-style algorithms with conceptually simpler 1-step value function targets.
We provide a thorough empirical study to shed light on the causes of failure of value expansion methods in practice.
arXiv Detail & Related papers (2022-03-28T11:21:49Z) - What can I do here? A Theory of Affordances in Reinforcement Learning [65.70524105802156]
We develop a theory of affordances for agents who learn and plan in Markov Decision Processes.
Affordances play a dual role in this case, by reducing the number of actions available in any given situation.
We propose an approach to learn affordances and use it to estimate transition models that are simpler and generalize better.
arXiv Detail & Related papers (2020-06-26T16:34:53Z) - An Advance on Variable Elimination with Applications to Tensor-Based
Computation [11.358487655918676]
We present new results on the classical algorithm of variable elimination, which underlies many algorithms including for probabilistic inference.
The results relate to exploiting functional dependencies, allowing one to perform inference and learning efficiently on models that have very large treewidth.
arXiv Detail & Related papers (2020-02-21T14:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.