Effects of Spectral Normalization in Multi-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2212.05331v2
- Date: Thu, 20 Apr 2023 17:03:53 GMT
- Title: Effects of Spectral Normalization in Multi-agent Reinforcement Learning
- Authors: Kinal Mehta, Anuj Mahajan, Pawan Kumar
- Abstract summary: We show that regularising the critic with spectral normalization (SN) enables it to learn more robustly.
Our experiments show that the regularised critic quickly learns from sparse-reward experience.
- Score: 7.064383217512461
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A reliable critic is central to on-policy actor-critic learning. But it
becomes challenging to learn a reliable critic in a multi-agent sparse reward
scenario due to two factors: 1) the joint action space grows exponentially with the number of agents; and 2) this, combined with reward sparseness and environment noise, leads to large sample requirements for accurate learning. We
show that regularising the critic with spectral normalization (SN) enables it
to learn more robustly, even in multi-agent on-policy sparse reward scenarios.
Our experiments show that the regularised critic quickly learns from sparse-reward experience in the complex SMAC and RWARE domains. These
findings highlight the importance of regularisation in the critic for stable
learning.
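As a concrete illustration of the proposed regularisation, below is a minimal PyTorch sketch that wraps a critic's hidden layers in spectral normalization. The architecture, layer sizes, and the choice to leave the value head unnormalised are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch: a spectrally normalised critic, assuming PyTorch.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm


class SNCritic(nn.Module):
    """Centralised value critic with spectrally normalised hidden layers."""

    def __init__(self, obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        # spectral_norm constrains each weight matrix's largest singular
        # value to ~1, bounding the critic's Lipschitz constant.
        self.net = nn.Sequential(
            spectral_norm(nn.Linear(obs_dim, hidden_dim)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden_dim, hidden_dim)),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # value head left unnormalised here
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)


critic = SNCritic(obs_dim=48)
values = critic(torch.randn(32, 48))  # a batch of 32 joint observations
```

Bounding each layer's spectral norm keeps the critic's output from swinging sharply on rare rewarding samples, which is the stability effect the abstract describes.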
Related papers
- Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations [86.90549830760513]
Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks.
We propose MoE Experts Compression Suite (MC-Suite) to provide a benchmark for estimating expert importance from diverse perspectives.
We present an experimentally validated conjecture that, during expert dropping, SMoEs' instruction-following capabilities are predominantly hurt.
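For intuition, here is a hypothetical sketch of one expert-importance proxy such a benchmark might evaluate: rank experts by how often the router selects them on a calibration batch, then drop the least-used ones. This frequency heuristic is an illustrative assumption, not MC-Suite's actual criteria.

```python
# Hypothetical expert-dropping heuristic: prune the least-routed experts.
import torch

num_experts, top_k = 8, 2
router_logits = torch.randn(256, num_experts)  # stand-in for real router outputs

# Count how often each expert appears in the top-k routing decisions.
topk_idx = router_logits.topk(top_k, dim=-1).indices
usage = torch.bincount(topk_idx.flatten(), minlength=num_experts)

# Keep the most-used half of the experts, drop the rest.
keep = usage.argsort(descending=True)[: num_experts // 2]
print("experts kept:", sorted(keep.tolist()))
```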
arXiv Detail & Related papers (2025-04-08T00:49:08Z)
- IL-SOAR: Imitation Learning with Soft Optimistic Actor cRitic [52.44637913176449]
This paper introduces the SOAR framework for imitation learning.
It is an algorithmic template that learns a policy from expert demonstrations via a primal-dual scheme that alternates cost and policy updates.
It is shown to consistently boost the performance of Soft Actor-Critic-based imitation learning algorithms such as f-IRL, ML-IRL, and CSIL in several MuJoCo environments.
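A schematic sketch of that alternation, assuming PyTorch; the losses and single gradient steps below are simplified stand-ins for the paper's actual objectives and for the underlying Soft Actor-Critic machinery.

```python
# Sketch of a primal-dual imitation loop: alternate cost and policy updates.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2
cost = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))
cost_opt = torch.optim.Adam(cost.parameters(), lr=3e-4)
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

for step in range(100):
    expert_sa = torch.randn(64, obs_dim + act_dim)   # stand-in expert demos
    obs = torch.randn(64, obs_dim)                   # stand-in policy states
    policy_sa = torch.cat([obs, policy(obs)], dim=-1)

    # Dual step: push the learned cost down on expert pairs, up on policy pairs.
    cost_loss = cost(expert_sa).mean() - cost(policy_sa.detach()).mean()
    cost_opt.zero_grad(); cost_loss.backward(); cost_opt.step()

    # Primal step: the policy descends the learned cost (a SAC update in the paper).
    policy_loss = cost(torch.cat([obs, policy(obs)], dim=-1)).mean()
    policy_opt.zero_grad(); policy_loss.backward(); policy_opt.step()
```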
arXiv Detail & Related papers (2025-02-27T08:03:37Z)
- On Multi-Agent Inverse Reinforcement Learning [8.284137254112848]
We extend the Inverse Reinforcement Learning (IRL) framework to the multi-agent setting, assuming the observed agents follow Nash Equilibrium (NE) policies.
We provide an explicit characterization of the feasible reward set and analyze how errors in estimating the transition dynamics and expert behavior impact the recovered rewards.
arXiv Detail & Related papers (2024-11-22T16:31:36Z)
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
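A hedged sketch of the preference-learning ingredient such a pipeline rests on: fitting a reward model to pairwise trajectory preferences with a Bradley-Terry likelihood. The shapes, the logistic loss, and the single update step are generic assumptions, not the paper's algorithm.

```python
# Generic Bradley-Terry reward learning from trajectory preferences.
import torch
import torch.nn as nn
import torch.nn.functional as F

joint_obs_dim = 16
reward_model = nn.Sequential(nn.Linear(joint_obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each sample: two length-10 trajectories of joint observations; label 1
# means the first trajectory was preferred by the annotator.
traj_a = torch.randn(32, 10, joint_obs_dim)
traj_b = torch.randn(32, 10, joint_obs_dim)
prefs = torch.randint(0, 2, (32,)).float()

ret_a = reward_model(traj_a).sum(dim=1).squeeze(-1)  # predicted return of A
ret_b = reward_model(traj_b).sum(dim=1).squeeze(-1)
loss = F.binary_cross_entropy_with_logits(ret_a - ret_b, prefs)
opt.zero_grad(); loss.backward(); opt.step()
```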
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts [81.37287967870589]
We propose to harness a diverse set of specialized teachers, instead of a single generalist one, that collectively supervises the strong student.
Our approach resembles the classical hierarchical mixture of experts, with two components tailored for co-supervision.
We validate the proposed method through visual recognition tasks on the OpenAI weak-to-strong benchmark and additional multi-domain datasets.
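An illustrative sketch of co-supervision by multiple specialised teachers: each example's soft target comes from the most confident teacher, loosely mirroring a hierarchical mixture of experts. The confidence-gating rule here is an assumption, not the paper's exact routing mechanism.

```python
# Gated multi-teacher distillation: each example follows its most
# confident teacher.
import torch
import torch.nn.functional as F

num_teachers, num_classes, batch = 3, 10, 32
teacher_logits = torch.randn(num_teachers, batch, num_classes)  # stand-in teachers
student_logits = torch.randn(batch, num_classes, requires_grad=True)

# Gate: per example, pick the teacher with the highest top-class probability.
confidence = teacher_logits.softmax(-1).amax(-1)      # (teachers, batch)
best = confidence.argmax(0)                           # (batch,)
targets = teacher_logits[best, torch.arange(batch)].softmax(-1)

# Distill the student toward the gated teacher distribution.
loss = F.kl_div(student_logits.log_softmax(-1), targets, reduction="batchmean")
loss.backward()
```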
arXiv Detail & Related papers (2024-02-23T18:56:11Z)
- Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives [8.524720028421447]
This paper provides a systematic analysis of the opportunities, challenges, and potential solutions of harnessing Large Language Models (LLMs) such as GPT-4.
Generating more answers with higher randomness largely boosts the likelihood of producing a correct answer, but inevitably leads to more false positives.
We propose an adversarial framework dubbed GPTLens that breaks the conventional one-stage detection into two synergistic stages $-$ generation and discrimination.
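A schematic of that two-stage generation-discrimination split; `call_llm` is a hypothetical stand-in for any chat-completion client, and the prompts are paraphrased rather than taken from the paper.

```python
def call_llm(prompt: str, temperature: float) -> str:
    # Stand-in so the sketch runs; replace with a real chat-completion client.
    return f"[model response at T={temperature}]"


def detect_vulnerabilities(contract_source: str, n_candidates: int = 5) -> str:
    # Stage 1 (generation): sample many candidate findings at high temperature,
    # deliberately trading precision for recall.
    candidates = [
        call_llm(f"List potential vulnerabilities in:\n{contract_source}",
                 temperature=1.0)
        for _ in range(n_candidates)
    ]
    # Stage 2 (discrimination): a low-temperature critic pass scores the
    # candidates, filtering the false positives that stage 1 tolerates.
    joined = "\n---\n".join(candidates)
    return call_llm(
        "Rank these candidate findings by correctness and severity, "
        f"discarding implausible ones:\n{joined}",
        temperature=0.0,
    )


print(detect_vulnerabilities("contract Demo { /* ... */ }"))
```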
arXiv Detail & Related papers (2023-10-02T12:37:23Z)
- PAC-Bayesian Soft Actor-Critic Learning [9.752336113724928]
Actor-critic algorithms address the dual goals of reinforcement learning (RL), policy evaluation and improvement, via two separate function approximators.
We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm.
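A hedged sketch of what a PAC-Bayesian critic objective can look like: a mean-field Gaussian posterior over (here, linear) critic weights, trained on the empirical Bellman error plus a McAllester-style complexity term. The bound form, the standard-normal prior, and the toy model are generic assumptions, not the paper's exact objective.

```python
# Generic PAC-Bayes-style critic training: empirical loss + complexity term.
import math
import torch

n, in_dim = 256, 8
mu = torch.zeros(in_dim, requires_grad=True)         # posterior mean
log_sigma = torch.zeros(in_dim, requires_grad=True)  # posterior log-std
opt = torch.optim.Adam([mu, log_sigma], lr=1e-2)

states = torch.randn(n, in_dim)
td_targets = torch.randn(n)  # stand-in Bellman targets
delta = 0.05                 # bound confidence level

for _ in range(200):
    # Reparameterised sample of critic weights from the posterior.
    w = mu + log_sigma.exp() * torch.randn(in_dim)
    q = states @ w                        # linear critic, for brevity
    emp_loss = ((q - td_targets) ** 2).mean()

    # KL( N(mu, sigma^2) || N(0, I) ), the complexity of the posterior.
    kl = 0.5 * (mu**2 + (2 * log_sigma).exp() - 2 * log_sigma - 1).sum()

    # McAllester-style bound: loss + sqrt((KL + log(2 sqrt(n)/delta)) / 2n).
    bound = emp_loss + torch.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))
    opt.zero_grad(); bound.backward(); opt.step()
```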
arXiv Detail & Related papers (2023-01-30T10:44:15Z)
- Solving Continuous Control via Q-learning [54.05120662838286]
We show that a simple modification of deep Q-learning largely alleviates issues with actor-critic methods.
By combining bang-bang action discretization with value decomposition, which frames single-agent control as cooperative multi-agent reinforcement learning (MARL), this simple critic-only approach matches the performance of state-of-the-art continuous actor-critic methods.
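A minimal sketch of that decoupled critic idea: one tiny bang-bang (two-bin) Q-head per action dimension, with the joint value decomposed as the mean of the per-dimension values. The network shape and the mean mixer are illustrative assumptions.

```python
# Per-dimension bang-bang Q-heads with a decomposed joint value.
import torch
import torch.nn as nn


class DecoupledBangBangQ(nn.Module):
    def __init__(self, obs_dim: int, act_dims: int, hidden: int = 128):
        super().__init__()
        self.act_dims = act_dims
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # One 2-way Q-head per action dimension (minimum or maximum torque).
        self.heads = nn.Linear(hidden, act_dims * 2)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        q = self.heads(self.trunk(obs))
        return q.view(-1, self.act_dims, 2)  # (batch, dim, {low, high})


qnet = DecoupledBangBangQ(obs_dim=17, act_dims=6)
q = qnet(torch.randn(32, 17))
greedy = q.argmax(-1)                     # per-dimension greedy bang-bang action
joint_value = q.max(-1).values.mean(-1)   # decomposed joint Q as the mean
```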
arXiv Detail & Related papers (2022-10-22T22:55:50Z)
- Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot [71.28884625011987]
Melting Pot is a MARL evaluation suite that uses reinforcement learning to reduce the human labor required to create novel test scenarios.
We have created over 80 unique test scenarios covering a broad range of research topics.
We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone.
arXiv Detail & Related papers (2021-07-14T17:22:14Z)
- SA-MATD3: Self-attention-based multi-agent continuous control method in cooperative environments [12.959163198988536]
Existing algorithms suffer from increasingly uneven learning across agents as the number of agents grows.
A new multi-agent actor-critic structure is proposed, applying a self-attention mechanism in the critic network.
The proposed algorithm makes full use of the samples in the replay memory buffer to learn the behavior of a class of agents.
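A sketch of a centralised critic that applies self-attention across per-agent embeddings, in the spirit of this proposal; the head count, dimensions, and final pooling are illustrative assumptions.

```python
# Centralised critic with self-attention over agent embeddings.
import torch
import torch.nn as nn


class AttentionCritic(nn.Module):
    def __init__(self, per_agent_dim: int, embed_dim: int = 64, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(per_agent_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.q_head = nn.Linear(embed_dim, 1)

    def forward(self, agent_inputs: torch.Tensor) -> torch.Tensor:
        # agent_inputs: (batch, n_agents, obs_dim + act_dim)
        x = self.embed(agent_inputs)
        x, _ = self.attn(x, x, x)          # each agent attends to every other
        return self.q_head(x.mean(dim=1))  # pooled joint action-value


critic = AttentionCritic(per_agent_dim=24)
q = critic(torch.randn(32, 5, 24))  # 5 agents, batch of 32
```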
arXiv Detail & Related papers (2021-07-01T08:15:05Z)
- Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning [11.292086312664383]
Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework.
We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms.
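A sketch of SEAC-style experience sharing for the policy loss: agent i also learns from agent j's transitions, importance-weighted by the ratio of the two policies' action probabilities and scaled by a coefficient lambda. The detached weights and toy batches are simplifying assumptions on top of the paper's general recipe.

```python
# Experience sharing between two on-policy agents with importance weighting.
import torch
import torch.nn as nn

obs_dim, n_actions, lam = 8, 4, 1.0
policies = [nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
            for _ in range(2)]


def policy_loss(i: int, obs, acts, advantages) -> torch.Tensor:
    """On-policy loss of agent i on its own batch."""
    logp = policies[i](obs).log_softmax(-1).gather(1, acts).squeeze(1)
    return -(logp * advantages).mean()


def shared_loss(i: int, j: int, obs, acts, advantages) -> torch.Tensor:
    """Agent i's loss on agent j's batch, importance-weighted."""
    logp_i = policies[i](obs).log_softmax(-1).gather(1, acts).squeeze(1)
    logp_j = policies[j](obs).log_softmax(-1).gather(1, acts).squeeze(1)
    weight = (logp_i - logp_j).exp().detach()  # pi_i(a|s) / pi_j(a|s)
    return -lam * (weight * logp_i * advantages).mean()


obs, acts = torch.randn(32, obs_dim), torch.randint(0, n_actions, (32, 1))
adv = torch.randn(32)
total = policy_loss(0, obs, acts, adv) + shared_loss(0, 1, obs, acts, adv)
total.backward()
```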
arXiv Detail & Related papers (2020-06-12T13:24:50Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
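A sketch of the factored-critic idea: per-agent utilities are combined by a state-conditioned mixing network into a joint Q-value used for a centralised policy gradient. The simple linear mixer below is an illustrative assumption, not FACMAC's exact mixing architecture.

```python
# State-conditioned mixing of per-agent utilities into a joint Q-value.
import torch
import torch.nn as nn

n_agents, state_dim = 3, 12


class Mixer(nn.Module):
    def __init__(self):
        super().__init__()
        # State-conditioned weights for combining per-agent utilities.
        self.w = nn.Linear(state_dim, n_agents)
        self.b = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        return (self.w(state) * agent_qs).sum(-1, keepdim=True) + self.b(state)


mixer = Mixer()
agent_qs = torch.randn(32, n_agents)  # each agent's utility Q_i(o_i, a_i)
joint_q = mixer(agent_qs, torch.randn(32, state_dim))
# The actors are then updated by a gradient through joint_q with respect to
# all agents' actions jointly, rather than per-agent.
```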
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.