Towards a more efficient computation of individual attribute and policy
contribution for post-hoc explanation of cooperative multi-agent systems
using Myerson values
- URL: http://arxiv.org/abs/2212.03041v1
- Date: Tue, 6 Dec 2022 15:15:00 GMT
- Title: Towards a more efficient computation of individual attribute and policy contribution for post-hoc explanation of cooperative multi-agent systems using Myerson values
- Authors: Giorgio Angelotti and Natalia Díaz-Rodríguez
- Abstract summary: A quantitative assessment of the global importance of an agent in a team is as valuable as gold for strategists, decision-makers, and sports coaches.
We propose a method to determine a Hierarchical Knowledge Graph of agents' policies and features in a Multi-Agent System.
We test the proposed approach in a proof-of-case environment deploying both hardcoded policies and policies obtained via Deep Reinforcement Learning.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A quantitative assessment of the global importance of an agent in a team is
as valuable as gold for strategists, decision-makers, and sports coaches. Yet,
retrieving this information is not trivial since, in a cooperative task, it is
hard to isolate the performance of an individual from that of the whole
team. Moreover, the relationship between an agent's role and its personal
attributes is not always clear. In this work we conceive an application of
the Shapley analysis for studying the contribution of both agent policies and
attributes, putting them on equal footing. Since the exact computation of
Shapley values is NP-hard and scales exponentially with the number of
participants in a transferable utility coalitional game, we resort to
exploiting a priori knowledge about the rules of the game to constrain the
relations between the participants over a graph. We hence propose a method to determine a
Hierarchical Knowledge Graph of agents' policies and features in a Multi-Agent
System. Assuming a simulator of the system is available, the graph structure
allows us to exploit dynamic programming to assess the importances much
faster. We test the proposed approach in a proof-of-case environment
deploying both hardcoded policies and policies obtained via Deep Reinforcement
Learning. The proposed paradigm is less computationally demanding than
trivially computing the Shapley values and provides great insight not only into
the importance of an agent in a team but also into the attributes needed to
deploy the policy at its best.
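The abstract contrasts the exponential cost of exact Shapley values with the graph-restricted Myerson values that give the paper its title. As a rough illustration of the two solution concepts (not the authors' code, and without the dynamic-programming speed-up they describe), the sketch below enumerates coalitions to compute exact Shapley values and obtains Myerson values by evaluating a toy characteristic function only on the connected components of a hypothetical knowledge graph; the player names, edge list, and characteristic function `v` are invented for the example.

```python
# Minimal, self-contained sketch (not the paper's implementation): exact
# Shapley values by coalition enumeration, and Myerson values obtained by
# restricting the characteristic function to connected components of a
# hypothetical knowledge graph. `v` is a toy stand-in for the team-performance
# simulator assumed in the abstract.
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values: enumerates all 2^(n-1) coalitions per player."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        phi[i] = total
    return phi


def connected_components(S, edges):
    """Connected components of the subgraph induced by coalition S."""
    S, comps, seen = set(S), [], set()
    for start in S:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(w for a, b in edges if {a, b} <= S and u in (a, b)
                         for w in (a, b) if w not in comp)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def myerson_values(players, v, edges):
    """Myerson values = Shapley values of the graph-restricted game v^G,
    where v^G(S) sums v over the connected components of S in the graph."""
    def v_graph(S):
        return sum(v(C) for C in connected_components(S, edges))
    return shapley_values(players, v_graph)


if __name__ == "__main__":
    # Hypothetical nodes: two agent policies and one agent attribute.
    players = ["agent_A", "agent_B", "attr_speed"]
    # Hypothetical graph: a priori knowledge links agent_A only to its attribute.
    edges = [("agent_A", "attr_speed")]

    def v(S):  # toy team value standing in for a simulator rollout
        score = 1.0 * ("agent_A" in S) + 0.5 * ("agent_B" in S)
        score += 0.5 if {"agent_A", "attr_speed"} <= S else 0.0  # kept by the graph
        score += 0.25 if {"agent_A", "agent_B"} <= S else 0.0    # ignored by the graph
        return score

    print("Shapley:", shapley_values(players, v))
    print("Myerson:", myerson_values(players, v, edges))
```

Even in this toy game the graph restriction changes the attribution: the synergy between agent_A and agent_B is ignored because the hypothetical graph does not connect them. The efficiency gain reported in the abstract comes from dynamic programming over the Hierarchical Knowledge Graph, not from the brute-force enumeration used here.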
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent learned by offline MARL often inherits random behavior present in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes a Mutual Information (MI) lower bound when optimizing under Policy Gradient (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
- Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values [68.8204255655161]
This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values.
Results could have implications for non-discriminatory decision making, ethical and responsible AI-derived decisions or policy making under fairness constraints.
arXiv Detail & Related papers (2021-10-04T10:28:57Z)
- Influence-based Reinforcement Learning for Intrinsically-motivated Agents [0.0]
We present an algorithmic framework of two reinforcement learning agents, each with a different objective.
We introduce a novel function approximation approach to assess the influence $F$ of a certain policy on others.
Our method was evaluated on the suite of OpenAI gym tasks as well as cooperative and mixed scenarios.
arXiv Detail & Related papers (2021-08-28T05:36:10Z)
- Stateful Strategic Regression [20.7177095411398]
We describe the Stackelberg equilibrium of the resulting game and provide novel algorithms for computing it.
Our analysis reveals several intriguing insights about the role of multiple interactions in shaping the game's outcome.
Most importantly, we show that with multiple rounds of interaction at her disposal, the principal is more effective at incentivizing the agent to accumulate effort in her desired direction.
arXiv Detail & Related papers (2021-06-07T17:46:29Z)
- Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent State [35.69801203107371]
We design a simple reinforcement learning agent that can operate in any environment.
The agent maintains only visitation counts and value estimates for each agent-state-action pair.
There is no further dependence on the number of environment states or mixing times associated with other policies or statistics of history.
arXiv Detail & Related papers (2021-02-10T04:53:12Z) - Multi-agent Policy Optimization with Approximatively Synchronous
Advantage Estimation [55.96893934962757]
In multi-agent systems, the policies of different agents need to be evaluated jointly.
In current methods, value functions or advantage functions use counterfactual joint actions, which are evaluated asynchronously.
In this work, we propose the approximatively synchronous advantage estimation.
arXiv Detail & Related papers (2020-12-07T07:29:19Z)