Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations
- URL: http://arxiv.org/abs/2312.09950v2
- Date: Mon, 6 May 2024 09:03:54 GMT
- Title: Peer Learning: Learning Complex Policies in Groups from Scratch via Action Recommendations
- Authors: Cedric Derstroff, Mattia Cerrato, Jannis Brugger, Jan Peters, Stefan Kramer,
- Abstract summary: Peer learning is a novel high-level reinforcement learning framework for agents learning in groups.
We show that peer learning is able to outperform single agent learning and the baseline in several challenging OpenAI Gym domains.
- Score: 16.073203911932872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Peer learning is a novel high-level reinforcement learning framework for agents learning in groups. While standard reinforcement learning trains an individual agent in trial-and-error fashion, all on its own, peer learning addresses a related setting in which a group of agents, i.e., peers, learns to master a task simultaneously together from scratch. Peers are allowed to communicate only about their own states and actions recommended by others: "What would you do in my situation?". Our motivation is to study the learning behavior of these agents. We formalize the teacher selection process in the action advice setting as a multi-armed bandit problem and therefore highlight the need for exploration. Eventually, we analyze the learning behavior of the peers and observe their ability to rank the agents' performance within the study group and understand which agents give reliable advice. Further, we compare peer learning with single agent learning and a state-of-the-art action advice baseline. We show that peer learning is able to outperform single-agent learning and the baseline in several challenging discrete and continuous OpenAI Gym domains. Doing so, we also show that within such a framework complex policies from action recommendations beyond discrete action spaces can evolve.
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z) - Reciprocal Reward Influence Encourages Cooperation From Self-Interested Agents [2.1301560294088318]
Cooperation between self-interested individuals is a widespread phenomenon in the natural world, but remains elusive in interactions between artificially intelligent agents.
We introduce Reciprocators, reinforcement learning agents which are intrinsically motivated to reciprocate the influence of opponents' actions on their returns.
We show that Reciprocators can be used to promote cooperation in temporally extended social dilemmas during simultaneous learning.
arXiv Detail & Related papers (2024-06-03T06:07:27Z) - Fast Peer Adaptation with Context-aware Exploration [63.08444527039578]
We propose a peer identification reward for learning agents in multi-agent games.
This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation.
We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents.
arXiv Detail & Related papers (2024-02-04T13:02:27Z) - Learning to Learn Group Alignment: A Self-Tuning Credo Framework with
Multiagent Teams [1.370633147306388]
Mixed incentives among a population with multiagent teams has been shown to have advantages over a fully cooperative system.
We propose a framework where individual learning agents self-regulate their configuration of incentives through various parts of their reward function.
arXiv Detail & Related papers (2023-04-14T18:16:19Z) - MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z) - ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose a self-supervised intrinsic reward ELIGN - expectation alignment.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
arXiv Detail & Related papers (2022-10-09T22:24:44Z) - Teachable Reinforcement Learning via Advice Distillation [161.43457947665073]
We propose a new supervision paradigm for interactive learning based on "teachable" decision-making systems that learn from structured advice provided by an external teacher.
We show that agents that learn from advice can acquire new skills with significantly less human supervision than standard reinforcement learning algorithms.
arXiv Detail & Related papers (2022-03-19T03:22:57Z) - PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via
Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z) - Learning from Learners: Adapting Reinforcement Learning Agents to be
Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each others' playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z) - Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning [0.2538209532048867]
This paper proposes a novel knowledge transfer framework in MARL, PAT (Parallel Attentional Transfer)
We design two acting modes in PAT, student mode and self-learning mode.
When agents are unfamiliar with the environment, the shared attention mechanism in student mode effectively selects learning knowledge from other agents to decide agents' actions.
arXiv Detail & Related papers (2020-03-29T17:42:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.