A Comparison of Self-Play Algorithms Under a Generalized Framework
- URL: http://arxiv.org/abs/2006.04471v1
- Date: Mon, 8 Jun 2020 11:02:37 GMT
- Title: A Comparison of Self-Play Algorithms Under a Generalized Framework
- Authors: Daniel Hernandez, Kevin Denamganai, Sam Devlin, Spyridon Samothrakis,
James Alfred Walker
- Abstract summary: The notion of self-play, albeit often cited in multiagent Reinforcement Learning, has never been grounded in a formal model.
We present a formalized framework, with clearly defined assumptions, which encapsulates the meaning of self-play.
- We measure how well a subset of the captured self-play methods approximates this solution when paired with the famous PPO algorithm.
- Score: 4.339542790745868
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Throughout scientific history, overarching theoretical frameworks have
allowed researchers to grow beyond personal intuitions and culturally biased
theories. They allow researchers to verify and replicate existing findings, and to
link otherwise disconnected results. The notion of self-play, albeit often cited in multiagent
Reinforcement Learning, has never been grounded in a formal model. We present a
formalized framework, with clearly defined assumptions, which encapsulates the
meaning of self-play as abstracted from various existing self-play algorithms.
This framework is framed as an approximation to a theoretical solution concept
for multiagent training. On a simple environment, we qualitatively measure how
well a subset of the captured self-play methods approximates this solution when
paired with the famous PPO algorithm. We also provide insights on interpreting
quantitative metrics of performance for self-play training. Our results
indicate that, throughout training, various self-play definitions exhibit
cyclic policy evolutions.
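For intuition, the kind of generalized loop that such a framework abstracts over can be sketched in Python. This is an illustrative sketch only: the menagerie, opponent-sampling, and gating components, and every name and interface below (including env.play_episode and learner.update), are assumptions chosen to mirror common self-play variants, not the paper's implementation.

    import copy
    import random

    def self_play_training(env, learner, episodes, sample_opponent, gate):
        """Generic self-play loop: the learner trains against opponents
        drawn from a menagerie of frozen past policies. The choice of
        sample_opponent (matchmaking) and gate (curation) is what
        distinguishes one self-play definition from another."""
        menagerie = [copy.deepcopy(learner)]  # seed with the initial policy
        for _ in range(episodes):
            opponent = sample_opponent(menagerie)    # opponent distribution
            trajectory = env.play_episode(learner, opponent)
            learner.update(trajectory)               # e.g. a PPO update
            if gate(learner, menagerie):             # admit a frozen copy?
                menagerie.append(copy.deepcopy(learner))
        return learner, menagerie

    # Naive self-play: always face the latest policy and always gate it in.
    naive_sample = lambda menagerie: menagerie[-1]
    naive_gate = lambda learner, menagerie: True

    # delta-uniform self-play: sample uniformly over the most recent
    # fraction delta of the menagerie (delta in (0, 1]).
    def delta_uniform_sample(delta):
        def sample(menagerie):
            cutoff = int((1 - delta) * len(menagerie))
            return random.choice(menagerie[cutoff:])
        return sample

Under this reading, the self-play variants being compared would differ only in those two callables.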
Related papers
- A Survey on Self-play Methods in Reinforcement Learning [30.17222344626277]
Self-play, characterized by agents interacting with copies or past versions of themselves, has recently gained prominence in reinforcement learning.
This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts.
It provides a unified framework and classifies existing self-play algorithms within this framework.
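As a reference point for the game theory concepts mentioned above, the solution concept that self-play schemes are typically framed as approximating is a Nash equilibrium. In standard notation (not quoted from either paper):

    % A joint policy (\pi_1^*, ..., \pi_n^*) is a Nash equilibrium when no
    % player benefits from deviating unilaterally:
    \forall i,\ \forall \pi_i:\quad
      V_i(\pi_i^*, \pi_{-i}^*) \;\ge\; V_i(\pi_i, \pi_{-i}^*)

Here V_i is player i's expected return and \pi_{-i}^* denotes the equilibrium policies of all players other than i.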
arXiv Detail & Related papers (2024-08-02T07:47:51Z) - A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning [48.59516337905877]
Learning a good representation is a crucial challenge for Reinforcement Learning (RL) agents.
Recent work has developed theoretical insights into these algorithms.
We take a step towards bridging the gap between theory and practice by analyzing an action-conditional self-predictive objective.
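Schematically, an action-conditional self-predictive objective trains an encoder \phi and a latent transition model f so that the predicted next latent matches the encoding of the next observation. One standard form (notation assumed here, not taken from the paper):

    \min_{\phi, f}\; \mathbb{E}_{(s_t, a_t, s_{t+1})}
      \Big[ \big\| f(\phi(s_t), a_t) - \mathrm{sg}\big(\phi(s_{t+1})\big) \big\|_2^2 \Big]

where \mathrm{sg}(\cdot) is a stop-gradient on the target branch, commonly used to discourage representational collapse.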
arXiv Detail & Related papers (2024-06-04T07:22:12Z) - A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
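For concreteness, the canonical loss in the contrastive family referenced here is InfoNCE (standard form, not specific to this paper):

    \mathcal{L}_{\mathrm{InfoNCE}}
      = -\,\mathbb{E}\left[
          \log \frac{\exp\big(\mathrm{sim}(z_i, z_i^{+})/\tau\big)}
                    {\sum_{j=1}^{N} \exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}
        \right]

where z_i and z_i^{+} are the representations of two views of the same input, \mathrm{sim} is a similarity such as cosine, and \tau is a temperature.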
arXiv Detail & Related papers (2024-02-02T13:31:17Z) - Bridging State and History Representations: Understanding Self-Predictive RL [24.772140132462468]
Representations are at the core of all deep reinforcement learning (RL) methods for Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs).
We show that many of these seemingly distinct methods and frameworks for state and history abstractions are, in fact, based on a common idea of self-predictive abstraction.
We provide theoretical insights into the widely adopted objectives and optimization choices, such as the stop-gradient technique, used to learn self-predictive representations.
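The stop-gradient technique mentioned above is usually implemented by detaching the target branch from the computation graph. A minimal PyTorch-style sketch (the encoder and predictor interfaces are assumptions, not the paper's code):

    import torch

    def self_predictive_loss(encoder, predictor, s_t, a_t, s_next):
        # Online branch: predict the next latent from the current latent
        # and the action taken.
        z_pred = predictor(encoder(s_t), a_t)
        # Target branch: torch.no_grad() blocks gradients through the
        # target encoding, i.e. the stop-gradient trick.
        with torch.no_grad():
            z_target = encoder(s_next)
        return ((z_pred - z_target) ** 2).mean()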
arXiv Detail & Related papers (2024-01-17T00:47:43Z) - Learning a Diffusion Model Policy from Rewards via Q-Score Matching [93.0191910132874]
We present a theoretical framework linking the structure of diffusion model policies to a learned Q-function.
We propose a new policy update method from this theory, which we denote Q-score matching.
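Schematically, and as a paraphrase of the stated link rather than the paper's exact objective, Q-score matching aligns the score of the diffusion policy with the action-gradient of the learned Q-function:

    \nabla_{a} \log \pi_\theta(a \mid s) \;\propto\; \nabla_{a} Q_\phi(s, a)

so that each denoising step nudges sampled actions uphill on Q.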
arXiv Detail & Related papers (2023-12-18T23:31:01Z) - Semi-supervised learning made simple with self-supervised clustering [65.98152950607707]
Self-supervised learning models have been shown to learn rich visual representations without requiring human annotations.
We propose a conceptually simple yet empirically powerful approach to turn clustering-based self-supervised methods into semi-supervised learners.
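One plausible reading of this recipe, sketched below under assumed interfaces, is to reuse the cluster prototypes of the self-supervised method as a classifier head: labeled examples get cross-entropy against their annotations, unlabeled ones against the cluster assignments.

    import torch.nn.functional as F

    def semi_supervised_loss(encoder, prototypes,
                             x_lab, y_lab, x_unlab, cluster_targets, tau=0.1):
        # Labeled branch: cluster prototypes double as class weights.
        logits_lab = encoder(x_lab) @ prototypes.T / tau
        loss_lab = F.cross_entropy(logits_lab, y_lab)
        # Unlabeled branch: supervise with cluster assignments produced
        # by the self-supervised method (e.g. from another view).
        logits_unlab = encoder(x_unlab) @ prototypes.T / tau
        loss_unlab = F.cross_entropy(logits_unlab, cluster_targets)
        return loss_lab + loss_unlab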
arXiv Detail & Related papers (2023-06-13T01:09:18Z) - Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z) - Metrics and continuity in reinforcement learning [34.10996560464196]
We introduce a unified formalism for defining topologies through the lens of metrics.
We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process.
We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
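A representative member of such a hierarchy is the bisimulation metric, defined as the fixed point of (standard definition, not quoted from the paper):

    d(s, t) = \max_{a \in \mathcal{A}} \Big(
        \big| r(s, a) - r(t, a) \big|
        + \gamma\, \mathcal{W}_1(d)\big( P(\cdot \mid s, a),\, P(\cdot \mid t, a) \big)
      \Big)

where \mathcal{W}_1(d) is the 1-Wasserstein distance with ground metric d; the metrics in the hierarchy can be seen as relaxing or tightening these reward and transition terms.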
arXiv Detail & Related papers (2021-02-02T14:30:41Z) - Instance-Based Learning of Span Representations: A Case Study through Named Entity Recognition [48.06319154279427]
We present a method of instance-based learning that learns similarities between spans.
Our method makes it possible to build highly interpretable models without sacrificing performance.
arXiv Detail & Related papers (2020-04-29T23:32:42Z)
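Instance-based inference over spans typically amounts to nearest-neighbor classification in the learned span-embedding space. A minimal sketch (cosine similarity and the value of k are illustrative assumptions):

    import numpy as np

    def knn_span_label(span_vec, train_vecs, train_labels, k=5):
        """Label a span by majority vote among its k most similar
        training spans (cosine similarity)."""
        sims = train_vecs @ span_vec / (
            np.linalg.norm(train_vecs, axis=1)
            * np.linalg.norm(span_vec) + 1e-8)
        top_k = np.argsort(-sims)[:k]
        votes = [train_labels[i] for i in top_k]
        return max(set(votes), key=votes.count)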