Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
- URL: http://arxiv.org/abs/2006.06580v3
- Date: Sat, 27 Aug 2022 02:50:31 GMT
- Title: Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior
- Authors: Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi
- Abstract summary: We study the behaviors of online learning algorithms in the Iterated Prisoner's Dilemma (IPD) game.
We evaluate them based on a tournament of iterated prisoner's dilemma where multiple agents can compete in a sequential fashion.
Results suggest that making decisions based only on the current situation performs worst in this kind of social dilemma game.
- Score: 27.80555922579736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As an important psychological and social experiment, the Iterated Prisoner's
Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We
propose to study the behaviors of online learning algorithms in the Iterated
Prisoner's Dilemma (IPD) game, where we investigate the full spectrum of
reinforcement learning agents: multi-armed bandits, contextual bandits and
reinforcement learning. We evaluate them based on a tournament of iterated
prisoner's dilemma where multiple agents can compete in a sequential fashion.
This allows us to analyze the dynamics of policies learned by multiple
self-interested, independent, reward-driven agents, and also to study the
capacity of these algorithms to fit human behavior. Results suggest that making
decisions based only on the current situation performs worst in this kind of
social dilemma game. We report multiple findings on online learning behaviors,
along with clinical validations, as an effort to connect artificial
intelligence algorithms with human behaviors and their abnormal states in
neuropsychiatric conditions.
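As a concrete illustration of the setup the abstract describes, the sketch below pits a context-free multi-armed bandit (the simplest class of agent the paper considers) against a tit-for-tat opponent in the IPD. The payoff values (T=5, R=3, P=1, S=0) and the epsilon-greedy learner are standard textbook defaults, assumed here for illustration, not the paper's exact configuration or tournament protocol.

```python
import random

# Standard IPD payoff matrix: (my reward, opponent reward).
# T=5, R=3, P=1, S=0 are conventional values, not taken from the paper.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

class EpsilonGreedyBandit:
    """Context-free multi-armed bandit over the two atomic actions C/D."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = {"C": 0, "D": 0}
        self.values = {"C": 0.0, "D": 0.0}

    def act(self):
        # Explore with probability epsilon, otherwise exploit the
        # action with the highest running value estimate.
        if random.random() < self.epsilon:
            return random.choice(["C", "D"])
        return max(self.values, key=self.values.get)

    def update(self, action, reward):
        # Incremental mean update of the action-value estimate.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

def play_ipd(agent, opponent_policy, rounds=200):
    """Run one IPD match; opponent_policy maps the agent's last move to a reply."""
    total = 0
    last = "C"  # tit-for-tat effectively cooperates on the first round
    for _ in range(rounds):
        a = agent.act()
        b = opponent_policy(last)
        r, _ = PAYOFF[(a, b)]
        agent.update(a, r)
        total += r
        last = a
    return total

random.seed(0)
tit_for_tat = lambda last: last  # mirror the agent's previous move
score = play_ipd(EpsilonGreedyBandit(), tit_for_tat)
print(score)
```

Against tit-for-tat, sustained defection earns the punishment payoff P=1 per round while mutual cooperation earns R=3, so a reward-driven learner can discover cooperation purely from payoffs; a full tournament would run many such agents against each other sequentially, as the abstract describes.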
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? [0.1474723404975345]
We study the cooperative behavior of Llama2 when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility.
We find that Llama2 tends not to initiate defection, but adopts a cautious approach towards cooperation.
In comparison to prior research on human participants, Llama2 exhibits a greater inclination towards cooperative behavior.
arXiv Detail & Related papers (2024-06-19T14:51:14Z)
- RLIF: Interactive Imitation Learning as Reinforcement Learning [56.997263135104504]
We show how off-policy reinforcement learning can enable improved performance under assumptions that are similar but potentially even more practical than those of interactive imitation learning.
Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.
This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.
arXiv Detail & Related papers (2023-11-21T21:05:21Z)
- Neural Amortized Inference for Nested Multi-agent Reasoning [54.39127942041582]
We propose a novel approach to bridge the gap between human-like inference capabilities and computational limitations.
We evaluate our method in two challenging multi-agent interaction domains.
arXiv Detail & Related papers (2023-08-21T22:40:36Z)
- Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z)
- Human-AI Coordination via Human-Regularized Search and Learning [33.95649252941375]
We develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
We show that our method beats a vanilla best-response-to-behavioral-cloning baseline by having experts play repeatedly with the two agents.
arXiv Detail & Related papers (2022-10-11T03:46:12Z)
- Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy.
We propose four different policy fusion methods for combining pre-trained policies.
We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are boundedly rational and have different intelligence levels when reasoning about others' decision-making processes.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
- Predicting human decision making in psychological tasks with recurrent neural networks [27.80555922579736]
We propose to use a recurrent neural network architecture based on long short-term memory networks (LSTM) to predict the time series of actions taken by human subjects engaged in gaming activity.
In this study, we collate human data from 8 published studies of the Iterated Prisoner's Dilemma comprising 168,386 individual decisions, and post-process them into 8,257 behavioral trajectories of 9 actions each for both players.
We demonstrate a clear advantage over the state-of-the-art methods in predicting human decision-making trajectories in both the single-agent scenario of the Iowa Gambling Task and the multi
arXiv Detail & Related papers (2020-10-22T03:36:03Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and to explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.