Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
- URL: http://arxiv.org/abs/2005.04544v5
- Date: Mon, 27 Dec 2021 10:39:16 GMT
- Title: Unified Models of Human Behavioral Agents in Bandits, Contextual Bandits and RL
- Authors: Baihan Lin, Guillermo Cecchi, Djallel Bouneffouf, Jenna Reinen, Irina Rish
- Abstract summary: We propose a more general and flexible parametric framework for sequential decision making.
Inspired by the known reward processing abnormalities of many mental disorders, our clinically-inspired agents demonstrated interesting behavioral trajectories.
- Score: 28.38826379640553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial behavioral agents are often evaluated on the consistency and performance of the sequential actions they take in an environment to maximize some notion of cumulative reward. However, human decision making in real life usually involves different strategies and behavioral trajectories that lead to the same empirical outcome. Motivated by the clinical literature on a wide range of neurological and psychiatric disorders, we propose here a more general and flexible parametric framework for sequential decision making that involves a two-stream reward processing mechanism. We demonstrate that this framework is flexible and unified enough to incorporate a family of problems spanning multi-armed bandits (MAB), contextual bandits (CB) and reinforcement learning (RL), which decompose the sequential decision-making process at different levels. Inspired by the known reward-processing abnormalities of many mental disorders, our clinically inspired agents exhibit distinctive behavioral trajectories and comparable performance on simulated tasks with particular reward distributions, on a real-world dataset capturing human decision-making in gambling tasks, and on the PacMan game across different reward stationarities in a lifelong learning setting.
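The two-stream reward processing mechanism can be pictured as an agent that keeps separate value estimates for gains and losses and weights the two streams differently. Below is a minimal sketch of that idea for a multi-armed bandit, assuming per-stream sensitivity (w_pos, w_neg) and decay (lam_pos, lam_neg) parameters; the names and exact update rules are illustrative interpretations of the abstract, not the paper's definitions.

```python
import numpy as np

class TwoStreamBanditAgent:
    """Illustrative two-stream reward-processing agent for a multi-armed bandit.
    Positive and negative rewards are tracked in separate streams whose
    sensitivity and decay can be set independently (an assumption based on the
    abstract, not the paper's exact formulation)."""

    def __init__(self, n_arms, alpha=0.1, w_pos=1.0, w_neg=1.0,
                 lam_pos=1.0, lam_neg=1.0, eps=0.1, rng=None):
        self.q_pos = np.zeros(n_arms)          # value estimate from gains
        self.q_neg = np.zeros(n_arms)          # value estimate from losses
        self.alpha, self.eps = alpha, eps
        self.w_pos, self.w_neg = w_pos, w_neg  # reward sensitivity per stream
        self.lam_pos, self.lam_neg = lam_pos, lam_neg  # memory/decay per stream
        self.rng = rng or np.random.default_rng()

    def act(self):
        # epsilon-greedy choice on the combined two-stream estimate
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.q_pos)))
        return int(np.argmax(self.q_pos + self.q_neg))

    def update(self, arm, reward):
        # split the observed reward into its positive and negative parts
        r_pos, r_neg = max(reward, 0.0), min(reward, 0.0)
        self.q_pos[arm] = (self.lam_pos * self.q_pos[arm]
                           + self.alpha * (self.w_pos * r_pos - self.q_pos[arm]))
        self.q_neg[arm] = (self.lam_neg * self.q_neg[arm]
                           + self.alpha * (self.w_neg * r_neg - self.q_neg[arm]))
```

Under such a parameterization, for example, a large w_neg combined with a fast-decaying positive stream would yield a loss-averse, pessimistic behavioral profile, which is the kind of clinically inspired trajectory the abstract refers to.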
Related papers
- Appraisal-Guided Proximal Policy Optimization: Modeling Psychological Disorders in Dynamic Grid World [0.0]
We develop a methodology for modeling psychological disorders using Reinforcement Learning (RL) agents.
We investigated numerous reward-shaping strategies to simulate psychological disorders and regulate the behavior of the agents.
A comparison of various configurations of the modified PPO algorithm identified variants that simulate anxiety-disorder- and obsessive-compulsive-disorder (OCD)-like behavior in agents.
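As a rough illustration of the reward-shaping idea in this entry, one way to bias an agent toward an anxiety-like profile is to penalize states it appraises as threatening; the function below is a hypothetical sketch, with names and weights that are assumptions rather than the paper's definitions.

```python
def shaped_reward(env_reward, threat_appraisal, anxiety_weight=0.5):
    """Hypothetical appraisal-guided shaping: the environment reward is reduced
    in states the agent appraises as threatening (threat_appraisal in [0, 1]),
    so a PPO agent trained on it learns avoidant, anxiety-like behavior.
    Both the appraisal signal and the weight are illustrative assumptions."""
    return env_reward - anxiety_weight * threat_appraisal
```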
arXiv Detail & Related papers (2024-07-29T19:19:54Z)
- HMM for Discovering Decision-Making Dynamics Using Reinforcement Learning Experiments [5.857093069873734]
Evidence indicates that reward processing abnormalities may serve as a behavioral marker for major depressive disorder (MDD).
Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model.
We propose a novel RL-HMM framework for analyzing reward-based decision-making.
arXiv Detail & Related papers (2024-01-25T04:03:32Z)
- DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with that of other agents.
We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
- Multi-intention Inverse Q-learning for Interpretable Behavior Representation [12.135423420992334]
Inverse reinforcement learning (IRL) methods have proven instrumental in reconstructing animals' intentions underlying complex behaviors.
We introduce the class of hierarchical inverse Q-learning (HIQL) algorithms.
Applying HIQL to simulated experiments and several real animal behavior datasets, our approach outperforms current benchmarks in behavior prediction.
arXiv Detail & Related papers (2023-11-23T09:27:08Z)
- Discovering Individual Rewards in Collective Behavior through Inverse Multi-Agent Reinforcement Learning [3.4437947384641032]
We introduce an off-policy inverse multi-agent reinforcement learning algorithm (IMARL).
By leveraging demonstrations, our algorithm automatically uncovers the reward function and learns an effective policy for the agents.
The proposed IMARL algorithm is a significant step towards understanding collective dynamics from the perspective of its constituents.
arXiv Detail & Related papers (2023-05-17T20:07:30Z)
- Learnable Behavior Control: Breaking Atari Human World Records via Sample-Efficient Behavior Selection [56.87650511573298]
We propose a general framework called Learnable Behavioral Control (LBC) to address the limitation.
Our agents have achieved 10077.52% mean human normalized score and surpassed 24 human world records within 1B training frames.
arXiv Detail & Related papers (2023-05-09T08:00:23Z)
- Learning Complex Spatial Behaviours in ABM: An Experimental Observational Study [0.0]
This paper explores how Reinforcement Learning can be applied to create emergent agent behaviours.
Running a series of simulations, we demonstrate that agents trained using the novel Proximal Policy Optimisation algorithm behave in ways that exhibit properties of real-world intelligent adaptive behaviours.
arXiv Detail & Related papers (2022-01-04T11:56:11Z)
- Autonomous Reinforcement Learning: Formalism and Benchmarking [106.25788536376007]
Real-world embodied learning, such as that performed by humans and animals, is situated in a continual, non-episodic world.
Common benchmark tasks in RL are episodic, with the environment resetting between trials to provide the agent with multiple attempts.
This discrepancy presents a major challenge when attempting to take RL algorithms developed for episodic simulated environments and run them on real-world platforms.
arXiv Detail & Related papers (2021-12-17T16:28:06Z)
- Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
- Learning "What-if" Explanations for Sequential Decision-Making [92.8311073739295]
Building interpretable parameterizations of real-world decision-making on the basis of demonstrated behavior is essential.
We propose learning explanations of expert decisions by modeling their reward function in terms of preferences with respect to "what if" outcomes.
We highlight the effectiveness of our batch, counterfactual inverse reinforcement learning approach in recovering accurate and interpretable descriptions of behavior.
arXiv Detail & Related papers (2020-07-02T14:24:17Z)
- Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on Hindsight Experience Replay (HER) and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on OpenAI robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)
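The SHER entry above builds on hindsight relabeling of failed episodes; a minimal sketch of that relabeling step follows, assuming goal-conditioned transitions stored as (state, action, next_state, goal) tuples. The tuple layout and the reward_fn argument are assumptions for illustration, not SHER's actual interface.

```python
def hindsight_relabel(trajectory, reward_fn):
    """Minimal hindsight relabeling: pretend the final achieved state of a
    failed episode was the goal all along, and recompute the sparse rewards.
    `trajectory` is an assumed list of (state, action, next_state, goal) tuples."""
    new_goal = trajectory[-1][2]  # last achieved state becomes the substitute goal
    relabeled = []
    for state, action, next_state, _ in trajectory:
        reward = reward_fn(next_state, new_goal)  # e.g. 0.0 if reached, -1.0 otherwise
        relabeled.append((state, action, reward, next_state, new_goal))
    return relabeled
```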