Reinforcement Learning Beyond Expectation
- URL: http://arxiv.org/abs/2104.00540v1
- Date: Mon, 29 Mar 2021 20:35:25 GMT
- Title: Reinforcement Learning Beyond Expectation
- Authors: Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark, Radha Poovendran
- Abstract summary: Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently.
In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment.
In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost.
- Score: 11.428014000851535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inputs and preferences of human users are important considerations in
situations where these users interact with autonomous cyber or cyber-physical
systems. In these scenarios, one is often interested in aligning behaviors of
the system with the preferences of one or more human users. Cumulative prospect
theory (CPT) is a paradigm that has been empirically shown to model a tendency
of humans to view gains and losses differently. In this paper, we consider a
setting where an autonomous agent has to learn behaviors in an unknown
environment. In traditional reinforcement learning, these behaviors are learned
through repeated interactions with the environment by optimizing an expected
utility. In order to endow the agent with the ability to closely mimic the
behavior of human users, we optimize a CPT-based cost. We introduce the notion
of the CPT-value of an action taken in a state, and establish the convergence
of an iterative dynamic programming-based approach to estimate this quantity.
We develop two algorithms that enable agents to learn policies optimizing the
CPT-value, and evaluate these algorithms in environments where a target state
has to be reached while avoiding obstacles. We demonstrate that the behaviors
learned using these algorithms align more closely with those of a human user
who might be placed in the same environment, and improve significantly over a
baseline that optimizes an expected utility.
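The CPT objective in the abstract can be made concrete with a small numerical sketch. The Python snippet below is an illustrative estimate of a CPT-style value from sampled returns, not the paper's implementation: the utility exponents, the loss-aversion coefficient lam, and the weighting exponent eta are the classic Tversky-Kahneman values, and the quantile-based estimator is the standard one used in CPT-RL work rather than the specific dynamic-programming procedure the authors analyze.

```python
import numpy as np

def weight(p, eta=0.71):
    """Tversky-Kahneman probability weighting: overweights small
    probabilities and underweights moderate-to-large ones."""
    return p**eta / (p**eta + (1.0 - p)**eta) ** (1.0 / eta)

def utility_gain(x, alpha=0.88):
    # Concave utility on the gain part of the outcome.
    return np.maximum(x, 0.0) ** alpha

def utility_loss(x, alpha=0.88, lam=2.25):
    # Loss aversion: losses are scaled by lam > 1.
    return lam * np.maximum(-x, 0.0) ** alpha

def cpt_value(samples, eta=0.71):
    """Quantile-based CPT estimate from a batch of sampled returns.

    Gains and losses pass through separate utilities, and their empirical
    tail probabilities are distorted by the weighting function, so rare
    extreme outcomes are overweighted relative to the plain sample mean.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    # Weighted differences of the distorted upper-tail probabilities
    # replace the uniform 1/n weights of an ordinary sample mean.
    hi = (n - np.arange(n)) / n        # P(X >= x_(i)) = (n-i+1)/n
    lo = (n - np.arange(n) - 1) / n    # (n-i)/n
    gain_w = weight(hi, eta) - weight(lo, eta)
    # For losses, distort the lower tail instead of the upper tail.
    hi_l = (np.arange(n) + 1) / n
    lo_l = np.arange(n) / n
    loss_w = weight(hi_l, eta) - weight(lo_l, eta)
    return float(np.dot(utility_gain(x), gain_w)
                 - np.dot(utility_loss(x), loss_w))

# Example: noisy returns centered at zero. Loss aversion pushes the
# CPT value below the ordinary sample mean.
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 1.0, size=10_000)
print("mean:", returns.mean(), "CPT value:", cpt_value(returns))
```

Because small tail probabilities are overweighted and losses are scaled by lam > 1, the CPT value of a symmetric return distribution sits below its ordinary mean, which is precisely the gain/loss asymmetry the paper's cost is built around.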
Related papers
- DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling [38.18345641589625]
We propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization.
Experiments on dynamic persona modeling involving 4800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER.
arXiv Detail & Related papers (2025-02-16T11:02:37Z)
- Towards Recommender Systems LLMs Playground (RecSysLLMsP): Exploring Polarization and Engagement in Simulated Social Networks [6.813586966214873]
This paper introduces a novel simulation framework leveraging Large Language Models (LLMs) to explore the impacts of different content recommendation setups on user engagement and polarization in social networks.
By creating diverse AI agents with descriptive, static, and dynamic attributes, we assess their autonomous behaviour across three scenarios: Plurality, Balanced, and Similarity.
Our study emphasizes the need for a careful balance in recommender system designs to enhance user satisfaction while mitigating societal polarization.
arXiv Detail & Related papers (2025-01-29T14:23:34Z)
- When Online Algorithms Influence the Environment: A Dynamical Systems Analysis of the Unintended Consequences [5.4209739979186295]
We analyze the effect that online algorithms have on the environment they are learning from.
We show that when the recommendation algorithm is able to learn the population preferences in the presence of this mismatch, the algorithm induces similarity in the preferences of the user population.
arXiv Detail & Related papers (2024-11-21T06:47:53Z)
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
- Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [50.01551945190676]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
We demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation.
arXiv Detail & Related papers (2024-01-22T18:58:22Z)
- Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored to multi-objective machine learning (ML).
Instead of relying on the user to guess the most suitable indicator for their needs, our approach learns an appropriate indicator automatically.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from behavior have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
- Privacy-Preserving Reinforcement Learning Beyond Expectation [6.495883501989546]
Cyber and cyber-physical systems equipped with machine learning algorithms such as autonomous cars share environments with humans.
It is important to align system (or agent) behaviors with the preferences of one or more human users.
We consider the case when an agent has to learn behaviors in an unknown environment.
arXiv Detail & Related papers (2022-03-18T21:28:29Z)
- Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z)
- Value Driven Representation for Human-in-the-Loop Reinforcement Learning [33.79501890330252]
We focus on the algorithmic foundations of helping the system designer choose the set of sensors or features that define the observation space used by a reinforcement learning agent.
We present an algorithm, value driven representation (VDR) that can iteratively and adaptively augment the observation space of a reinforcement learning agent.
We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
arXiv Detail & Related papers (2020-04-02T18:45:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.