Reinforcement Learning Beyond Expectation
- URL: http://arxiv.org/abs/2104.00540v1
- Date: Mon, 29 Mar 2021 20:35:25 GMT
- Title: Reinforcement Learning Beyond Expectation
- Authors: Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark, Radha Poovendran
- Abstract summary: Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently.
In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment.
In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost.
- Score: 11.428014000851535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inputs and preferences of human users are important considerations in
situations where these users interact with autonomous cyber or cyber-physical
systems. In these scenarios, one is often interested in aligning behaviors of
the system with the preferences of one or more human users. Cumulative prospect
theory (CPT) is a paradigm that has been empirically shown to model a tendency
of humans to view gains and losses differently. In this paper, we consider a
setting where an autonomous agent has to learn behaviors in an unknown
environment. In traditional reinforcement learning, these behaviors are learned
through repeated interactions with the environment by optimizing an expected
utility. In order to endow the agent with the ability to closely mimic the
behavior of human users, we optimize a CPT-based cost. We introduce the notion
of the CPT-value of an action taken in a state, and establish the convergence
of an iterative dynamic programming-based approach to estimate this quantity.
We develop two algorithms to enable agents to learn policies to optimize the
CPT-vale, and evaluate these algorithms in environments where a target state
has to be reached while avoiding obstacles. We demonstrate that behaviors of
the agent learned using these algorithms are better aligned with that of a
human user who might be placed in the same environment, and is significantly
improved over a baseline that optimizes an expected utility.
Related papers
- Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems.
We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z) - PAPER-HILT: Personalized and Adaptive Privacy-Aware Early-Exit for
Reinforcement Learning in Human-in-the-Loop Systems [0.6282068591820944]
Reinforcement Learning (RL) has increasingly become a preferred method over traditional rule-based systems in diverse human-in-the-loop (HITL) applications.
This paper focuses on developing an innovative, adaptive RL strategy through exploiting an early-exit approach designed explicitly for privacy preservation in HITL environments.
arXiv Detail & Related papers (2024-03-09T10:24:12Z) - Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [55.65482030032804]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states.
arXiv Detail & Related papers (2024-01-22T18:58:22Z) - On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics [0.7655800373514546]
Risk-aware Reinforcement Learning algorithms were shown to outperform risk-neutral counterparts in a variety of continuous-action tasks.
The theoretical basis for the pessimistic objectives these algorithms employ remains unestablished.
We propose Dual Actor-Critic (DAC) as a risk-aware, model-free algorithm that features two distinct actor networks.
arXiv Detail & Related papers (2023-10-30T13:28:06Z) - Context-Aware Prediction of User Engagement on Online Social Platforms [15.847199578750924]
We present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight representation of user engagement on online social platforms.
We analyze more than 100 million Snapchat sessions from almost 80.000 users.
Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement.
arXiv Detail & Related papers (2023-10-23T03:36:35Z) - Interactive Hyperparameter Optimization in Multi-Objective Problems via
Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML)
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z) - Meta-Wrapper: Differentiable Wrapping Operator for User Interest
Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z) - Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline
Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z) - Privacy-Preserving Reinforcement Learning Beyond Expectation [6.495883501989546]
Cyber and cyber-physical systems equipped with machine learning algorithms such as autonomous cars share environments with humans.
It is important to align system (or agent) behaviors with the preferences of one or more human users.
We consider the case when an agent has to learn behaviors in an unknown environment.
arXiv Detail & Related papers (2022-03-18T21:28:29Z) - Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z) - Value Driven Representation for Human-in-the-Loop Reinforcement Learning [33.79501890330252]
We focus on algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by reinforcement learning agent.
We present an algorithm, value driven representation (VDR) that can iteratively and adaptively augment the observation space of a reinforcement learning agent.
We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
arXiv Detail & Related papers (2020-04-02T18:45:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.