Reinforcement Learning Beyond Expectation
- URL: http://arxiv.org/abs/2104.00540v1
- Date: Mon, 29 Mar 2021 20:35:25 GMT
- Title: Reinforcement Learning Beyond Expectation
- Authors: Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark, Radha Poovendran
- Abstract summary: Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently.
In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment.
In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost.
- Score: 11.428014000851535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The inputs and preferences of human users are important considerations in
situations where these users interact with autonomous cyber or cyber-physical
systems. In these scenarios, one is often interested in aligning behaviors of
the system with the preferences of one or more human users. Cumulative prospect
theory (CPT) is a paradigm that has been empirically shown to model a tendency
of humans to view gains and losses differently. In this paper, we consider a
setting where an autonomous agent has to learn behaviors in an unknown
environment. In traditional reinforcement learning, these behaviors are learned
through repeated interactions with the environment by optimizing an expected
utility. In order to endow the agent with the ability to closely mimic the
behavior of human users, we optimize a CPT-based cost. We introduce the notion
of the CPT-value of an action taken in a state, and establish the convergence
of an iterative dynamic programming-based approach to estimate this quantity.
We develop two algorithms to enable agents to learn policies to optimize the
CPT-value, and evaluate these algorithms in environments where a target state
has to be reached while avoiding obstacles. We demonstrate that the behaviors
the agent learns using these algorithms are better aligned with those of a
human user who might be placed in the same environment, and are significantly
improved over a baseline that optimizes an expected utility.
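As an illustration of how CPT treats gains and losses differently, the following minimal Python sketch computes the CPT value of a discrete random outcome. It assumes the power utility and probability weighting functions of Tversky and Kahneman (1992) with their commonly cited parameter estimates. The names `weight` and `cpt_value` are hypothetical, and this is not the paper's CPT-based cost or either of its learning algorithms.

```python
def weight(p, gamma):
    """Tversky-Kahneman probability weighting (assumed form, not from the paper)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return p**gamma / (p**gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)


def cpt_value(outcomes, probs, alpha=0.88, beta=0.88, lam=2.25,
              gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of a discrete outcome, with 0 as the reference point.

    Parameter defaults are the 1992 Tversky-Kahneman estimates (an assumption).
    """
    pairs = sorted(zip(outcomes, probs))
    value = 0.0

    # Gains: walk from the largest gain down, weighting the probability
    # of receiving at least that much.
    tail = 0.0
    for x, p in reversed(pairs):
        if x <= 0:
            break
        decision_weight = weight(tail + p, gamma_gain) - weight(tail, gamma_gain)
        value += x**alpha * decision_weight
        tail += p

    # Losses: walk from the largest loss up, weighting the probability
    # of losing at least that much; lam > 1 encodes loss aversion.
    head = 0.0
    for x, p in pairs:
        if x >= 0:
            break
        decision_weight = weight(head + p, gamma_loss) - weight(head, gamma_loss)
        value -= lam * (-x) ** beta * decision_weight
        head += p

    return value


if __name__ == "__main__":
    # A fair gamble: expected value is zero, CPT value is not.
    print(cpt_value([10.0, -10.0], [0.5, 0.5]))
```

Under these assumptions, a fair 50/50 gamble between +10 and -10 has an expected value of zero but a negative CPT value, because losses are scaled by the loss-aversion factor `lam`.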
Related papers
- PAPER-HILT: Personalized and Adaptive Privacy-Aware Early-Exit for Reinforcement Learning in Human-in-the-Loop Systems [0.6282068591820944]
Reinforcement Learning (RL) has increasingly become a preferred method over traditional rule-based systems in diverse human-in-the-loop (HITL) applications.
This paper focuses on developing an innovative, adaptive RL strategy through exploiting an early-exit approach designed explicitly for privacy preservation in HITL environments.
arXiv Detail & Related papers (2024-03-09T10:24:12Z)
- Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [55.65482030032804]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning.
We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures.
Our approach infers dynamically evolving relation graphs and hypergraphs to capture the evolution of relations, which the trajectory predictor employs to generate future states.
arXiv Detail & Related papers (2024-01-22T18:58:22Z)
- Context-Aware Prediction of User Engagement on Online Social Platforms [15.847199578750924]
We present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight representation of user engagement on online social platforms.
We analyze more than 100 million Snapchat sessions from almost 80,000 users.
Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement.
arXiv Detail & Related papers (2023-10-23T03:36:35Z)
- Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML).
Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
- Discovering How Agents Learn Using Few Data [32.38609641970052]
We propose a theoretical and algorithmic framework for real-time identification of agent behavior using a short burst of a single system trajectory.
Our approach accurately recovers the true dynamics across various benchmarks, including equilibrium selection and prediction of chaotic systems up to 10 Lyapunov times.
These findings suggest that our approach has significant potential to support effective policy and decision-making in strategic multi-agent systems.
arXiv Detail & Related papers (2023-07-13T09:14:48Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience [63.75363908696257]
Computational reinforcement learning seeks to construct an agent's perception of the world through predictions of future sensations.
An open challenge in this line of work is determining from the infinitely many predictions that the agent could possibly make which predictions might best support decision-making.
We introduce a meta-gradient descent process by which an agent learns 1) what predictions to make, 2) the estimates for its chosen predictions, and 3) how to use those estimates to generate policies that maximize future reward.
arXiv Detail & Related papers (2022-06-13T21:31:06Z)
- Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms.
Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
- Privacy-Preserving Reinforcement Learning Beyond Expectation [6.495883501989546]
Cyber and cyber-physical systems equipped with machine learning algorithms such as autonomous cars share environments with humans.
It is important to align system (or agent) behaviors with the preferences of one or more human users.
We consider the case when an agent has to learn behaviors in an unknown environment.
arXiv Detail & Related papers (2022-03-18T21:28:29Z)
- Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z)
- Value Driven Representation for Human-in-the-Loop Reinforcement Learning [33.79501890330252]
We focus on the algorithmic foundation of how to help the system designer choose the set of sensors or features that define the observation space used by a reinforcement learning agent.
We present an algorithm, value driven representation (VDR), that can iteratively and adaptively augment the observation space of a reinforcement learning agent.
We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
arXiv Detail & Related papers (2020-04-02T18:45:45Z)