Related papers: Reinforcement Learning Beyond Expectation

Reinforcement Learning Beyond Expectation

URL: http://arxiv.org/abs/2104.00540v1
Date: Mon, 29 Mar 2021 20:35:25 GMT
Title: Reinforcement Learning Beyond Expectation
Authors: Bhaskar Ramasubramanian, Luyao Niu, Andrew Clark, Radha Poovendran
Abstract summary: Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost.
Score: 11.428014000851535
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The inputs and preferences of human users are important considerations in situations where these users interact with autonomous cyber or cyber-physical systems. In these scenarios, one is often interested in aligning behaviors of the system with the preferences of one or more human users. Cumulative prospect theory (CPT) is a paradigm that has been empirically shown to model a tendency of humans to view gains and losses differently. In this paper, we consider a setting where an autonomous agent has to learn behaviors in an unknown environment. In traditional reinforcement learning, these behaviors are learned through repeated interactions with the environment by optimizing an expected utility. In order to endow the agent with the ability to closely mimic the behavior of human users, we optimize a CPT-based cost. We introduce the notion of the CPT-value of an action taken in a state, and establish the convergence of an iterative dynamic programming-based approach to estimate this quantity. We develop two algorithms to enable agents to learn policies to optimize the CPT-vale, and evaluate these algorithms in environments where a target state has to be reached while avoiding obstacles. We demonstrate that behaviors of the agent learned using these algorithms are better aligned with that of a human user who might be placed in the same environment, and is significantly improved over a baseline that optimizes an expected utility.

Related papers

DEEPER Insight into Your User: Directed Persona Refinement for Dynamic Persona Modeling [38.18345641589625]
We propose DEEPER, a novel approach for dynamic persona modeling that enables continual persona optimization. Experiments on dynamic persona modeling involving 4800 users across 10 domains highlight the superior persona optimization capabilities of DEEPER.
arXiv Detail & Related papers (2025-02-16T11:02:37Z)
Towards Recommender Systems LLMs Playground (RecSysLLMsP): Exploring Polarization and Engagement in Simulated Social Networks [6.813586966214873]
This paper introduces a novel simulation framework leveraging Large Language Models (LLMs) to explore the impacts of different content recommendation setups on user engagement and polarization in social networks. By creating diverse AI agents with descriptive, static, and dynamic attributes, we assess their autonomous behaviour across three scenarios: Plurality, Balanced, and Similarity. Our study emphasizes the need for a careful balance in recommender system designs to enhance user satisfaction while mitigating societal polarization.
arXiv Detail & Related papers (2025-01-29T14:23:34Z)
When Online Algorithms Influence the Environment: A Dynamical Systems Analysis of the Unintended Consequences [5.4209739979186295]
We analyze the effect that online algorithms have on the environment that they are learning. We show that when the recommendation algorithm is able to learn the population preferences in the presence of this mismatch, the algorithm induces similarity in the preferences of the user population.
arXiv Detail & Related papers (2024-11-21T06:47:53Z)
Quantifying User Coherence: A Unified Framework for Cross-Domain Recommendation Analysis [69.37718774071793]
This paper introduces novel information-theoretic measures for understanding recommender systems. We evaluate 7 recommendation algorithms across 9 datasets, revealing the relationships between our measures and standard performance metrics.
arXiv Detail & Related papers (2024-10-03T13:02:07Z)
Multi-Agent Dynamic Relational Reasoning for Social Robot Navigation [50.01551945190676]
Social robot navigation can be helpful in various contexts of daily life but requires safe human-robot interactions and efficient trajectory planning. We propose a systematic relational reasoning approach with explicit inference of the underlying dynamically evolving relational structures. We demonstrate its effectiveness for multi-agent trajectory prediction and social robot navigation.
arXiv Detail & Related papers (2024-01-22T18:58:22Z)
On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics [0.7655800373514546]
Risk-aware Reinforcement Learning algorithms were shown to outperform risk-neutral counterparts in a variety of continuous-action tasks. The theoretical basis for the pessimistic objectives these algorithms employ remains unestablished. We propose Dual Actor-Critic (DAC) as a risk-aware, model-free algorithm that features two distinct actor networks.
arXiv Detail & Related papers (2023-10-30T13:28:06Z)
Context-Aware Prediction of User Engagement on Online Social Platforms [15.847199578750924]
We present data suggesting that context-aware modeling approaches may offer a holistic yet lightweight representation of user engagement on online social platforms. We analyze more than 100 million Snapchat sessions from almost 80.000 users. Features related to smartphone connectivity status, location, temporal context, and weather were found to capture non-redundant variance in user engagement.
arXiv Detail & Related papers (2023-10-23T03:36:35Z)
Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning [65.51668094117802]
We propose a human-centered interactive HPO approach tailored towards multi-objective machine learning (ML) Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator.
arXiv Detail & Related papers (2023-09-07T09:22:05Z)
Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems. Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success. We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
We design a dynamic mechanism using offline reinforcement learning algorithms. Our algorithm is based on the pessimism principle and only requires a mild assumption on the coverage of the offline data set.
arXiv Detail & Related papers (2022-05-05T05:44:26Z)
Privacy-Preserving Reinforcement Learning Beyond Expectation [6.495883501989546]
Cyber and cyber-physical systems equipped with machine learning algorithms such as autonomous cars share environments with humans. It is important to align system (or agent) behaviors with the preferences of one or more human users. We consider the case when an agent has to learn behaviors in an unknown environment.
arXiv Detail & Related papers (2022-03-18T21:28:29Z)
Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework. We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior. We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z)
Value Driven Representation for Human-in-the-Loop Reinforcement Learning [33.79501890330252]
We focus on algorithmic foundation of how to help the system designer choose the set of sensors or features to define the observation space used by reinforcement learning agent. We present an algorithm, value driven representation (VDR) that can iteratively and adaptively augment the observation space of a reinforcement learning agent. We evaluate the performance of our approach on standard RL benchmarks with simulated humans and demonstrate significant improvement over prior baselines.
arXiv Detail & Related papers (2020-04-02T18:45:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.