Value function interference and greedy action selection in value-based multi-objective reinforcement learning
- URL: http://arxiv.org/abs/2402.06266v1
- Date: Fri, 9 Feb 2024 09:28:01 GMT
- Title: Value function interference and greedy action selection in value-based multi-objective reinforcement learning
- Authors: Peter Vamplew, Cameron Foale, Richard Dazeley
- Abstract summary: Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to problems with multiple, conflicting objectives, represented by vector-valued rewards.
We show that, if the user's utility function maps widely varying vector-values to similar levels of utility, this can lead to interference in the value function learned by the agent.
We demonstrate empirically that avoiding the use of random tie-breaking when identifying greedy actions can ameliorate, but not fully overcome, the problems caused by value function interference.
- Score: 1.4206639868377509
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-objective reinforcement learning (MORL) algorithms extend conventional
reinforcement learning (RL) to the more general case of problems with multiple,
conflicting objectives, represented by vector-valued rewards. Widely-used
scalar RL methods such as Q-learning can be modified to handle multiple
objectives by (1) learning vector-valued value functions, and (2) performing
action selection using a scalarisation or ordering operator which reflects the
user's utility with respect to the different objectives. However, as we
demonstrate here, if the user's utility function maps widely varying
vector-values to similar levels of utility, this can lead to interference in
the value-function learned by the agent, leading to convergence to sub-optimal
policies. This will be most prevalent in stochastic environments when
optimising for the Expected Scalarised Return criterion, but we present a
simple example showing that interference can also arise in deterministic
environments. We demonstrate empirically that avoiding the use of random
tie-breaking when identifying greedy actions can ameliorate, but not fully
overcome, the problems caused by value function interference.
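To make the mechanism concrete, here is a minimal Python sketch of scalarised greedy action selection over a vector-valued Q-table, with both tie-breaking schemes the abstract mentions. All names, shapes, and the example utility function are illustrative assumptions, not the paper's experimental setup.
```python
import numpy as np

# Hypothetical vector-valued Q-table: Q[state, action] holds one estimate
# per objective. Shapes and names are illustrative only.
N_STATES, N_ACTIONS, N_OBJECTIVES = 4, 3, 2
Q = np.zeros((N_STATES, N_ACTIONS, N_OBJECTIVES))

def utility(v):
    # Example non-linear utility: the user only cares about the worst objective.
    return min(v[0], v[1])

def greedy_action(q_state, rng=None):
    """Scalarise each action's vector value, then pick a greedy action.
    With rng, ties are broken at random; without it, the first maximal
    action is always chosen (deterministic tie-breaking)."""
    utilities = np.array([utility(q) for q in q_state])
    best = np.flatnonzero(utilities == utilities.max())
    return rng.choice(best) if rng is not None else best[0]

# Interference in miniature: two very different vector values that map
# to the same utility.
Q[0, 0] = [10.0, 0.0]
Q[0, 1] = [0.0, 10.0]
print(utility(Q[0, 0]), utility(Q[0, 1]))   # 0.0 0.0
print(utility((Q[0, 0] + Q[0, 1]) / 2))     # 5.0
print(greedy_action(Q[0]))                  # 0 (deterministic tie-break)
```
The last lines show the interference problem in miniature: blending two vector values that each have utility 0.0 yields a vector whose utility is 5.0, so updates that mix returns across equally-greedy actions can misstate the utility actually achievable, which is why the tie-breaking rule matters.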
Related papers
- Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL).
We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $\mathcal{O}(\log(n))$.
The presented value-based RL methods include, among others, Q-learning, StochDQN, and StochDDQN, all of which integrate this approach for both value-function updates and action selection (a simplified sketch follows this entry).
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
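As a rough illustration of the subset idea in the Stochastic Q-learning entry above: score only roughly $\mathcal{O}(\log(n))$ randomly sampled actions (optionally plus a small memory of previously good ones) instead of all $n$. A hedged sketch, not the paper's exact construction; `stochastic_argmax` and its arguments are invented names.
```python
import math
import random

def stochastic_argmax(q_values, memory=(), rng=random):
    """Approximate the greedy action by scoring only a random subset of
    roughly O(log n) actions, optionally joined by a small 'memory' of
    previously good actions."""
    n = len(q_values)
    k = max(1, math.ceil(math.log2(n)))
    candidates = set(rng.sample(range(n), k)) | set(memory)
    return max(candidates, key=lambda a: q_values[a])

# With n = 10,000 actions, only ~14 are evaluated per call.
q = [random.random() for _ in range(10_000)]
print(stochastic_argmax(q, memory=[42]))
```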
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, possibly conflicting, objectives.
We focus on the case of linear utility functions parameterised by weight vectors $w$.
We introduce a method based on the Upper Confidence Bound (UCB) to efficiently search for the most promising weight vectors during different stages of the learning process (see the sketch after this entry).
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
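The UCB-driven search above treats candidate weight vectors as bandit arms. Below is a generic UCB1 sketch of that idea; the class, its parameters, and the reward signal are illustrative assumptions rather than the paper's formulation.
```python
import numpy as np

class UCBWeightSearch:
    """UCB1 over a finite pool of candidate weight vectors w, each
    defining a linear utility u(v) = w . v. Names invented for
    illustration."""

    def __init__(self, weight_vectors, c=2.0):
        self.ws = weight_vectors
        self.counts = np.zeros(len(weight_vectors))
        self.means = np.zeros(len(weight_vectors))
        self.c = c  # exploration strength

    def select(self):
        """Return the index of the most promising weight vector."""
        untried = np.flatnonzero(self.counts == 0)
        if untried.size:                     # try every arm once first
            return int(untried[0])
        bonus = self.c * np.sqrt(np.log(self.counts.sum()) / self.counts)
        return int(np.argmax(self.means + bonus))

    def update(self, i, reward):
        """Feed back a scalar signal, e.g. utility achieved by the policy
        trained under weight vector i, after an evaluation phase."""
        self.counts[i] += 1
        self.means[i] += (reward - self.means[i]) / self.counts[i]
```
A typical loop would call select(), train the MORL agent under the chosen $w$ for a while, then pass a scalar progress measure back through update().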
- Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional reinforcement learning (RL) to address problems with multiple objectives.
This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user will click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interest from user behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
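The OREO entry above turns on encouraging uniform attention over semantic objects. One generic way to express that is a penalty on the divergence between an attention distribution and the uniform distribution, added to the imitation loss; this is a stand-in sketch of the idea, not the paper's exact mechanism.
```python
import numpy as np

def uniform_attention_penalty(attn, eps=1e-12):
    """attn: 1-D array of attention weights over semantic objects,
    summing to 1. KL(attn || uniform) is zero iff attention is evenly
    spread, so adding this penalty to an imitation loss pushes the
    policy away from latching onto a single nuisance object."""
    n = len(attn)
    return float(np.sum(attn * (np.log(attn + eps) + np.log(n))))

# Peaked attention is penalised; uniform attention is not.
print(uniform_attention_penalty(np.array([0.97, 0.01, 0.01, 0.01])))  # ~1.22
print(uniform_attention_penalty(np.array([0.25, 0.25, 0.25, 0.25])))  # ~0.0
```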
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?" (see the sketch after this entry).
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
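The question quoted in the entry above suggests a simple Monte-Carlo estimator: evaluate the agent's value while revealing only random sub-groups of its observed entities, then average. Everything here (function names, value_fn) is a placeholder sketch, not the paper's method.
```python
import random

def expected_subgroup_utility(agent_obs, entities, value_fn,
                              rng=None, n_samples=8):
    """Monte-Carlo estimate of an agent's expected utility when only a
    randomly selected sub-group of its observed entities is considered.
    value_fn stands in for any learned value network that scores
    (agent observation, entity subset) pairs."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        k = rng.randint(1, len(entities))
        subgroup = rng.sample(entities, k)   # reveal only these entities
        total += value_fn(agent_obs, subgroup)
    return total / n_samples
```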
- Dynamic Value Estimation for Single-Task Multi-Scene Reinforcement Learning [22.889059874754242]
Training deep reinforcement learning agents on environments with multiple levels, scenes, or conditions from the same task has become essential for many applications.
We propose a dynamic value estimation (DVE) technique for these multiple-MDP environments, motivated by the clustering effect observed in the value function distribution across different scenes.
arXiv Detail & Related papers (2020-05-25T17:56:08Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task (see the sketch after this entry).
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
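The counterfactual-pair entry above can be illustrated with a gradient-alignment auxiliary loss: push the model's input-gradient at x toward the direction of its minimally-different partner x_cf. A sketch under the assumption that the input-gradient has been computed upstream by autodiff; it may differ from the paper's exact objective.
```python
import numpy as np

def gradient_supervision_loss(input_grad, x, x_cf, eps=1e-12):
    """Auxiliary loss aligning the model's input-gradient at x with the
    direction from x to its counterfactual partner x_cf (minimally
    different input, different label). Returns 1 - cosine similarity,
    so the loss is zero when the gradient points exactly along the
    difference that changes the label."""
    d = x_cf - x
    cos = np.dot(input_grad, d) / (
        np.linalg.norm(input_grad) * np.linalg.norm(d) + eps)
    return 1.0 - cos

# Toy usage with made-up vectors.
print(gradient_supervision_loss(np.array([1.0, 0.0]),
                                np.array([0.0, 0.0]),
                                np.array([2.0, 0.0])))  # 0.0 (aligned)
```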
- A utility-based analysis of equilibria in multi-objective normal form games [4.632366780742502]
We argue that compromises between competing objectives in multi-objective multi-agent systems (MOMAS) should be analysed on the basis of the utility that these compromises have for the users of a system.
This utility-based approach naturally leads to two different optimisation criteria for agents in a MOMAS.
We show that the choice of optimisation criterion can radically alter the set of equilibria in a multi-objective normal form game (MONFG) when non-linear utility functions are used (a toy calculation follows this entry).
arXiv Detail & Related papers (2020-01-17T22:27:38Z)
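The "two different optimisation criteria" above are commonly the scalarised expected return, u(E[V]), and the expected scalarised return, E[u(V)], the same ESR criterion named in the main abstract. A toy calculation with invented numbers shows how a non-linear utility makes them disagree:
```python
import numpy as np

u = lambda v: np.min(v)             # non-linear utility over a payoff vector
outcomes = np.array([[10.0, 0.0],   # possible vector payoffs under a
                     [0.0, 10.0]])  # mixed strategy (toy numbers)
p = np.array([0.5, 0.5])            # outcome probabilities

ser = u(p @ outcomes)                         # u(E[V]) = u([5, 5]) = 5.0
esr = p @ np.array([u(o) for o in outcomes])  # E[u(V)] = 0.5*0 + 0.5*0 = 0.0
print(ser, esr)                               # 5.0 0.0
```
Because an equilibrium depends on which of these quantities each agent optimises, swapping the criterion can change the equilibrium set.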
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.