Utility-Based Reinforcement Learning: Unifying Single-objective and
Multi-objective Reinforcement Learning
- URL: http://arxiv.org/abs/2402.02665v1
- Date: Mon, 5 Feb 2024 01:42:28 GMT
- Title: Utility-Based Reinforcement Learning: Unifying Single-objective and
Multi-objective Reinforcement Learning
- Authors: Peter Vamplew, Cameron Foale, Conor F. Hayes, Patrick Mannion, Enda
Howley, Richard Dazeley, Scott Johnson, Johan Källström, Gabriel Ramos,
Roxana Rădulescu, Willem Röpke, Diederik M. Roijers
- Abstract summary: We extend the utility-based paradigm to the context of single-objective reinforcement learning (RL).
We outline multiple potential benefits including the ability to perform multi-policy learning across tasks relating to uncertain objectives, risk-aware RL, discounting, and safe RL.
We also examine the algorithmic implications of adopting a utility-based approach.
- Score: 3.292607871053364
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in multi-objective reinforcement learning (MORL) has introduced the
utility-based paradigm, which makes use of both environmental rewards and a
function that defines the utility derived by the user from those rewards. In
this paper we extend this paradigm to the context of single-objective
reinforcement learning (RL), and outline multiple potential benefits including
the ability to perform multi-policy learning across tasks relating to uncertain
objectives, risk-aware RL, discounting, and safe RL. We also examine the
algorithmic implications of adopting a utility-based approach.
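To make the paradigm concrete, here is a minimal, illustrative Python sketch (not code from the paper): a user-supplied utility function is applied to vector-valued expected returns to rank candidate policies. The candidate returns, weights, and helper names are assumptions chosen for illustration; conventional single-objective RL is recovered as the special case where the return has a single component and the utility is the identity.

```python
import numpy as np

def select_policy(expected_returns, utility):
    """Rank candidate policies by the user's utility of their expected
    vector-valued returns and return the index of the highest-utility policy."""
    utilities = [utility(np.asarray(r, dtype=float)) for r in expected_returns]
    return int(np.argmax(utilities))

# Hypothetical expected returns of three policies over two objectives,
# e.g. (task performance, negated energy cost).
candidate_returns = [
    (10.0, -8.0),
    (7.0, -2.0),
    (4.0, -1.0),
]

# One possible utility: a linear scalarisation with user-chosen weights.
def linear_utility(r):
    return float(np.dot(r, [0.8, 0.2]))

# Another possible utility: non-linear, penalising energy use beyond a budget,
# as a stand-in for risk-aware or safety-oriented preferences.
def budgeted_utility(r):
    performance, neg_energy = r
    overspend = max(0.0, -neg_energy - 5.0)  # energy used beyond a budget of 5
    return performance - 10.0 * overspend

print(select_policy(candidate_returns, linear_utility))   # -> 0
print(select_policy(candidate_returns, budgeted_utility))  # -> 1
```

Note that the two utility functions select different policies from the same set of vector returns; exposing this choice to the user is the flexibility the utility-based paradigm is intended to provide.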
Related papers
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours over multiple objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on the Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process; an illustrative sketch of UCB-style weight-vector selection appears after this list.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Value function interference and greedy action selection in value-based multi-objective reinforcement learning [1.4206639868377509]
Multi-objective reinforcement learning (MORL) algorithms extend conventional reinforcement learning (RL) to problems with multiple, potentially conflicting objectives represented by vector-valued rewards.
We show that, if the user's utility function maps widely varying vector-values to similar levels of utility, this can lead to interference.
We demonstrate empirically that avoiding the use of random tie-breaking when identifying greedy actions can ameliorate, but not fully overcome, the problems caused by value function interference.
arXiv Detail & Related papers (2024-02-09T09:28:01Z)
- Sharing Knowledge in Multi-Task Deep Reinforcement Learning [57.38874587065694]
We study the benefit of sharing representations among tasks to enable the effective use of deep neural networks in Multi-Task Reinforcement Learning.
We prove this by providing theoretical guarantees that highlight the conditions under which it is convenient to share representations among tasks.
arXiv Detail & Related papers (2024-01-17T19:31:21Z)
- Lexicographic Multi-Objective Reinforcement Learning [65.90380946224869]
We present a family of both action-value and policy gradient algorithms that can be used to solve such problems.
We show how our algorithms can be used to impose safety constraints on the behaviour of an agent, and compare their performance in this context with that of other constrained reinforcement learning algorithms.
arXiv Detail & Related papers (2022-12-28T10:22:36Z)
- MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning [14.06682547001011]
State-of-the-art methods typically focus on learning a single reward model.
We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms.
Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
arXiv Detail & Related papers (2021-12-30T19:21:03Z)
- Variational Empowerment as Representation Learning for Goal-Based Reinforcement Learning [114.07623388322048]
We discuss how the standard goal-conditioned RL (GCRL) objective is encapsulated by variational empowerment.
Our work lays a novel foundation from which to evaluate, analyze, and develop representation learning techniques in goal-based RL.
arXiv Detail & Related papers (2021-06-02T18:12:26Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Reinforcement Learning through Active Inference [62.997667081978825]
We show how ideas from active inference can augment traditional reinforcement learning approaches.
We develop and implement a novel objective for decision making, which we term the free energy of the expected future.
We demonstrate that the resulting algorithm successfully balances exploration and exploitation, simultaneously achieving robust performance on several challenging RL benchmarks with sparse, well-shaped, and no rewards.
arXiv Detail & Related papers (2020-02-28T10:28:21Z)
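Relating to the UCB-driven utility-function search entry above, the following is a minimal, illustrative sketch (not the paper's algorithm) of using an Upper Confidence Bound rule to decide which candidate weight vector of a linear utility to evaluate next. The candidate weights, the `evaluate()` placeholder, and all scores are hypothetical assumptions for illustration.

```python
import math
import random

def ucb_index(counts, means, total_pulls, c=1.0):
    """Return the index of the candidate with the highest UCB score:
    empirical mean utility plus an exploration bonus."""
    for i, n in enumerate(counts):
        if n == 0:
            return i  # evaluate every candidate at least once
    scores = [m + c * math.sqrt(math.log(total_pulls) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical candidate weight vectors for a linear utility over two objectives.
candidates = [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)]
counts = [0] * len(candidates)
means = [0.0] * len(candidates)

def evaluate(weights):
    """Placeholder for training/evaluating a policy under the linear utility
    defined by `weights`; here it just returns a noisy, made-up utility."""
    true_utility = {(1.0, 0.0): 0.6, (0.5, 0.5): 0.9, (0.0, 1.0): 0.4}
    return true_utility[weights] + random.gauss(0.0, 0.1)

for t in range(1, 101):
    i = ucb_index(counts, means, total_pulls=t)
    u = evaluate(candidates[i])
    counts[i] += 1
    means[i] += (u - means[i]) / counts[i]  # incremental mean update

best = max(range(len(candidates)), key=lambda i: means[i])
print("most promising weight vector:", candidates[best])
```

The bandit-style loop spends more evaluations on weight vectors whose estimated utility looks promising while still occasionally revisiting the others, which is the general intuition behind UCB-driven search over utility-function parameters.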
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.