Related papers: UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

UCB-driven Utility Function Search for Multi-objective Reinforcement Learning

URL: http://arxiv.org/abs/2405.00410v2
Date: Thu, 16 May 2024 14:11:46 GMT
Title: UCB-driven Utility Function Search for Multi-objective Reinforcement Learning
Authors: Yucheng Shi, Alexandros Agapitos, David Lynch, Giorgio Cruciata, Cengis Hasan, Hao Wang, Yayu Yao, Aleksandar Milenovic,
Abstract summary: In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
Score: 75.11267478778295
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours that trade-off between multiple, possibly conflicting, objectives. MORL based on decomposition is a family of solution methods that employ a number of utility functions to decompose the multi-objective problem into individual single-objective problems solved simultaneously in order to approximate a Pareto front of policies. We focus on the case of linear utility functions parameterised by weight vectors w. We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process, with the aim of maximising the hypervolume of the resulting Pareto front. The proposed method is shown to outperform various MORL baselines on Mujoco benchmark problems across different random seeds. The code is online at: https://github.com/SYCAMORE-1/ucb-MOPPO.

Related papers

Pareto Set Learning for Multi-Objective Reinforcement Learning [19.720934024901542]
We propose a decomposition-based framework for Multi-Objective RL (MORL) PSL-MORL harnesses the generation capability of hypernetwork to produce the parameters of the policy network for each decomposition weight. We show that PSL-MORL significantly outperforms state-of-the-art MORL methods in the hypervolume and sparsity indicators.
arXiv Detail & Related papers (2025-01-12T10:43:05Z)
Efficient Pareto Manifold Learning with Low-Rank Structure [31.082432589391953]
Multi-task learning is inherently a multi-objective optimization problem. We propose a novel approach that integrates a main network with several low-rank matrices. It significantly reduces the number of parameters and facilitates the extraction of shared features.
arXiv Detail & Related papers (2024-07-30T11:09:27Z)
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost. We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion. By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
PMGDA: A Preference-based Multiple Gradient Descent Algorithm [12.600588000788214]
It is desirable in many multi-objective machine learning applications, such as multi-task learning, to find a solution that fits a given preference of a decision maker. This paper proposes a novel predict-and-correct framework for locating a solution that fits the preference of a decision maker.
arXiv Detail & Related papers (2024-02-14T11:27:31Z)
Optimizing Solution-Samplers for Combinatorial Problems: The Landscape of Policy-Gradient Methods [52.0617030129699]
We introduce a novel theoretical framework for analyzing the effectiveness of DeepMatching Networks and Reinforcement Learning methods. Our main contribution holds for a broad class of problems including Max-and Min-Cut, Max-$k$-Bipartite-Bi, Maximum-Weight-Bipartite-Bi, and Traveling Salesman Problem. As a byproduct of our analysis we introduce a novel regularization process over vanilla descent and provide theoretical and experimental evidence that it helps address vanishing-gradient issues and escape bad stationary points.
arXiv Detail & Related papers (2023-10-08T23:39:38Z)
A Scale-Independent Multi-Objective Reinforcement Learning with Convergence Analysis [0.6091702876917281]
Many sequential decision-making problems need optimization of different objectives which possibly conflict with each other. We develop a single-agent scale-independent multi-objective reinforcement learning on the basis of the Advantage Actor-Critic (A2C) algorithm. A convergence analysis is then done for the devised multi-objective algorithm providing a convergence-in-mean guarantee.
arXiv Detail & Related papers (2023-02-08T16:38:55Z)
Multi-Objective GFlowNets [59.16787189214784]
We study the problem of generating diverse candidates in the context of Multi-Objective Optimization. In many applications of machine learning such as drug discovery and material design, the goal is to generate candidates which simultaneously optimize a set of potentially conflicting objectives. We propose Multi-Objective GFlowNets (MOGFNs), a novel method for generating diverse optimal solutions, based on GFlowNets.
arXiv Detail & Related papers (2022-10-23T16:15:36Z)
Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach [38.76462300149459]
We develop a Multi-objective Correction (MoCo) method for multi-objective gradient optimization. The unique feature of our method is that it can guarantee convergence without increasing the non fairness gradient.
arXiv Detail & Related papers (2022-10-23T05:54:26Z)
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution. We propose textitPareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
Pareto Set Learning for Neural Multi-objective Combinatorial Optimization [6.091096843566857]
Multiobjective optimization (MOCO) problems can be found in many real-world applications. We develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without further search procedure. Our proposed method significantly outperforms some other methods on the multiobjective traveling salesman problem, multiconditioned vehicle routing problem and multi knapsack problem in terms of solution quality, speed, and model efficiency.
arXiv Detail & Related papers (2022-03-29T09:26:22Z)
Scalable Uni-directional Pareto Optimality for Multi-Task Learning with Constraints [4.4044968357361745]
We propose a scalable MOO solver for Multi-Objective (MOO) problems, including support for optimization under constraints. An important application of this is to estimate high-dimensional runtime for neural classification tasks.
arXiv Detail & Related papers (2021-10-28T21:35:59Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives. Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process. We propose a new algorithm called model-based envelop value (EVI) which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.