gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement
Learning Approach
- URL: http://arxiv.org/abs/2204.04988v1
- Date: Mon, 11 Apr 2022 10:06:49 GMT
- Title: gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement
Learning Approach
- Authors: Johannes Dornheim
- Abstract summary: Generalized Thresholded Lexicographic Ordering (gTLO) is a novel method that aims to combine non-linear MORL with the advantages of generalized MORL.
We present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
- Score: 2.0305676256390934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In real-world decision optimization, multiple competing objectives must
often be taken into account. Under classical reinforcement learning, these
objectives have to be combined into a single scalar reward function. In contrast,
multi-objective reinforcement learning (MORL) methods learn from vectors of
per-objective rewards instead. In the case of multi-policy MORL, sets of
decision policies for various preferences regarding the conflicting objectives
are optimized. This is especially important when target preferences are not
known during training or when preferences change dynamically during
application. While it is, in general, straightforward to extend a
single-objective reinforcement learning method for MORL based on linear
scalarization, solutions that are reachable by these methods are limited to
convex regions of the Pareto front. Non-linear MORL methods like Thresholded
Lexicographic Ordering (TLO) are designed to overcome this limitation.
Generalized MORL methods utilize function approximation to generalize across
objective preferences and thereby implicitly learn multiple policies in a
data-efficient manner, even for complex decision problems with high-dimensional
or continuous state spaces. In this work, we propose generalized
Thresholded Lexicographic Ordering (gTLO), a novel method that aims to combine
non-linear MORL with the advantages of generalized MORL. We introduce a deep
reinforcement learning realization of the algorithm and present promising
results on a standard benchmark for non-linear MORL and a real-world
application from the domain of manufacturing process control.
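To make the non-linear scalarization concrete: TLO selects actions by clipping each thresholded objective's value estimate at its threshold and comparing actions lexicographically, which makes preferences on the concave parts of the Pareto front reachable. A minimal sketch of that selection rule, assuming per-objective Q-values are already available (the Q-values and thresholds below are illustrative, not from the paper):

```python
import numpy as np

def tlo_action(q_values: np.ndarray, thresholds: np.ndarray) -> int:
    """Thresholded lexicographic action selection.

    q_values:   shape (n_actions, n_objectives), per-objective Q-value
                estimates for one state.
    thresholds: shape (n_objectives - 1,); objective i only needs to be
                satisfied up to thresholds[i], the last objective is
                maximized unconstrained.
    """
    n_actions, n_objectives = q_values.shape
    # Clip every thresholded objective at its threshold: values above the
    # threshold are all "good enough" and become indistinguishable.
    keys = q_values.copy()
    keys[:, :-1] = np.minimum(keys[:, :-1], thresholds)
    # Lexicographic argmax over the clipped objective values.
    best = 0
    for a in range(1, n_actions):
        for o in range(n_objectives):
            if keys[a, o] > keys[best, o]:
                best = a
                break
            if keys[a, o] < keys[best, o]:
                break
    return best

# Toy example: 3 actions, 2 objectives, threshold 0.5 on objective 0.
q = np.array([[0.9, 0.1],   # over-satisfies objective 0
              [0.6, 0.4],   # also satisfies it, better on objective 1
              [0.3, 0.8]])  # violates the threshold
print(tlo_action(q, np.array([0.5])))  # -> 1
```

Unlike a linear scalarization w @ q, which would happily trade threshold violations against gains in other objectives, the clipping step makes "satisfy objective 0 first, then optimize objective 1" an explicit, non-linear preference.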
Related papers
- C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014] (2024-10-03)
Constrained MORL is a seamless bridge between constrained policy optimization and MORL.
Our algorithm achieves more consistent and superior performances in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
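Hypervolume, used as the headline metric here and in several entries below, is the volume of objective space dominated by a policy set's returns relative to a reference point. A minimal two-objective (maximization) sketch with made-up points:

```python
def hypervolume_2d(points, ref):
    """Hypervolume of a 2-objective (maximization) front w.r.t. ref.

    points: list of (f1, f2) pairs, assumed >= ref component-wise.
    ref:    reference point dominated by every front point.
    """
    # Keep only non-dominated points, scanned by f1 descending.
    pts = sorted(points, key=lambda p: p[0], reverse=True)
    front, best_f2 = [], float("-inf")
    for p in pts:
        if p[1] > best_f2:          # not dominated by a point to its right
            front.append(p)
            best_f2 = p[1]
    # Sum the disjoint rectangles swept from right to left.
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in front:
        hv += (f1 - ref[0]) * (f2 - prev_f2)
        prev_f2 = f2
    return hv

print(hypervolume_2d([(4, 1), (3, 3), (1, 4)], ref=(0, 0)))  # -> 11.0
```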
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295] (2024-05-01)
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off between multiple, often conflicting objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
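The paper's actual acquisition strategy is not reproduced here; the sketch below just shows the standard UCB1 rule applied to a hypothetical finite pool of candidate weight vectors, treating each pool entry as a bandit arm whose reward is the utility achieved under those weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite pool of candidate weight vectors (2 objectives).
weight_pool = np.array([[1.0, 0.0], [0.75, 0.25], [0.5, 0.5],
                        [0.25, 0.75], [0.0, 1.0]])
counts = np.zeros(len(weight_pool))   # pulls per weight vector
means = np.zeros(len(weight_pool))    # running mean utility
c = 2.0                               # exploration coefficient

def evaluate_utility(w):
    """Stand-in for training/evaluating a policy under weights w."""
    return float(w @ np.array([0.3, 0.7]) + 0.1 * rng.standard_normal())

for t in range(1, 201):
    if t <= len(weight_pool):         # pull every arm once first
        arm = t - 1
    else:                             # then pick the highest UCB score
        ucb = means + np.sqrt(c * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    r = evaluate_utility(weight_pool[arm])
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]

print("most promising weights:", weight_pool[int(np.argmax(means))])
```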
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765] (2023-07-20)
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
- Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning [2.1408617023874443] (2023-03-15)
We propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient.
The proposed method works in both continuous and discrete action spaces with no design change of the policy network.
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084] (2023-01-18)
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
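Generalized Policy Improvement itself is compact: act greedily with respect to the pointwise maximum of the scalarized Q-functions of previously learned policies, which is guaranteed to do at least as well as every policy in the set. A minimal sketch of that selection step (the paper's prioritization schemes built on top of GPI are not shown):

```python
import numpy as np

def gpi_action(q_sets: np.ndarray, w: np.ndarray) -> int:
    """GPI action selection for one state.

    q_sets: shape (n_policies, n_actions, n_objectives) -- per-objective
            Q-value estimates of each previously learned policy.
    w:      shape (n_objectives,) -- current preference weights.
    """
    scalar_q = q_sets @ w                    # (n_policies, n_actions)
    # Pointwise maximum over policies, then greedy over actions.
    return int(np.argmax(scalar_q.max(axis=0)))

# Toy example: 2 policies, 3 actions, 2 objectives.
q_sets = np.array([[[1.0, 0.0], [0.2, 0.2], [0.0, 0.1]],
                   [[0.0, 0.9], [0.1, 0.3], [0.4, 0.4]]])
print(gpi_action(q_sets, w=np.array([0.5, 0.5])))  # -> 0
```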
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289] (2022-09-15)
We present a new policy gradient algorithm for topological Markov decision processes (TMDPs), derived as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
- PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm [0.18416014644193063] (2022-08-16)
We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and scales to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
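The "single universal network" idea amounts to feeding the preference vector to the network as an extra input, so one set of weights serves the whole preference space. A minimal PyTorch sketch; the layer sizes and multi-objective output head are illustrative, not PD-MORL's actual architecture:

```python
import torch
import torch.nn as nn

class PreferenceConditionedQNet(nn.Module):
    """Q(s, a, w): one network for the whole preference space."""

    def __init__(self, state_dim, n_actions, n_objectives, hidden=128):
        super().__init__()
        self.n_actions = n_actions
        self.n_objectives = n_objectives
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # One Q-vector (one value per objective) per action.
            nn.Linear(hidden, n_actions * n_objectives),
        )

    def forward(self, state, pref):
        x = torch.cat([state, pref], dim=-1)  # condition on preference
        q = self.net(x)
        return q.view(-1, self.n_actions, self.n_objectives)

# Greedy action for one state under a given preference vector.
net = PreferenceConditionedQNet(state_dim=4, n_actions=3, n_objectives=2)
s = torch.randn(1, 4)
w = torch.tensor([[0.7, 0.3]])
q = net(s, w)                                       # (1, 3, 2)
action = (q @ w.unsqueeze(-1)).squeeze(-1).argmax(dim=-1)
print(action)
```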
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599] (2021-07-08)
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for mean-field control (MFC).
We provide a practical parametrization of the core optimization problem.
- Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence [60.20076757208645] (2021-05-24)
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
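For intuition, the unregularized KL-divergence special case of policy mirror descent has a closed-form update, pi_{k+1}(a|s) proportional to pi_k(a|s) * exp(eta * Q(s, a)); GPMD additionally folds the convex regularizer into this step. A tabular single-state sketch of that classic update (not the paper's full algorithm):

```python
import numpy as np

def pmd_step(pi, q, eta):
    """One KL-based policy mirror descent update for a single state.

    pi:  current action distribution, shape (n_actions,)
    q:   Q-value estimates under pi, shape (n_actions,)
    eta: learning rate
    """
    logits = np.log(pi) + eta * q            # mirror step in dual space
    new_pi = np.exp(logits - logits.max())   # numerically stable softmax
    return new_pi / new_pi.sum()

pi = np.full(3, 1.0 / 3.0)
q = np.array([1.0, 0.5, 0.0])
for _ in range(10):
    pi = pmd_step(pi, q, eta=0.5)
print(pi)  # probability mass concentrates on the highest-Q action
```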
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848] (2020-11-19)
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective Q-learning algorithm.
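The "envelope" refers to maximizing scalarized utility not only over next actions but also over the preferences the Q-function was trained on, taking the full Q-vector at that maximizer as the bootstrap target. A sketch of that backup for a single transition, with tabular per-preference Q-vectors (illustrative of envelope Q-learning, not the paper's model-based construction):

```python
import numpy as np

def envelope_target(r, q_next, w, gamma=0.99):
    """Envelope backup target for one transition (s, a, r, s').

    r:      reward vector, shape (n_objectives,)
    q_next: Q-vectors at s' for every (preference, action) pair,
            shape (n_prefs, n_actions, n_objectives)
    w:      preference weights this sample is being trained under
    """
    # Scalarize with the *current* preference, then take the envelope:
    # max over both next actions and the preferences Q was trained on.
    utilities = q_next @ w                    # (n_prefs, n_actions)
    p, a = np.unravel_index(np.argmax(utilities), utilities.shape)
    return r + gamma * q_next[p, a]           # full Q-vector as target

r = np.array([0.0, 1.0])
q_next = np.random.default_rng(1).random((4, 3, 2))
print(envelope_target(r, q_next, w=np.array([0.6, 0.4])))
```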