Multi-Policy Pareto Front Tracking Based Online and Offline Multi-Objective Reinforcement Learning
- URL: http://arxiv.org/abs/2508.02217v1
- Date: Mon, 04 Aug 2025 09:09:04 GMT
- Title: Multi-Policy Pareto Front Tracking Based Online and Offline Multi-Objective Reinforcement Learning
- Authors: Zeyu Zhao, Yueling Che, Kaichen Liu, Jian Li, Junmei Yao
- Abstract summary: Multi-objective reinforcement learning (MORL) plays a pivotal role in addressing multi-criteria decision-making problems in the real world. Traditional multi-policy (MP) methods rely only on online reinforcement learning (RL) and adopt an evolutionary framework with a large policy population. We propose a novel MPFT framework that maintains no policy population, to which both online and offline MORL algorithms can be applied.
- Score: 6.815740081890867
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-objective reinforcement learning (MORL) plays a pivotal role in addressing multi-criteria decision-making problems in the real world. Multi-policy (MP) based methods are widely used to obtain high-quality Pareto front approximations for MORL problems. However, traditional MP methods rely only on online reinforcement learning (RL) and adopt an evolutionary framework with a large policy population, which may lead to sample inefficiency and/or excessive agent-environment interactions in practice. By forsaking the evolutionary framework, we propose the novel Multi-policy Pareto Front Tracking (MPFT) framework, which maintains no policy population and admits both online and offline MORL algorithms. The proposed MPFT framework includes four stages: Stage 1 approximates all the Pareto-vertex policies, whose mappings to the objective space fall on the vertices of the Pareto front. Stage 2 designs a new Pareto tracking mechanism to track the Pareto front, starting from each of the Pareto-vertex policies. Stage 3 identifies the sparse regions in the tracked Pareto front and introduces a new objective-weight adjustment method to fill them. Finally, by combining all the policies tracked in Stages 2 and 3, Stage 4 approximates the Pareto front. Experiments are conducted on seven different continuous-action robotic control tasks with both online and offline MORL algorithms, and demonstrate the superior hypervolume performance of the proposed MPFT approach over state-of-the-art benchmarks, with significantly reduced agent-environment interactions and hardware requirements.
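The four-stage pipeline described in the abstract can be pictured as a short skeleton. The sketch below is a hypothetical outline written for this summary, not the authors' implementation; the helper callables (train_vertex_policy, track_from, evaluate, fill_sparse) and the use of one-hot weights for Stage 1 are illustrative assumptions.

```python
# Hypothetical skeleton of the four MPFT stages described in the abstract.
# All helper callables are placeholders, not the authors' implementation.
from typing import Callable, List
import numpy as np

def mpft(num_objectives: int,
         train_vertex_policy: Callable[[np.ndarray], object],
         track_from: Callable[[object], List[object]],
         evaluate: Callable[[object], np.ndarray],
         fill_sparse: Callable[[List[np.ndarray]], List[object]]) -> List[object]:
    # Stage 1: approximate the Pareto-vertex policies (assumed here to be
    # obtained by optimizing each objective alone, i.e. one-hot weights).
    vertex_policies = [train_vertex_policy(np.eye(num_objectives)[i])
                       for i in range(num_objectives)]

    # Stage 2: track the Pareto front starting from each Pareto-vertex policy.
    tracked: List[object] = []
    for policy in vertex_policies:
        tracked.extend(track_from(policy))

    # Stage 3: find sparse regions on the tracked front and fill them by
    # adjusting the objective weights (placeholder call).
    objective_returns = [evaluate(policy) for policy in tracked]
    tracked.extend(fill_sparse(objective_returns))

    # Stage 4: the union of all policies tracked in Stages 2 and 3
    # approximates the Pareto front.
    return tracked
```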
Related papers
- Alignment of large language models with constrained learning [93.2264691508005]
We study the problem of computing an optimal large language model (LLM) policy for a constrained alignment problem. We employ Lagrangian duality to develop an iterative dual-based alignment method that alternates between updating the policy via the Lagrangian and updating a dual variable via dual descent.
arXiv Detail & Related papers (2025-05-26T01:04:56Z) - How to Find the Exact Pareto Front for Multi-Objective MDPs? [28.70863169250383]
Multi-Objective Markov Decision Processes (MO-MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP. In this work, we address the challenge of efficiently discovering the Pareto front. By investigating the geometric structure of the Pareto front in MO-MDPs, we uncover a key property. This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ by only one state-action pair.
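That localized search can be pictured with a small tabular example: the neighborhood of a deterministic policy consists of all deterministic policies that differ in exactly one state-action pair. The snippet below is our own illustrative sketch (assuming a policy is a dict from state index to action index), not the paper's algorithm.

```python
# Sketch: enumerate deterministic policies that differ from a given policy
# in exactly one state-action pair (the neighborhood used for localized search).
from typing import Dict, Iterator

def one_step_neighbors(policy: Dict[int, int], num_actions: int) -> Iterator[Dict[int, int]]:
    for state, action in policy.items():
        for new_action in range(num_actions):
            if new_action != action:
                neighbor = dict(policy)
                neighbor[state] = new_action
                yield neighbor

# Example: a 3-state, 2-action policy has 3 * (2 - 1) = 3 neighbors.
base = {0: 0, 1: 1, 2: 0}
print(sum(1 for _ in one_step_neighbors(base, num_actions=2)))  # -> 3
```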
arXiv Detail & Related papers (2024-10-21T01:03:54Z) - C-MORL: Multi-Objective Reinforcement Learning through Efficient Discovery of Pareto Front [9.04360155372014]
Constrained MORL is a seamless bridge between constrained policy optimization and MORL. Our algorithm achieves more consistent and superior performance in terms of hypervolume, expected utility, and sparsity on both discrete and continuous control tasks.
arXiv Detail & Related papers (2024-10-03T06:13:56Z) - Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence. We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair. This appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences [49.14535254003683]
We introduce PaLoRA, a novel parameter-efficient method that addresses multi-task trade-offs in machine learning. Our experiments show that PaLoRA outperforms state-of-the-art MTL and PFL baselines across various datasets.
arXiv Detail & Related papers (2024-07-10T21:25:51Z) - Learning Pareto Set for Multi-Objective Continuous Robot Control [7.853788769559891]
We propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space.
Experimental results show that our method achieves the best overall performance with the least training parameters.
arXiv Detail & Related papers (2024-06-27T06:31:51Z) - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z) - UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [51.00436121587591]
In Multi-objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours. We focus on the case of linear utility functions parametrised by weight vectors w. We introduce a method based on the Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
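A generic illustration of UCB-driven weight-vector selection is sketched below. It assumes a fixed pool of candidate weight vectors and a scalar utility observation per round, and uses the standard UCB1 rule; it is a sketch of the general idea, not the exact procedure from the paper.

```python
# Generic UCB1 selection over a fixed pool of candidate weight vectors w.
# The per-round utility is assumed to be the scalarized return; this is an
# illustrative sketch, not the paper's exact procedure.
import numpy as np

class UCBWeightSelector:
    def __init__(self, candidate_weights: np.ndarray, c: float = 2.0):
        self.weights = candidate_weights          # shape: (K, num_objectives)
        self.counts = np.zeros(len(candidate_weights))
        self.means = np.zeros(len(candidate_weights))
        self.c = c
        self.t = 0

    def select(self) -> int:
        self.t += 1
        # Try every candidate once before applying the UCB rule.
        untried = np.where(self.counts == 0)[0]
        if len(untried) > 0:
            return int(untried[0])
        ucb = self.means + self.c * np.sqrt(np.log(self.t) / self.counts)
        return int(np.argmax(ucb))

    def update(self, arm: int, utility: float) -> None:
        # Incremental update of the mean utility estimate for the chosen arm.
        self.counts[arm] += 1
        self.means[arm] += (utility - self.means[arm]) / self.counts[arm]
```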
arXiv Detail & Related papers (2024-05-01T09:34:42Z) - Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning [2.1408617023874443]
We propose a novel multi-objective reinforcement learning (MORL) algorithm that trains a single neural network via policy gradient.
The proposed method works in both continuous and discrete action spaces with no design change of the policy network.
arXiv Detail & Related papers (2023-03-15T20:07:48Z) - Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm [0.18416014644193063]
We propose a novel MORL algorithm that trains a single universal network to cover the entire preference space and scales to continuous robotic tasks.
PD-MORL achieves up to 25% larger hypervolume for challenging continuous control tasks and uses an order of magnitude fewer trainable parameters compared to prior approaches.
arXiv Detail & Related papers (2022-08-16T19:23:02Z) - Imitation Learning from MPC for Quadrupedal Multi-Gait Control [63.617157490920505]
We present a learning algorithm for training a single policy that imitates multiple gaits of a walking robot.
We use and extend MPC-Net, which is an Imitation Learning approach guided by Model Predictive Control.
We validate our approach on hardware and show that a single learned policy can replace its teacher to control multiple gaits.
arXiv Detail & Related papers (2021-03-26T08:48:53Z)