How to Find the Exact Pareto Front for Multi-Objective MDPs?
- URL: http://arxiv.org/abs/2410.15557v2
- Date: Mon, 10 Feb 2025 04:28:55 GMT
- Title: How to Find the Exact Pareto Front for Multi-Objective MDPs?
- Authors: Yining Li, Peizhong Ju, Ness B. Shroff
- Abstract summary: Multi-Objective Markov Decision Processes (MO-MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP.
In this work, we address the challenge of efficiently discovering the Pareto front.
By investigating the geometric structure of the Pareto front in MO-MDPs, we uncover a key property: the Pareto front lies on the boundary of a convex polytope whose vertices all correspond to deterministic policies, and neighboring vertices differ in only one state-action pair.
This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ by only one state-action pair.
- Score: 28.70863169250383
- Abstract: Multi-Objective Markov Decision Processes (MO-MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP. The Pareto front identifies the set of policies that cannot be dominated, providing a foundation for finding Pareto optimal solutions that can efficiently adapt to various preferences. However, finding the Pareto front is a highly challenging problem. Most existing methods either (i) rely on traversing the continuous preference space, which is impractical and results in approximations that are difficult to evaluate against the true Pareto front, or (ii) focus solely on deterministic Pareto optimal policies, from which there are no known techniques to characterize the full Pareto front. Moreover, finding the structure of the Pareto front itself remains unclear even in the context of dynamic programming, where the MDP is fully known in advance. In this work, we address the challenge of efficiently discovering the Pareto front. By investigating the geometric structure of the Pareto front in MO-MDPs, we uncover a key property: the Pareto front is on the boundary of a convex polytope whose vertices all correspond to deterministic policies, and neighboring vertices of the Pareto front differ by only one state-action pair of the deterministic policy, almost surely. This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ by only one state-action pair, drastically reducing the complexity of searching for the exact Pareto front. We develop an efficient algorithm that identifies the vertices of the Pareto front by solving a single-objective MDP only once and then traversing the edges of the Pareto front, making it more efficient than existing methods.
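The property suggests a simple search template. Below is a minimal sketch in Python, assuming a small random tabular MO-MDP (the sizes, rewards, and transitions are illustrative, not from the paper): it evaluates deterministic policies exactly, explores neighbors that change the action in a single state, and filters dominated value vectors. Unlike this brute-force toy, the paper's algorithm avoids enumerating all policies by solving one scalarized MDP and then walking only the edges of the Pareto front.

```python
import itertools
import numpy as np

# --- Toy tabular MO-MDP (illustrative assumption, not from the paper) ---
S, A, K, gamma = 3, 2, 2, 0.9                # states, actions, objectives, discount
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
R = rng.random((S, A, K))                    # vector-valued reward

def value_of(policy):
    """Multi-objective value of a deterministic policy (uniform start state).

    Solves (I - gamma * P_pi) V = R_pi, one linear system shared by all K objectives.
    """
    P_pi = P[np.arange(S), policy]           # S x S transitions under pi
    R_pi = R[np.arange(S), policy]           # S x K rewards under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    return V.mean(axis=0)                    # K-dimensional value vector

def dominates(u, v):
    return np.all(u >= v) and np.any(u > v)

def neighbors(policy):
    """Deterministic policies differing in exactly ONE state-action pair,
    mirroring the paper's key geometric property of Pareto-front vertices."""
    for s, a in itertools.product(range(S), range(A)):
        if a != policy[s]:
            nb = policy.copy()
            nb[s] = a
            yield nb

# Expand through one-pair moves; the toy version visits every policy, whereas
# the paper's algorithm prunes to Pareto-front edges only.
frontier = [np.zeros(S, dtype=int)]
values = {}
while frontier:
    pi = frontier.pop()
    key = tuple(pi)
    if key in values:
        continue
    values[key] = value_of(pi)
    frontier.extend(neighbors(pi))

# Keep only non-dominated value vectors: the candidate Pareto-front vertices.
vertices = {k: v for k, v in values.items()
            if not any(dominates(w, v) for w in values.values())}
print(f"{len(vertices)} non-dominated deterministic policies found")
```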
Related papers
- Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment [74.25832963097658]
Multi-Objective Alignment (MOA) aims to align responses with multiple human preference objectives.
We find that DPO-based MOA approaches suffer from widespread preference conflicts in the data.
arXiv Detail & Related papers (2025-02-20T08:27:00Z)
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this is the first deterministic policy search method for continuous-space constrained MDPs.
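As a hedged illustration of the general primal-dual template (a toy one-dimensional stand-in, not D-PGPD itself): the primal variable ascends the Lagrangian while the dual multiplier ascends on constraint violation.

```python
import numpy as np

# Toy constrained problem standing in for a CMDP (pure assumption):
# maximize J(theta) subject to C(theta) <= b, deterministic "policy" theta.
J = lambda t: -(t - 2.0) ** 2          # return, maximized at theta = 2
C = lambda t: t                        # cost, constrained by C(theta) <= b
b = 1.0
dJ = lambda t: -2.0 * (t - 2.0)
dC = lambda t: 1.0

theta, lam = 0.0, 0.0                  # primal variable and dual multiplier
eta_p, eta_d = 0.05, 0.05
for _ in range(2000):
    # Primal: ascend the Lagrangian L = J - lam * (C - b) in theta.
    theta += eta_p * (dJ(theta) - lam * dC(theta))
    # Dual: ascend in lam on constraint violation, projected to lam >= 0.
    lam = max(0.0, lam + eta_d * (C(theta) - b))

print(f"theta={theta:.3f}, lambda={lam:.3f}")  # expect theta near b = 1
```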
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
- Pareto Front Shape-Agnostic Pareto Set Learning in Multi-Objective Optimization [6.810571151954673]
Existing methods rely on the mapping of preference vectors in the objective space to optimal solutions in the decision space.
Our proposed method can handle any shape of the Pareto front and learn the Pareto set without requiring prior knowledge.
arXiv Detail & Related papers (2024-08-11T14:09:40Z)
- Learning Pareto Set for Multi-Objective Continuous Robot Control [7.853788769559891]
We propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space.
Experimental results show that our method achieves the best overall performance with the least training parameters.
arXiv Detail & Related papers (2024-06-27T06:31:51Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-Objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours that trade off multiple, possibly conflicting, objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
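A minimal sketch of the idea, treating each candidate weight vector as a bandit arm scored by UCB1; the fixed simplex grid and the noisy scalarized utility below are assumptions for the demo, and the paper's actual method adapts the search across learning stages.

```python
import numpy as np

rng = np.random.default_rng(1)

# Candidate weight vectors on the 2-objective simplex (assumed discretization).
weights = [np.array([w, 1.0 - w]) for w in np.linspace(0, 1, 11)]

def noisy_utility(w):
    """Stand-in for training/evaluating a policy under utility w . V (assumption)."""
    true_value = np.array([1.0, 2.0])          # hidden vector return
    return float(w @ true_value) + rng.normal(scale=0.1)

counts = np.zeros(len(weights))
means = np.zeros(len(weights))
for t in range(1, 501):
    if t <= len(weights):                      # play every arm once first
        arm = t - 1
    else:                                      # UCB1: mean + exploration bonus
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    r = noisy_utility(weights[arm])
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]

print("most promising weight vector:", weights[int(np.argmax(means))])
```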
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- UMOEA/D: A Multiobjective Evolutionary Algorithm for Uniform Pareto Objectives based on Decomposition [19.13435817442015]
Multiobjective optimization (MOO) is prevalent in numerous applications.
Previous methods commonly utilize the set of Pareto objectives (particles on the PF) to represent the entire PF.
We suggest in this paper constructing uniformly distributed objectives on the PF, so as to alleviate the limited diversity found in previous MOO approaches.
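For intuition, uniformly spaced directions on the simplex are typically generated with a simplex-lattice (Das-Dennis) design; the sketch below shows that standard construction, which is my assumption rather than UMOEA/D's exact procedure.

```python
from itertools import combinations

def das_dennis(n_partitions: int, n_objectives: int):
    """Uniformly spaced weight vectors on the probability simplex.

    Classic simplex-lattice (Das-Dennis) construction via stars-and-bars:
    place n_partitions unit increments among n_objectives coordinates.
    """
    vectors = []
    for dividers in combinations(range(n_partitions + n_objectives - 1),
                                 n_objectives - 1):
        prev, vec = -1, []
        for d in list(dividers) + [n_partitions + n_objectives - 1]:
            vec.append((d - prev - 1) / n_partitions)   # gap size / partitions
            prev = d
        vectors.append(vec)
    return vectors

for w in das_dennis(4, 3):
    print(w)   # 15 vectors, each summing to 1, spaced 1/4 apart
```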
arXiv Detail & Related papers (2024-02-14T08:09:46Z)
- Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning [5.897578963773195]
We introduce Iterated Pareto Referent optimisation (IPRO).
IPRO decomposes finding the Pareto front into a sequence of constrained single-objective problems.
By leveraging problem-specific single-objective solvers, our approach holds promise for applications beyond multi-objective reinforcement learning.
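A heavily hedged sketch of the decomposition idea: each referent point induces one constrained single-objective problem, and solutions are filtered for dominance. The finite candidate set, the fixed referents, and the max-min scalarization below are stand-ins, not IPRO's actual solver or its guarantees.

```python
import numpy as np

rng = np.random.default_rng(2)
candidates = rng.random((200, 2))      # stand-in for achievable value vectors

def solve_wrt_referent(referent):
    """One constrained single-objective problem: among points improving on the
    referent in every objective, maximize the worst-case improvement."""
    mask = np.all(candidates > referent, axis=1)
    if not mask.any():
        return None
    feasible = candidates[mask]
    return feasible[np.argmax(np.min(feasible - referent, axis=1))]

# A few fixed referents for illustration; IPRO chooses them adaptively.
front = []
for ref in [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([0.0, 0.5])]:
    p = solve_wrt_referent(ref)
    if p is not None:
        front.append(p)

# Discard dominated points to keep only Pareto-front candidates.
front = [p for p in front
         if not any(np.all(q >= p) and np.any(q > p) for q in front)]
print(np.array(front))
```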
arXiv Detail & Related papers (2024-02-11T12:35:13Z)
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space.
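The core move, in a minimal numpy sketch with toy linear models (the tasks and "expert" weights are both assumptions): convex combinations of single-task solutions trace a one-parameter family in weight space whose task trade-offs can be scanned.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 5))
w_task1 = rng.normal(size=5)           # assumed single-task optimum, task 1
w_task2 = rng.normal(size=5)           # assumed single-task optimum, task 2
y1, y2 = X @ w_task1, X @ w_task2      # toy per-task regression targets

def losses(w):
    return (np.mean((X @ w - y1) ** 2), np.mean((X @ w - y2) ** 2))

# Scan the segment between the two single-task optima: each alpha yields a
# model with a different trade-off, sketching a Pareto set in weight space.
for alpha in np.linspace(0, 1, 5):
    w = alpha * w_task1 + (1 - alpha) * w_task2
    l1, l2 = losses(w)
    print(f"alpha={alpha:.2f}  task1 loss={l1:.3f}  task2 loss={l2:.3f}")
```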
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for topological MDPs (TMDPs), built as a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
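One simple way to encode an ordering of objectives is a lexicographic comparison with slack, sketched below; this is a generic illustration, not the paper's TMDP formulation.

```python
def lex_better(u, v, slack=1e-3):
    """True if value vector u beats v under a strict priority ordering:
    objective i matters only when all higher-priority objectives tie within slack."""
    for a, b in zip(u, v):
        if a > b + slack:
            return True
        if b > a + slack:
            return False
    return False   # equal within slack on every objective

# Safety (index 0) outranks speed (index 1):
print(lex_better([0.9, 0.1], [0.5, 2.0]))   # True: safer wins despite slower
print(lex_better([0.9, 0.5], [0.9, 0.1]))   # True: tie on safety, faster wins
```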
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Pareto Navigation Gradient Descent: a First-Order Algorithm for Optimization in Pareto Set [17.617944390196286]
Modern machine learning applications, such as multi-task learning, require finding optimal model parameters to trade-off multiple objective functions.
We propose a first-order algorithm that approximately solves OPT-in-Pareto, the problem of optimizing an extra criterion within the Pareto set, using only gradient information.
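For intuition, a common first-order building block for moving toward the Pareto set is the min-norm convex combination of objective gradients (MGDA-style); PNG's actual update additionally steers by the extra criterion, so the sketch below is only a related primitive.

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Smallest-norm convex combination of two gradients.

    Minimizes ||a*g1 + (1-a)*g2||^2 over a in [0, 1]; the closed-form
    minimizer is a = <g2 - g1, g2> / ||g1 - g2||^2, clipped to [0, 1].
    A (near-)zero result indicates a Pareto-stationary point.
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    a = 0.5 if denom < 1e-12 else float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return a * g1 + (1.0 - a) * g2

g1 = np.array([1.0, 0.0])
g2 = np.array([0.0, 1.0])
print(min_norm_direction(g1, g2))   # [0.5, 0.5]; its negative decreases both losses
```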
arXiv Detail & Related papers (2021-10-17T04:07:04Z)
- Safe Exploration by Solving Early Terminated MDP [77.10563395197045]
We introduce a new approach to address safe RL problems under the framework of the Early Terminated MDP (ET-MDP).
We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP.
An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better performance and improved learning efficiency.
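A minimal sketch of the early-termination idea: a generic episode wrapper that ends the episode once accumulated cost exceeds a budget, so an unconstrained RL method can be applied. The four-tuple step API, the cost budget, and the dummy environment are all assumptions; the paper's context-model algorithm is not shown.

```python
class EarlyTerminatedEnv:
    """Wrap an environment so episodes end on constraint violation.

    Assumes env.step(action) returns (obs, reward, cost, done); the cost
    budget and terminal penalty are illustrative knobs, not from the paper.
    """
    def __init__(self, env, cost_budget=1.0, terminal_penalty=0.0):
        self.env = env
        self.cost_budget = cost_budget
        self.terminal_penalty = terminal_penalty
        self._cost = 0.0

    def reset(self):
        self._cost = 0.0
        return self.env.reset()

    def step(self, action):
        obs, reward, cost, done = self.env.step(action)
        self._cost += cost
        if self._cost > self.cost_budget:        # unsafe: cut the episode short
            return obs, reward - self.terminal_penalty, True
        return obs, reward, done

class DummyEnv:                                  # toy stand-in environment
    def reset(self):
        self.t = 0
        return 0
    def step(self, a):
        self.t += 1
        return self.t, 1.0, 0.4, self.t >= 10    # obs, reward, cost, done

env = EarlyTerminatedEnv(DummyEnv(), cost_budget=1.0)
obs, done, steps = env.reset(), False, 0
while not done:
    obs, r, done = env.step(0)
    steps += 1
print(f"episode terminated after {steps} steps")  # early stop: 3 * 0.4 > 1.0
```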
arXiv Detail & Related papers (2021-07-09T04:24:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.