How to Find the Exact Pareto Front for Multi-Objective MDPs?
- URL: http://arxiv.org/abs/2410.15557v1
- Date: Mon, 21 Oct 2024 01:03:54 GMT
- Title: How to Find the Exact Pareto Front for Multi-Objective MDPs?
- Authors: Yining Li, Peizhong Ju, Ness B. Shroff
- Abstract summary: Multi-objective Markov Decision Processes (MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP.
The Pareto front identifies the set of policies that cannot be dominated, providing a foundation for finding optimal solutions that can efficiently adapt to various preferences.
- Score: 28.70863169250383
- Abstract: Multi-objective Markov Decision Processes (MDPs) are receiving increasing attention, as real-world decision-making problems often involve conflicting objectives that cannot be addressed by a single-objective MDP. The Pareto front identifies the set of policies that cannot be dominated, providing a foundation for finding optimal solutions that can efficiently adapt to various preferences. However, finding the Pareto front is a highly challenging problem. Most existing methods either (i) rely on traversing the continuous preference space, which is impractical and results in approximations that are difficult to evaluate against the true Pareto front, or (ii) focus solely on deterministic Pareto optimal policies, from which there are no known techniques to characterize the full Pareto front. Moreover, finding the structure of the Pareto front itself remains unclear even in the context of dynamic programming. This work addresses the challenge of efficiently discovering the Pareto front. By investigating the geometric structure of the Pareto front in MO-MDP, we uncover a key property: the Pareto front is on the boundary of a convex polytope whose vertices all correspond to deterministic policies, and neighboring vertices of the Pareto front differ by only one state-action pair of the deterministic policy, almost surely. This insight transforms the global comparison across all policies into a localized search among deterministic policies that differ by only one state-action pair, drastically reducing the complexity of searching for the exact Pareto front. We develop an efficient algorithm that identifies the vertices of the Pareto front by solving a single-objective MDP only once and then traversing the edges of the Pareto front, making it more efficient than existing methods. Our empirical studies demonstrate the effectiveness of our theoretical strategy in discovering the Pareto front.
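A minimal sketch of the local-search idea described in the abstract, under stated assumptions: a tabular MO-MDP, values measured at state 0, and a crude value-iteration solve for the initial scalarised MDP. All function names are illustrative; the paper's actual algorithm traverses neighbouring vertices and edges of the Pareto-front polytope rather than brute-force checking every one-state-action deviation as this toy version does.

```python
import numpy as np

def policy_value(policy, P, R, gamma):
    """K-dimensional value of a deterministic policy, evaluated at state 0.

    P: (S, A, S) transition kernel; R: (S, A, K) per-objective rewards;
    policy: length-S integer array mapping each state to an action.
    """
    S, A, K = R.shape
    P_pi = P[np.arange(S), policy]                       # (S, S) under policy
    R_pi = R[np.arange(S), policy]                       # (S, K) under policy
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)  # exact evaluation
    return V[0]

def dominates(u, v):
    """Pareto dominance: u >= v everywhere and u > v somewhere."""
    return bool(np.all(u >= v) and np.any(u > v))

def local_pareto_search(P, R, gamma, weight):
    S, A, K = R.shape
    # Step 1: solve one scalarised single-objective MDP (value iteration).
    r_w, Q = R @ weight, np.zeros((S, A))
    for _ in range(1000):
        Q = r_w + gamma * (P @ Q.max(axis=1))
    start = tuple(Q.argmax(axis=1))
    # Step 2: local search among deterministic policies that differ from a
    # frontier policy in exactly one state-action pair.
    seen = {start: policy_value(np.array(start), P, R, gamma)}
    frontier, queue = {start}, [start]
    while queue:
        pi = queue.pop()
        for s in range(S):
            for a in range(A):
                nb = pi[:s] + (a,) + pi[s + 1:]
                if nb in seen:
                    continue
                seen[nb] = policy_value(np.array(nb), P, R, gamma)
                # Keep nb only if no current frontier policy dominates it.
                if not any(dominates(seen[p], seen[nb]) for p in frontier):
                    frontier = {p for p in frontier
                                if not dominates(seen[nb], seen[p])}
                    frontier.add(nb)
                    queue.append(nb)
    return {pi: seen[pi] for pi in frontier}
```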
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this is the first deterministic policy search method for continuous-space constrained MDPs (a toy primal-dual sketch follows this entry).
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
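As a hedged illustration of the generic primal-dual mechanic only (not the authors' D-PGPD algorithm), the following toy loop maximises f(theta) subject to g(theta) <= 0 via the Lagrangian f - lam * g; the one-dimensional problem and all names are assumptions.

```python
import numpy as np

def primal_dual(grad_f, grad_g, g, theta, lr=1e-2, dual_lr=1e-2, steps=5000):
    lam = 0.0
    for _ in range(steps):
        # Primal ascent on the Lagrangian f(theta) - lam * g(theta).
        theta = theta + lr * (grad_f(theta) - lam * grad_g(theta))
        # Dual ascent on the constraint violation, projected onto lam >= 0.
        lam = max(0.0, lam + dual_lr * g(theta))
    return theta, lam

# Toy example: maximise -(x - 2)^2 subject to x <= 1, i.e. g(x) = x - 1 <= 0.
theta, lam = primal_dual(
    grad_f=lambda x: -2.0 * (x - 2.0),
    grad_g=lambda x: 1.0,
    g=lambda x: x - 1.0,
    theta=0.0,
)
print(theta, lam)  # theta approaches 1.0, the constrained optimum
```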
- Pareto Front Shape-Agnostic Pareto Set Learning in Multi-Objective Optimization [6.810571151954673]
Existing methods rely on the mapping of preference vectors in the objective space to optimal solutions in the decision space.
Our proposed method can handle any shape of the Pareto front and learn the Pareto set without requiring prior knowledge.
arXiv Detail & Related papers (2024-08-11T14:09:40Z)
- Learning Pareto Set for Multi-Objective Continuous Robot Control [7.853788769559891]
We propose a simple and resource-efficient MORL algorithm that learns a continuous representation of the Pareto set in a high-dimensional policy parameter space.
Experimental results show that our method achieves the best overall performance with the least training parameters.
arXiv Detail & Related papers (2024-06-27T06:31:51Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-Objective Reinforcement Learning (MORL), agents are tasked with optimising decision-making behaviours across multiple objectives.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on the Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process (a minimal bandit-style sketch follows this entry).
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
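A minimal sketch of UCB over a finite candidate set of weight vectors, assuming a black-box evaluate(w) that returns a noisy scalar utility (e.g. w dot return after a short training run). The staging of the search across learning phases described in the paper is omitted; all names are illustrative.

```python
import numpy as np

def ucb_weight_search(candidates, evaluate, rounds, c=1.0):
    """candidates: (N, K) weight vectors; evaluate(w) -> noisy scalar utility."""
    n = np.zeros(len(candidates))        # pull counts per weight vector
    mean = np.zeros(len(candidates))     # running mean utility per weight vector
    for t in range(1, rounds + 1):
        untried = np.where(n == 0)[0]
        # Pull each arm once, then pick by mean plus exploration bonus.
        i = untried[0] if len(untried) else np.argmax(
            mean + c * np.sqrt(np.log(t) / n))
        u = evaluate(candidates[i])
        n[i] += 1
        mean[i] += (u - mean[i]) / n[i]  # incremental mean update
    return candidates[np.argmax(mean)]
```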
- UMOEA/D: A Multiobjective Evolutionary Algorithm for Uniform Pareto Objectives based on Decomposition [19.13435817442015]
Multiobjective optimization (MOO) is prevalent in numerous applications.
Previous methods commonly utilize the set of Pareto objectives (particles on the PF) to represent the entire PF.
In this paper, we suggest constructing uniformly distributed objectives on the PF, so as to alleviate the limited diversity found in previous MOO approaches.
arXiv Detail & Related papers (2024-02-14T08:09:46Z)
- Divide and Conquer: Provably Unveiling the Pareto Front with Multi-Objective Reinforcement Learning [2.5115843173830252]
We introduce IPRO, a principled algorithm that decomposes the task of finding the Pareto front into a sequence of single-objective problems.
Empirical evaluations demonstrate that IPRO matches or outperforms methods that require additional domain knowledge.
By leveraging problem-specific single-objective solvers, our approach holds promise for applications beyond multi-objective reinforcement learning (a minimal decomposition sketch follows this entry).
arXiv Detail & Related papers (2024-02-11T12:35:13Z)
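IPRO's decomposition is more principled than plain linear scalarisation (which can only recover points on the convex hull of the front), but the following sketch conveys the general reduction of Pareto-front discovery to repeated single-objective solves; solve_scalarised is an assumed black-box solver that returns the objective vector of the policy optimal for a given weight.

```python
import numpy as np

def pareto_from_scalarisation(solve_scalarised, n_weights, n_objectives):
    # Sample weight vectors from the simplex and solve each scalarised problem.
    points = [solve_scalarised(w)
              for w in np.random.dirichlet(np.ones(n_objectives), size=n_weights)]
    # Keep the non-dominated subset (convex-hull part of the front only).
    front = [p for p in points
             if not any(np.all(q >= p) and np.any(q > p) for q in points)]
    return front
```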
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space (a toy weight-interpolation sketch follows this entry).
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
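As a hedged illustration of weight-space ensembling only (not the paper's Pareto Manifold Learning procedure), one can linearly interpolate the parameters of two single-task models and evaluate each interpolate on both tasks; the models and names below are assumptions.

```python
import copy
import torch.nn as nn

def interpolate_models(model_a, model_b, alpha):
    """Return a copy of model_a whose weights are (1 - alpha) * A + alpha * B."""
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: (1 - alpha) * sd_a[k] + alpha * sd_b[k]
                            for k in sd_a})
    return merged

# Sweeping alpha in [0, 1] traces a 1-D path of trade-off models between the
# two single-task solutions; each point can be evaluated on both tasks.
task_a_model, task_b_model = nn.Linear(4, 2), nn.Linear(4, 2)
candidates = [interpolate_models(task_a_model, task_b_model, a / 10)
              for a in range(11)]
```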
- Pareto Set Learning for Expensive Multi-Objective Optimization [5.419608513284392]
Expensive multi-objective optimization problems can be found in many real-world applications.
This paper develops a novel learning-based method to approximate the whole Pareto set for MOBO.
arXiv Detail & Related papers (2022-10-16T09:41:54Z)
- Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2022-09-15T15:26:49Z)
- Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z)
- Safe Exploration by Solving Early Terminated MDP [77.10563395197045]
We introduce a new approach to safe RL problems under the framework of the Early Terminated MDP (ET-MDP).
We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP.
An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better performance and improved learning efficiency (an illustrative early-termination wrapper follows this entry).
arXiv Detail & Related papers (2021-07-09T04:24:40Z)
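A sketch of the early-termination idea as a gymnasium-style environment wrapper: the episode ends as soon as the accumulated cost exceeds a budget. The info["cost"] signal and all names are assumptions, and the paper's actual contribution (the ET-MDP formulation and a context-model off-policy solver) is not implemented here.

```python
import gymnasium as gym

class EarlyTerminationWrapper(gym.Wrapper):
    """Terminate the episode once accumulated cost exceeds the budget."""

    def __init__(self, env, cost_budget):
        super().__init__(env)
        self.cost_budget = cost_budget
        self.total_cost = 0.0

    def reset(self, **kwargs):
        self.total_cost = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.total_cost += info.get("cost", 0.0)  # assumed per-step cost signal
        if self.total_cost > self.cost_budget:
            terminated = True                     # early termination on violation
        return obs, reward, terminated, truncated, info
```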