Provable Multi-Objective Reinforcement Learning with Generative Models
- URL: http://arxiv.org/abs/2011.10134v2
- Date: Mon, 11 Jan 2021 07:28:13 GMT
- Title: Provable Multi-Objective Reinforcement Learning with Generative Models
- Authors: Dongruo Zhou and Jiahao Chen and Quanquan Gu
- Abstract summary: We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective Markov decision process.
We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
- Score: 98.19879408649848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-objective reinforcement learning (MORL) is an extension of ordinary,
single-objective reinforcement learning (RL) that is applicable to many
real-world tasks where multiple objectives exist without known relative costs.
We study the problem of single policy MORL, which learns an optimal policy
given the preference of objectives. Existing methods require strong assumptions
such as exact knowledge of the multi-objective Markov decision process, and are
analyzed in the limit of infinite data and time. We propose a new algorithm
called model-based envelop value iteration (EVI), which generalizes the
enveloped multi-objective $Q$-learning algorithm in Yang et al., 2019. Our
method can learn a near-optimal value function with polynomial sample
complexity and linear convergence speed. To the best of our knowledge, this is
the first finite-sample analysis of MORL algorithms.
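As a rough illustration of the setting described in the abstract, the sketch below estimates a transition model from a generative model (a sampler that can be queried at any state-action pair) and then runs a vector-valued value iteration for one fixed preference vector. This is a simplified fixed-preference variant written for illustration, not the paper's EVI algorithm: the function names, the tabular representation, the linear scalarization `w^T Q`, and the sampler signature `sample_next_state(s, a, rng)` are all assumptions.

```python
import numpy as np


def estimate_model(sample_next_state, n_states, n_actions, n_samples, rng):
    """Build an empirical transition model by querying a generative model.

    sample_next_state(s, a, rng) is a hypothetical sampler returning the
    index of the next state; it stands in for the generative model.
    """
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample_next_state(s, a, rng)] += 1.0
    return p_hat / n_samples


def scalarized_vector_value_iteration(p_hat, rewards, w, gamma, n_iters):
    """Vector-valued value iteration for one fixed preference w (assumed linear).

    rewards has shape (S, A, d); the returned Q has the same shape.
    """
    n_states = rewards.shape[0]
    q = np.zeros_like(rewards, dtype=float)
    for _ in range(n_iters):
        # Greedy action in each state under the scalarized value w^T Q.
        greedy = np.argmax(q @ w, axis=1)      # shape (S,)
        v = q[np.arange(n_states), greedy]     # shape (S, d)
        # Bellman backup on every objective using the empirical model.
        q = rewards + gamma * (p_hat @ v)      # shape (S, A, d)
    return q
```

For example, with a toy sampler in which action `a` always leads to state `a`, calling `estimate_model(lambda s, a, rng: a, 2, 2, 5, np.random.default_rng(0))` recovers the deterministic transitions exactly, and the iteration contracts toward its fixed point at rate `gamma`.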
Related papers
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- UCB-driven Utility Function Search for Multi-objective Reinforcement Learning [75.11267478778295]
In Multi-objective Reinforcement Learning (MORL) agents are tasked with optimising decision-making behaviours.
We focus on the case of linear utility functions parameterised by weight vectors w.
We introduce a method based on Upper Confidence Bound to efficiently search for the most promising weight vectors during different stages of the learning process.
arXiv Detail & Related papers (2024-05-01T09:34:42Z)
- Sample-Efficient Multi-Objective Learning via Generalized Policy Improvement Prioritization [8.836422771217084]
Multi-objective reinforcement learning (MORL) algorithms tackle sequential decision problems where agents may have different preferences.
We introduce a novel algorithm that uses Generalized Policy Improvement (GPI) to define principled, formally-derived prioritization schemes.
We empirically show that our method outperforms state-of-the-art MORL algorithms in challenging multi-objective tasks.
arXiv Detail & Related papers (2023-01-18T20:54:40Z)
- FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation [69.34011200590817]
We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation.
By modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity.
We show that FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks.
arXiv Detail & Related papers (2022-05-31T18:33:15Z)
- gTLO: A Generalized and Non-linear Multi-Objective Deep Reinforcement Learning Approach [2.0305676256390934]
Generalized Thresholded Lexicographic Ordering (gTLO) is a novel method that aims to combine non-linear MORL with the advantages of generalized MORL.
We present promising results on a standard benchmark for non-linear MORL and a real-world application from the domain of manufacturing process control.
arXiv Detail & Related papers (2022-04-11T10:06:49Z)
- Pareto Set Learning for Neural Multi-objective Combinatorial Optimization [6.091096843566857]
Multiobjective combinatorial optimization (MOCO) problems can be found in many real-world applications.
We develop a learning-based approach to approximate the whole Pareto set for a given MOCO problem without further search procedure.
Our proposed method significantly outperforms some other methods on the multiobjective traveling salesman problem, multiobjective vehicle routing problem and multiobjective knapsack problem in terms of solution quality, speed, and model efficiency.
arXiv Detail & Related papers (2022-03-29T09:26:22Z)
- MODRL/D-EL: Multiobjective Deep Reinforcement Learning with Evolutionary Learning for Multiobjective Optimization [10.614594804236893]
This paper proposes a multiobjective deep reinforcement learning with evolutionary learning algorithm for a typical complex problem called the multiobjective vehicle routing problem with time windows.
The experimental results on MO-VRPTW instances demonstrate the superiority of the proposed algorithm over other learning-based and iterative-based approaches.
arXiv Detail & Related papers (2021-07-16T15:22:20Z)
- Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for Mean-Field Control (MFC).
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
- Theoretical Convergence of Multi-Step Model-Agnostic Meta-Learning [63.64636047748605]
We develop a new theoretical framework to provide convergence guarantee for the general multi-step MAML algorithm.
In particular, our results suggest that the inner-stage step size needs to be chosen inversely proportional to the number $N$ of inner-stage steps in order for $N$-step MAML to have guaranteed convergence.
arXiv Detail & Related papers (2020-02-18T19:17:54Z)
- Ready Policy One: World Building Through Active Learning [35.358315617358976]
We introduce Ready Policy One (RP1), a framework that views Model-Based Reinforcement Learning as an active learning problem.
RP1 achieves this by utilizing a hybrid objective function, which crucially adapts during optimization.
We rigorously evaluate our method on a variety of continuous control tasks, and demonstrate statistically significant gains over existing approaches.
arXiv Detail & Related papers (2020-02-07T09:57:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.