IMO$^3$: Interactive Multi-Objective Off-Policy Optimization
- URL: http://arxiv.org/abs/2201.09798v2
- Date: Tue, 25 Jan 2022 04:30:30 GMT
- Title: IMO$^3$: Interactive Multi-Objective Off-Policy Optimization
- Authors: Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier
- Abstract summary: A system designer needs to find a policy that trades off objectives to reach a desired operating point.
We propose interactive multi-objective off-policy optimization (IMO$^3$).
We show that IMO$^3$ identifies a near-optimal policy with high probability.
- Score: 45.2918894257473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most real-world optimization problems have multiple objectives. A system
designer needs to find a policy that trades off these objectives to reach a
desired operating point. This problem has been studied extensively in the
setting of known objective functions. We consider a more practical but
challenging setting of unknown objective functions. In industry, this problem
is mostly approached with online A/B testing, which is often costly and
inefficient. As an alternative, we propose interactive multi-objective
off-policy optimization (IMO$^3$). The key idea in our approach is to interact
with a system designer using policies evaluated in an off-policy fashion to
uncover which policy maximizes her unknown utility function. We theoretically
show that IMO$^3$ identifies a near-optimal policy with high probability,
depending on the amount of feedback from the designer and training data for
off-policy estimation. We demonstrate its effectiveness empirically on multiple
multi-objective optimization problems.
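The paper itself ships no code, but the recipe the abstract describes — score candidate policies off-policy on each objective, then use designer feedback to find the one maximizing her unknown utility — can be sketched in a few lines. Below is a minimal, self-contained illustration assuming a context-free bandit with logged data, inverse propensity scoring (IPS) as the off-policy estimator, and a simulated linear designer utility; these choices are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged data: a context-free bandit with K actions and m objectives.
# Each log entry is (action taken, propensity under the logging policy, objective vector).
K, m, n = 5, 3, 10_000
logging_policy = np.full(K, 1.0 / K)             # uniform logging policy
actions = rng.integers(0, K, size=n)
true_means = rng.uniform(0.0, 1.0, size=(K, m))  # unknown; used only to simulate logs
rewards = true_means[actions] + 0.1 * rng.normal(size=(n, m))

def ips_estimate(target_policy):
    """Inverse-propensity-scored estimate of target_policy's m objective values."""
    w = target_policy[actions] / logging_policy[actions]  # importance weights
    return (w[:, None] * rewards).mean(axis=0)            # one estimate per objective

# Candidate policies: random softmax policies standing in for the designer's policy class.
logits = rng.normal(size=(20, K))
candidates = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
estimates = np.array([ips_estimate(p) for p in candidates])

# Simulated designer: an unknown linear utility over the m objectives. IMO^3
# uncovers this interactively; here we simply score the off-policy estimates.
utility = np.array([0.5, 0.3, 0.2])
best = candidates[int(np.argmax(estimates @ utility))]
print("chosen policy:", np.round(best, 3))
```

In IMO$^3$ proper the designer's utility is never observed directly; it is uncovered from her interactive feedback on estimated objective values, and the paper's guarantee depends on both the amount of that feedback and the amount of logged data behind the off-policy estimates.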
Related papers
- Probably Approximately Correct Federated Learning [20.85915650297227]
Federated learning (FL) is a new distributed learning paradigm with privacy, utility, and efficiency as its primary pillars.
Existing research indicates that it is unlikely to simultaneously attain infinitesimal privacy leakage, utility loss, and efficiency.
How to find an optimal trade-off solution is the key consideration when designing the FL algorithm.
arXiv Detail & Related papers (2023-04-10T15:12:34Z)
- Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback [76.7007545844273]
We propose a multi-objective decision making framework that accommodates different user preferences over objectives.
Our model consists of a Markov decision process with a vector-valued reward function, with each user having an unknown preference vector.
We suggest an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries (a toy sketch of comparison-based elicitation appears after this list).
arXiv Detail & Related papers (2023-02-07T23:58:19Z)
- Uncertainty-Aware Search Framework for Multi-Objective Bayesian Optimization [40.40632890861706]
We consider the problem of multi-objective (MO) blackbox optimization using expensive function evaluations.
We propose a novel uncertainty-aware search framework referred to as USeMO to efficiently select the sequence of inputs for evaluation.
arXiv Detail & Related papers (2022-04-12T16:50:48Z)
- $\{\text{PF}\}^2\text{ES}$: Parallel Feasible Pareto Frontier Entropy Search for Multi-Objective Bayesian Optimization Under Unknown Constraints [4.672142224503371]
We present a novel information-theoretic acquisition function for multi-objective Bayesian optimization.
$\text{PF}^2\text{ES}$ provides a low-cost and accurate estimate of the mutual information for the parallel setting.
We benchmark $\text{PF}^2\text{ES}$ across synthetic and real-life problems.
arXiv Detail & Related papers (2022-04-11T21:06:23Z)
- Many Objective Bayesian Optimization [0.0]
Multi-objective Bayesian optimization (MOBO) is a set of methods that have been successfully applied to the simultaneous optimization of black-box functions.
In particular, MOBO methods struggle when the number of objectives in a multi-objective optimization problem is 3 or more, which is the many-objective setting.
We show empirical evidence, on a set of toy, synthetic, benchmark, and real experiments, of the effectiveness of the proposed metric and algorithm, which exploit the predictive distributions of GPs.
arXiv Detail & Related papers (2021-07-08T21:57:07Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single policy MORL, which learns an optimal policy given the preference of objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm, model-based envelope value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
- Multi-Fidelity Multi-Objective Bayesian Optimization: An Output Space Entropy Search Approach [44.25245545568633]
We study the novel problem of blackbox optimization of multiple objectives via multi-fidelity function evaluations.
Our experiments on several synthetic and real-world benchmark problems show that MF-OSEMO, with both approximations, significantly improves over the state-of-the-art single-fidelity algorithms.
arXiv Detail & Related papers (2020-11-02T06:59:04Z)
- Optimizing Interactive Systems via Data-Driven Objectives [70.3578528542663]
We propose an approach that infers the objective directly from observed user interactions.
These inferences can be made regardless of prior knowledge and across different types of user behavior.
We introduce the Interactive System Optimizer (ISO), a novel algorithm that uses these inferred objectives for optimization.
arXiv Detail & Related papers (2020-06-19T20:49:14Z)
- Provably Efficient Exploration in Policy Optimization [117.09887790160406]
This paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO).
OPPO achieves $\tilde{O}(\sqrt{d^2 H^3 T})$ regret.
To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
arXiv Detail & Related papers (2019-12-12T08:40:02Z)
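As referenced in the comparative-feedback entry above, the idea of recovering an unknown preference vector over objectives from pairwise comparisons can be illustrated with a small sketch. The simulated user, the perceptron-style update, and all names here are hypothetical assumptions for illustration, not the algorithm from that paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each policy is summarized by its vector of m objective
# values, and the user's utility is an unknown linear function w_star of it.
m = 3
w_star = np.array([0.6, 0.3, 0.1])              # unknown preference vector
policies = rng.uniform(0.0, 1.0, size=(50, m))  # candidate objective vectors

def compare(a, b):
    """Simulated user: prefers whichever policy has higher (hidden) utility."""
    return 1.0 if policies[a] @ w_star >= policies[b] @ w_star else -1.0

# Fit w from comparison queries with a perceptron-style update: each answer
# (a, b, y) asserts y * w . (v_a - v_b) > 0.
w = np.ones(m) / m
for _ in range(200):
    a, b = rng.choice(len(policies), size=2, replace=False)
    y = compare(a, b)
    diff = policies[a] - policies[b]
    if y * (w @ diff) <= 0:        # mistake: nudge w toward consistency
        w += 0.1 * y * diff
w = np.clip(w, 0.0, None)
if w.sum() > 0:
    w /= w.sum()                   # project back onto the probability simplex

print("estimated preferences:", np.round(w, 2), "true:", w_star)
print("recommended policy index:", int(np.argmax(policies @ w)))
```

With enough comparisons the estimate concentrates near the true preference vector, after which the recommended policy is simply the candidate maximizing the estimated linear utility.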