Robust Active Preference Elicitation
- URL: http://arxiv.org/abs/2003.01899v2
- Date: Wed, 8 Dec 2021 01:43:58 GMT
- Title: Robust Active Preference Elicitation
- Authors: Phebe Vayanos, Yingxiao Ye, Duncan McElfresh, John Dickerson, Eric
Rice
- Abstract summary: We study the problem of eliciting the preferences of a decision-maker through a moderate number of pairwise comparison queries.
We are motivated by applications in high-stakes domains, such as choosing a policy for allocating scarce resources.
- Score: 10.961537256186498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of eliciting the preferences of a decision-maker through
a moderate number of pairwise comparison queries in order to make them a
high-quality recommendation for a specific problem. We are motivated by
applications in high-stakes domains, such as choosing a policy for allocating
scarce resources to satisfy basic needs (e.g., kidneys for transplantation or
housing for those experiencing homelessness), where a consequential
recommendation needs to be made from the (partially) elicited preferences. We
model uncertainty in the preferences as set-based and investigate two
settings: a) an offline
elicitation setting, where all queries are made at once, and b) an online
elicitation setting, where queries are selected sequentially over time in an
adaptive fashion. We propose robust optimization formulations of these problems
that integrate the preference elicitation and recommendation phases, with the
aim of either maximizing worst-case utility or minimizing worst-case regret,
and we study their complexity. For the offline case, where active preference
elicitation takes the form of a two-and-a-half-stage robust optimization
problem with
decision-dependent information discovery, we provide an equivalent
reformulation in the form of a mixed-binary linear program which we solve via
column-and-constraint generation. For the online setting, where active
preference learning takes the form of a multi-stage robust optimization problem
with decision-dependent information discovery, we propose a conservative
solution approach. Numerical studies on synthetic data demonstrate that our
methods outperform state-of-the-art approaches from the literature in terms of
worst-case rank, regret, and utility. We showcase how our methodology can be
used to assist a homeless services agency in choosing a policy for allocating
scarce housing resources of different types to people experiencing
homelessness.
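As a minimal illustration of the max-min recommendation step described above, the sketch below scores each candidate by its worst-case utility over a set of utility vectors consistent with the answered comparisons, and recommends the best such candidate. The uncertainty set (the probability simplex cut by one half-space per answer) and all names are illustrative assumptions; the paper's mixed-binary reformulation and column-and-constraint generation procedure are not reproduced here.

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_utility(x, comparisons, items):
    """Worst-case utility min_{u in U} u . x, where U is the set of
    utility vectors consistent with the answered queries.  Here U is
    the probability simplex cut by one half-space per answer; this is
    an illustrative choice, not the paper's exact uncertainty set.
    """
    n = len(x)
    # Answer (a, b) ("a preferred to b") adds (items[a] - items[b]) . u >= 0,
    # i.e. the linprog constraint  -(items[a] - items[b]) . u <= 0.
    A_ub = (np.array([-(items[a] - items[b]) for a, b in comparisons])
            if comparisons else None)
    b_ub = np.zeros(len(comparisons)) if comparisons else None
    res = linprog(c=x, A_ub=A_ub, b_ub=b_ub,
                  A_eq=np.ones((1, n)), b_eq=[1.0], bounds=[(0, None)] * n)
    return res.fun  # adversarial (worst-case) utility of recommending x

def robust_recommendation(items, comparisons):
    """Max-min recommendation: the item with the best worst-case utility."""
    scores = [worst_case_utility(x, comparisons, items) for x in items]
    return int(np.argmax(scores)), scores
```

For example, with items [[1, 0], [0, 1], [0.6, 0.6]] and the single answer "item 0 preferred to item 1", the worst-case utilities are 0.5, 0.0, and 0.6, so the max-min recommendation is the hedged third item.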
Related papers
- An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to represent potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
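A minimal sketch of a max-margin fit in the linear-utility special case (the paper's model additionally learns potentially non-monotonic marginal value functions, which this sketch omits; all names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def max_margin_weights(pairs, X):
    """Fit linear utility weights w on the simplex so that each preferred
    alternative beats the other by the largest possible margin m:
        max m  s.t.  (X[a] - X[b]) . w >= m  for every pair (a, b),
                     w >= 0,  sum(w) = 1.
    Assumes at least one comparison pair, so that m stays bounded.
    """
    n = X.shape[1]
    c = np.zeros(n + 1)
    c[-1] = -1.0                                      # maximize m
    # Each pair (a, b) gives  m - (X[a] - X[b]) . w <= 0.
    A_ub = np.array([np.append(-(X[a] - X[b]), 1.0) for a, b in pairs])
    b_ub = np.zeros(len(pairs))
    A_eq = np.append(np.ones(n), 0.0).reshape(1, -1)  # sum(w) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]                       # weights, achieved margin
```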
arXiv Detail & Related papers (2024-09-04T14:36:20Z)
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- Optimal Baseline Corrections for Off-Policy Contextual Bandits [61.740094604552475]
We aim to learn decision policies that optimize an unbiased offline estimate of an online reward metric.
We propose a single framework built on the equivalence of common baseline-correction methods in learning scenarios.
Our framework enables us to characterize the variance-optimal unbiased estimator and provide a closed-form solution for it.
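As a hedged illustration of the idea, the sketch below adds a constant control-variate baseline to the inverse-propensity-scored (IPS) estimate and picks the variance-minimizing constant from the logged data; this is a textbook baseline correction, not necessarily the paper's variance-optimal estimator.

```python
import numpy as np

def baseline_corrected_ips(rewards, w):
    """Off-policy value estimate with a constant control-variate baseline:
        V(beta) = mean(w * (rewards - beta)) + beta,
    unbiased for any beta because E[w] = 1 for correct importance
    weights w = pi(a|x) / mu(a|x).  The variance-minimizing constant is
    beta* = Cov(w * r, w) / Var(w), estimated here from the log.
    """
    beta = np.cov(w * rewards, w)[0, 1] / np.var(w, ddof=1)
    return np.mean(w * (rewards - beta)) + beta, beta
```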
arXiv Detail & Related papers (2024-05-09T12:52:22Z)
- Stop Relying on No-Choice and Do not Repeat the Moves: Optimal, Efficient and Practical Algorithms for Assortment Optimization [38.57171985309975]
We develop efficient algorithms for regret minimization in assortment selection with Plackett-Luce (PL) based user choices.
Our methods are practical, provably optimal, and devoid of the aforementioned limitations of the existing methods.
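For reference, a minimal sketch of the Plackett-Luce (multinomial logit) choice probabilities such user-choice models assume, with an explicit no-choice weight (the names and the default v0 are illustrative):

```python
def pl_choice_probabilities(assortment, v, v0=1.0):
    """Plackett-Luce / multinomial-logit choice probabilities for an
    offered assortment S:  P(i | S) = v_i / (v0 + sum_{j in S} v_j),
    where v0 is the attraction weight of the no-choice option."""
    total = v0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / total for i in assortment}
    probs["no_choice"] = v0 / total
    return probs
```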
arXiv Detail & Related papers (2024-02-29T07:17:04Z)
- Multi-Objective Bayesian Optimization with Active Preference Learning [18.066263838953223]
We propose a Bayesian optimization (BO) approach to identifying the most preferred solution in a multi-objective optimization (MOO) problem.
To minimize the interaction cost with the decision maker (DM), we also propose an active learning strategy for the preference estimation.
arXiv Detail & Related papers (2023-11-22T15:24:36Z)
- Dual-Directed Algorithm Design for Efficient Pure Exploration [11.492736493413103]
We consider pure-exploration problems in the context of sequential adaptive experiments with a finite set of alternative options.
We derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples.
Our algorithm is optimal for $\epsilon$-best-arm identification and thresholding bandit problems.
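As a generic point of comparison (not the paper's dual-directed allocation rule), a classical successive-elimination routine for $\epsilon$-best-arm identification might look like this sketch:

```python
import numpy as np

def successive_elimination(pull, n_arms, eps=0.1, delta=0.05):
    """Classical successive elimination for epsilon-best-arm identification:
    returns an arm whose mean is within eps of the best, with probability
    at least 1 - delta.  pull(a) draws one stochastic reward of arm a.
    """
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    t = 0
    while True:
        t += 1
        for a in active:                        # sample each surviving arm once
            means[a] += (pull(a) - means[a]) / t
        # Confidence radius from a union bound over arms and rounds.
        rad = np.sqrt(np.log(4 * n_arms * t ** 2 / delta) / (2 * t))
        best = max(means[a] for a in active)
        if 2 * rad <= eps or len(active) == 1:  # survivors are eps-optimal
            return max(active, key=lambda a: means[a])
        active = [a for a in active if means[a] >= best - 2 * rad]
```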
arXiv Detail & Related papers (2023-10-30T07:29:17Z)
- Online Learning under Budget and ROI Constraints via Weak Adaptivity [57.097119428915796]
Existing primal-dual algorithms for constrained online learning problems rely on two fundamental assumptions.
We show how such assumptions can be circumvented by endowing standard primal-dual templates with weakly adaptive regret minimizers.
We prove the first best-of-both-worlds no-regret guarantees which hold in absence of the two aforementioned assumptions.
arXiv Detail & Related papers (2023-02-02T16:30:33Z)
- Efficient Learning of Decision-Making Models: A Penalty Block Coordinate Descent Algorithm for Data-Driven Inverse Optimization [12.610576072466895]
We consider the inverse problem where we use prior decision data to uncover the underlying decision-making process.
This statistical learning problem is referred to as data-driven inverse optimization.
We propose an efficient block coordinate descent-based algorithm to solve large problem instances.
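A toy sketch of the suboptimality loss data-driven inverse optimization typically minimizes, using a finite feasible set for simplicity (all names are assumptions; the paper's penalty block coordinate descent method for large instances is not shown):

```python
import numpy as np

def suboptimality_loss(c, observed, feasible):
    """Total suboptimality of the observed decisions under a candidate
    cost vector c:  sum_k ( c . x_k  -  min_{x in feasible} c . x ).
    Driving this loss to zero makes every observed decision optimal
    for the recovered objective.
    """
    forward_opt = min(f @ c for f in feasible)       # forward-problem optimum
    return sum(x @ c - forward_opt for x in observed)

def recover_cost(observed, feasible, candidates):
    """Pick the candidate cost vector that best rationalizes the data."""
    return min(candidates, key=lambda c: suboptimality_loss(c, observed, feasible))
```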
arXiv Detail & Related papers (2022-10-27T12:52:56Z)
- Learning Proximal Operators to Discover Multiple Optima [66.98045013486794]
We present an end-to-end method to learn the proximal operator across a family of non-convex problems.
We show that for weakly convex objectives and under mild conditions, the method converges globally.
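For context, the proximal operator of a function f is prox_f(x) = argmin_y f(y) + (1/2)||y - x||^2; a classical closed-form instance (soft-thresholding for the l1 norm) is sketched below, whereas the paper learns the operator with a network across a whole problem family:

```python
import numpy as np

def prox_l1(x, lam):
    """Proximal operator of f(y) = lam * ||y||_1, i.e. the minimizer of
    lam * ||y||_1 + 0.5 * ||y - x||^2, known in closed form as
    soft-thresholding."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
```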
arXiv Detail & Related papers (2022-01-28T05:53:28Z)
- Learning MDPs from Features: Predict-Then-Optimize for Sequential Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.