Active Preference Learning using Maximum Regret
- URL: http://arxiv.org/abs/2005.04067v2
- Date: Mon, 28 Sep 2020 19:27:27 GMT
- Title: Active Preference Learning using Maximum Regret
- Authors: Nils Wilde, Dana Kulic, and Stephen L. Smith
- Abstract summary: We study active preference learning as a framework for intuitively specifying the behaviour of autonomous robots.
In active preference learning, a user chooses the preferred behaviour from a set of alternatives, from which the robot learns the user's preferences.
- Score: 10.317601896290467
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study active preference learning as a framework for intuitively specifying
the behaviour of autonomous robots. In active preference learning, a user
chooses the preferred behaviour from a set of alternatives, from which the
robot learns the user's preferences, modeled as a parameterized cost function.
Previous approaches present users with alternatives that minimize the
uncertainty over the parameters of the cost function. However, different
parameters might lead to the same optimal behaviour; as a consequence the
solution space is more structured than the parameter space. We exploit this by
proposing a query selection that greedily reduces the maximum error ratio over
the solution space. In simulations we demonstrate that the proposed approach
outperforms other state-of-the-art techniques in both learning efficiency and
ease of queries for the user. Finally, we show that evaluating the learning
based on the similarities of solutions instead of the similarities of weights
allows for better predictions for different scenarios.
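The core idea of the abstract — preferring queries that discriminate between weight hypotheses whose *optimal solutions* differ, rather than hypotheses whose weights merely differ — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the linear cost model, the particle set of weight hypotheses, and the pairwise search over particles are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each candidate behaviour is a feature vector phi,
# and the user's cost is assumed linear: cost(phi; w) = w @ phi.
behaviours = rng.random((20, 3))   # 20 candidate behaviours, 3 features each
particles = rng.random((50, 3))    # 50 sampled weight hypotheses (all positive)

def best_behaviour(w, behaviours):
    """Index of the behaviour minimizing the linear cost under weights w."""
    return int(np.argmin(behaviours @ w))

def error_ratio(w_true, w_other, behaviours):
    """Cost ratio of executing w_other's optimal behaviour when the
    true cost is given by w_true. Always >= 1; equals 1 when both
    weight vectors induce the same optimal behaviour."""
    opt_true = behaviours[best_behaviour(w_true, behaviours)]
    opt_other = behaviours[best_behaviour(w_other, behaviours)]
    return (w_true @ opt_other) / (w_true @ opt_true)

def max_regret_query(particles, behaviours):
    """Greedily pick the pair of weight hypotheses whose induced optimal
    behaviours disagree the most under the error ratio. The two induced
    behaviours would then be shown to the user as the next query."""
    best_pair, best_val = None, -np.inf
    for i in range(len(particles)):
        for j in range(len(particles)):
            if i == j:
                continue
            r = error_ratio(particles[i], particles[j], behaviours)
            if r > best_val:
                best_val, best_pair = r, (i, j)
    return best_pair, best_val

(i, j), r = max_regret_query(particles, behaviours)
```

After the user answers a query, hypotheses inconsistent with the stated preference would be discarded or down-weighted; the sketch omits that update step.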
Related papers
- Learning Joint Models of Prediction and Optimization [56.04498536842065]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by joint predictive models.
arXiv Detail & Related papers (2024-09-07T19:52:14Z) - An incremental preference elicitation-based approach to learning potentially non-monotonic preferences in multi-criteria sorting [53.36437745983783]
We first construct a max-margin optimization-based model to represent potentially non-monotonic preferences.
We devise information amount measurement methods and question selection strategies to pinpoint the most informative alternative in each iteration.
Two incremental preference elicitation-based algorithms are developed to learn potentially non-monotonic preferences.
arXiv Detail & Related papers (2024-09-04T14:36:20Z) - Adaptive Preference Scaling for Reinforcement Learning with Human Feedback [103.36048042664768]
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values.
We propose a novel adaptive preference loss, underpinned by distributionally robust optimization (DRO)
Our method is versatile and can be readily adapted to various preference optimization frameworks.
arXiv Detail & Related papers (2024-06-04T20:33:22Z) - End-to-End Learning for Fair Multiobjective Optimization Under Uncertainty [55.04219793298687]
The Predict-Then-Optimize (PtO) paradigm in machine learning aims to maximize downstream decision quality.
This paper extends the PtO methodology to optimization problems with nondifferentiable Ordered Weighted Averaging (OWA) objectives.
It shows how optimization of OWA functions can be effectively integrated with parametric prediction for fair and robust optimization under uncertainty.
arXiv Detail & Related papers (2024-02-12T16:33:35Z) - Predict-Then-Optimize by Proxy: Learning Joint Models of Prediction and Optimization [59.386153202037086]
The Predict-Then-Optimize framework uses machine learning models to predict unknown parameters of an optimization problem from features before solving.
This approach can be inefficient and requires handcrafted, problem-specific rules for backpropagation through the optimization step.
This paper proposes an alternative method, in which optimal solutions are learned directly from the observable features by predictive models.
arXiv Detail & Related papers (2023-11-22T01:32:06Z) - Experience in Engineering Complex Systems: Active Preference Learning with Multiple Outcomes and Certainty Levels [1.5257326975704795]
Black-box optimization refers to the problem whose objective function and/or constraint sets are either unknown, inaccessible, or non-existent.
An algorithm called Active Preference Learning has been developed to exploit this specific information.
Our approach aims to extend the algorithm so that it can exploit further information effectively.
arXiv Detail & Related papers (2023-02-27T15:55:37Z) - Regret Bounds and Experimental Design for Estimate-then-Optimize [9.340611077939828]
In practical applications, data is used to make decisions in two steps: estimation and optimization.
Errors in the estimation step can lead estimate-then-optimize to sub-optimal decisions.
We provide a novel bound on this regret for smooth and unconstrained optimization problems.
arXiv Detail & Related papers (2022-10-27T16:13:48Z) - The Parametric Cost Function Approximation: A new approach for multistage stochastic programming [4.847980206213335]
We show that a parameterized version of a deterministic optimization model can be an effective way of handling uncertainty without the complexity of either programming or dynamic programming.
This approach can handle complex, high-dimensional state variables, and avoids the usual approximations associated with scenario trees or value function approximations.
arXiv Detail & Related papers (2022-01-01T23:25:09Z) - Low-Cost Algorithmic Recourse for Users With Uncertain Cost Functions [74.00030431081751]
We formalize the notion of user-specific cost functions and introduce a new method for identifying actionable recourses for users.
Our method satisfies up to 25.89 percentage points more users compared to strong baseline methods.
arXiv Detail & Related papers (2021-11-01T19:49:35Z) - Learning Choice Functions via Pareto-Embeddings [3.1410342959104725]
We consider the problem of learning to choose from a given set of objects, where each object is represented by a feature vector.
We propose a learning algorithm that minimizes a differentiable loss function suitable for this task.
arXiv Detail & Related papers (2020-07-14T09:34:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.