Let Me At Least Learn What You Really Like: Dealing With Noisy Humans When Learning Preferences
- URL: http://arxiv.org/abs/2002.06288v1
- Date: Sat, 15 Feb 2020 00:36:23 GMT
- Title: Let Me At Least Learn What You Really Like: Dealing With Noisy Humans When Learning Preferences
- Authors: Sriram Gopalakrishnan, Utkarsh Soni
- Abstract summary: We propose a modification to uncertainty sampling which uses the expected output value to help speed up learning of preferences.
We compare our approach with the uncertainty sampling baseline, as well as conduct an ablation study to test the validity of each component of our approach.
- Score: 0.76146285961466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning the preferences of a human improves the quality of the interaction
with the human. The number of queries available to learn preferences may be
limited, especially when interacting with a human, so active learning is a
must. One approach to active learning is to use uncertainty sampling to decide
the informativeness of a query. In this paper, we propose a modification to
uncertainty sampling which uses the expected output value to help speed up
learning of preferences. We compare our approach with the uncertainty sampling
baseline, as well as conduct an ablation study to test the validity of each
component of our approach.
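To make the idea concrete, here is a minimal sketch of value-weighted uncertainty sampling for a binary preference query. The multiplicative combination of predictive entropy and expected value, and all function names, are illustrative assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a Bernoulli outcome with probability p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)))

def select_query(candidates, predict_proba, expected_value):
    """Pick the query maximizing uncertainty weighted by the expected
    output value of its answer (a hypothetical scoring rule)."""
    scores = [entropy(predict_proba(q)) * expected_value(q) for q in candidates]
    return candidates[int(np.argmax(scores))]

# A maximally uncertain query (p = 0.5) with high expected value wins over
# an equally uncertain one with low value and over a confident one (p = 0.9).
probs = {"a": 0.5, "b": 0.9, "c": 0.5}
value = {"a": 1.0, "b": 1.0, "c": 2.0}
best = select_query(["a", "b", "c"], probs.get, value.get)
```

Plain uncertainty sampling would be indifferent between "a" and "c"; the value weighting breaks the tie toward the query whose answer matters more.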
Related papers
- Learning Linear Utility Functions From Pairwise Comparison Queries [35.01228510505625]
We study learnability of linear utility functions from pairwise comparison queries.
We show that in the passive learning setting, linear utilities are efficiently learnable with respect to the first objective.
In this case, we show that even the second objective is efficiently learnable, and present algorithms for both the noise-free and noisy query response settings.
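The pairwise-comparison setting admits a standard textbook sketch: fit a weight vector so that a logistic (Bradley-Terry style) model of utility differences explains the observed choices. The function below is a generic illustration of that idea, not the algorithm from the paper above.

```python
import numpy as np

def fit_linear_utility(pairs, labels, dim, lr=0.5, steps=2000):
    """Fit w for u(x) = w.x from pairwise comparisons.
    pairs: list of (x, y) feature vectors; labels: 1 if x preferred to y.
    Maximizes a logistic likelihood of the utility difference by
    batch gradient ascent; a generic sketch only."""
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for (x, y), z in zip(pairs, labels):
            d = x - y
            p = 1.0 / (1.0 + np.exp(-np.clip(w @ d, -30, 30)))  # P(x preferred)
            grad += (z - p) * d                                 # log-likelihood gradient
        w += lr * grad / len(pairs)
    return w

# Synthetic check: recover a known utility from noise-free comparisons.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
pairs = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(200)]
labels = [1 if w_true @ (x - y) > 0 else 0 for x, y in pairs]
w_hat = fit_linear_utility(pairs, labels, dim=2)
```

Because noise-free responses make the comparisons linearly separable, the recovered direction matches the true utility up to scale; the noisy-response setting studied in the paper requires more care.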
arXiv Detail & Related papers (2024-05-04T08:43:45Z)
- Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems [82.92678837778358]
Preference-based methods have demonstrated substantial success in empirical applications such as InstructGPT.
We show how human bias and uncertainty in feedback modeling can affect the theoretical guarantees of these approaches.
arXiv Detail & Related papers (2023-07-24T17:50:24Z)
- Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores [11.702616722462139]
This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning.
We show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods.
arXiv Detail & Related papers (2023-07-11T16:12:15Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Responsible Active Learning via Human-in-the-loop Peer Study [88.01358655203441]
We propose a responsible active learning method, namely Peer Study Learning (PSL), to simultaneously preserve data privacy and improve model stability.
We first introduce a human-in-the-loop teacher-student architecture to isolate unlabelled data from the task learner (teacher) on the cloud-side.
During training, the task learner instructs the lightweight active learner, which then provides feedback on the active sampling criterion.
arXiv Detail & Related papers (2022-11-24T13:18:27Z)
- Active Learning of Ordinal Embeddings: A User Study on Football Data [4.856635699699126]
Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function.
This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset.
arXiv Detail & Related papers (2022-07-26T07:55:23Z)
- Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering [71.15403434929915]
We show that across 5 models and 4 datasets on the task of visual question answering, a wide variety of active learning approaches fail to outperform random selection.
We identify the problem as collective outliers -- groups of examples that active learning methods prefer to acquire but models fail to learn.
We show that active learning sample efficiency increases significantly as the number of collective outliers in the active learning pool decreases.
arXiv Detail & Related papers (2021-07-06T00:52:11Z)
- Targeted Active Learning for Bayesian Decision-Making [15.491942513739676]
We argue that when acquiring samples sequentially, separating learning and decision-making is sub-optimal.
We introduce a novel active learning strategy which takes the down-the-line decision problem into account.
Specifically, we introduce a novel active learning criterion which maximizes the expected information gain on the posterior distribution of the optimal decision.
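Expected information gain has a compact generic form over a discrete hypothesis space: prior entropy minus the expected posterior entropy after observing the query's outcome. The sketch below computes this standard hypothesis-posterior version; the paper above instead targets the posterior over the optimal decision, a refinement this illustration omits.

```python
import numpy as np

def dist_entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_information_gain(prior, likelihoods):
    """EIG of one query. prior: shape (H,) over hypotheses;
    likelihoods: shape (H, O) giving P(outcome o | hypothesis h)."""
    marginal = prior @ likelihoods              # P(outcome), shape (O,)
    eig = dist_entropy(prior)
    for o, m in enumerate(marginal):
        if m == 0:
            continue
        posterior = prior * likelihoods[:, o] / m   # Bayes update for outcome o
        eig -= m * dist_entropy(posterior)
    return eig

# A perfectly informative binary query on two equally likely hypotheses
# gains ln 2 nats; an uninformative one gains nothing.
uniform = np.array([0.5, 0.5])
informative = expected_information_gain(uniform, np.array([[1.0, 0.0], [0.0, 1.0]]))
useless = expected_information_gain(uniform, np.full((2, 2), 0.5))
```

An active learner would evaluate this quantity for each candidate query and ask the one with the largest gain.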
arXiv Detail & Related papers (2021-06-08T09:05:43Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Low-Regret Active Learning [64.36270166907788]
We develop an online learning algorithm for identifying unlabeled data points that are most informative for training.
At the core of our work is an efficient algorithm for sleeping experts that is tailored to achieve low regret on predictable (easy) instances.
arXiv Detail & Related papers (2021-04-06T22:53:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.