Active Learning for Contextual Search with Binary Feedbacks
- URL: http://arxiv.org/abs/2110.01072v1
- Date: Sun, 3 Oct 2021 19:05:29 GMT
- Title: Active Learning for Contextual Search with Binary Feedbacks
- Authors: Chen, Xi and Liu, Quanquan and Wang, Yining
- Abstract summary: We study the learning problem in contextual search motivated by applications such as first-price auction.
We propose a tri-section search approach combined with a margin-based active learning method.
- Score: 2.6424064030995957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the learning problem in contextual search, which is
motivated by applications such as first-price auction, personalized medicine
experiments, and feature-based pricing experiments. In particular, for a
sequence of arriving context vectors, with each context associated with an
underlying value, the decision-maker either makes a query at a certain point or
skips the context. The decision-maker will only observe the binary feedback on
the relationship between the query point and the value associated with the
context. We study a PAC learning setting, where the goal is to learn the
underlying mean value function in context with a minimum number of queries. To
address this challenge, we propose a tri-section search approach combined with
a margin-based active learning method. We show that the algorithm only needs to
make $O(1/\varepsilon^2)$ queries to achieve an $\varepsilon$-estimation accuracy.
This sample complexity significantly improves on the passive setting, which
requires at least $\Omega(1/\varepsilon^4)$ samples.
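The query model can be illustrated with a minimal, noiseless sketch of tri-section search on a single unknown value (the function names `binary_feedback` and `trisection_estimate` are hypothetical, not from the paper): each round queries the two interior points of the current interval and discards at least one third. The paper's actual algorithm additionally handles noisy feedback and context vectors via margin-based active learning, which this toy omits.

```python
def binary_feedback(v, q):
    """Oracle: returns True iff the query point q is at or above the value v."""
    return q >= v

def trisection_estimate(feedback, lo, hi, eps):
    """Shrink [lo, hi] around the unknown value using only binary feedback.

    Each round queries the points at 1/3 and 2/3 of the interval and
    discards at least one third, so the width decays geometrically.
    """
    while hi - lo > eps:
        a = lo + (hi - lo) / 3.0
        b = lo + 2.0 * (hi - lo) / 3.0
        if feedback(a):        # value lies at or below a
            hi = a
        elif not feedback(b):  # value lies above b
            lo = b
        else:                  # value lies in the middle third
            lo, hi = a, b
    return (lo + hi) / 2.0
```

In the noiseless case this converges in $O(\log(1/\varepsilon))$ queries; the $O(1/\varepsilon^2)$ rate in the paper accounts for noisy binary responses.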
Related papers
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use the Large Language Models to plan a solution, soft-formalise the query into facts and predicates using a logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z) - Learning Thresholds with Latent Values and Censored Feedback [18.129896050051432]
We study a problem where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and latent value $v$, and the reward can only be achieved if the threshold is lower than or equal to the unknown latent value.
This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignments in crowdsourcing, and setting recruiting bars in hiring.
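The censored-feedback interaction described above can be sketched in a few lines (the function name `censored_round` and the tuple interface are illustrative assumptions, not the paper's formulation):

```python
def censored_round(gamma, latent_v, g):
    """One interaction: the reward g(gamma, v) is realized only when the
    proposed threshold gamma is at or below the latent value v; otherwise
    the learner observes a censored outcome (False, zero reward)."""
    if gamma <= latent_v:
        return True, g(gamma, latent_v)
    return False, 0.0
```

For example, in reserve-price optimization one might take $g(\gamma, v) = \gamma$: the seller collects the reserve price only when the bidder's latent value clears it.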
arXiv Detail & Related papers (2023-12-07T19:30:08Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Contextual Bandits and Imitation Learning via Preference-Based Active
Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is two-fold: to minimize the regret associated with the executed actions while simultaneously minimizing the number of comparison queries made to the expert.
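The noisy preference feedback described above can be sketched with a standard Bradley-Terry (logistic) comparison model; this is a common assumption for such feedback, not necessarily the exact model used in the paper, and the function name `noisy_preference` is hypothetical:

```python
import math
import random

def noisy_preference(reward_a, reward_b, rng):
    """Expert compares two actions and reports whether it prefers the first.

    Under a Bradley-Terry (logistic) noise model, action a is preferred
    with probability sigmoid(reward_a - reward_b).
    """
    p_prefer_a = 1.0 / (1.0 + math.exp(reward_b - reward_a))
    return rng.random() < p_prefer_a

rng = random.Random(0)
# With a large reward gap, the expert almost always prefers action a.
wins = sum(noisy_preference(5.0, 0.0, rng) for _ in range(1000))
```

The larger the gap between the two rewards, the less noisy the comparison, which is what lets an algorithm trade query count against regret.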
arXiv Detail & Related papers (2023-07-24T16:36:04Z) - Information-Computation Tradeoffs for Learning Margin Halfspaces with
Random Classification Noise [50.64137465792738]
We study the problem of PAC learning $\gamma$-margin halfspaces with Random Classification Noise.
We establish an information-computation tradeoff suggesting an inherent gap between the sample complexity of the problem and the sample complexity of computationally efficient algorithms.
arXiv Detail & Related papers (2023-06-28T16:33:39Z) - A Few Expert Queries Suffices for Sample-Efficient RL with Resets and
Linear Value Approximation [16.29514743112387]
We study sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly realizable.
We present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries.
Delphi requires $\tilde{\mathcal{O}}(d)$ expert queries and a $\texttt{poly}(d, |\mathcal{A}|, 1/\varepsilon)$ amount of exploratory samples to provably recover an $\varepsilon$-suboptimal policy.
arXiv Detail & Related papers (2022-07-18T01:39:13Z) - QRelScore: Better Evaluating Generated Questions with Deeper
Understanding of Context-aware Relevance [54.48031346496593]
We propose $\textbf{QRelScore}$, a context-aware $\underline{\textbf{Rel}}$evance evaluation metric for generated questions.
Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation.
Compared with existing metrics, our experiments demonstrate that QRelScore is able to achieve a higher correlation with human judgments while being much more robust to adversarial samples.
arXiv Detail & Related papers (2022-04-29T07:39:53Z) - Teaching an Active Learner with Contrastive Examples [35.926575235046634]
We study the problem of active learning with the added twist that the learner is assisted by a helpful teacher.
We investigate an efficient teaching algorithm that adaptively picks contrastive examples.
We derive strong performance guarantees for our algorithm based on two problem-dependent parameters.
arXiv Detail & Related papers (2021-10-28T05:00:55Z) - How to Query An Oracle? Efficient Strategies to Label Data [59.89900843097016]
We consider the basic problem of querying an expert oracle for labeling a dataset in machine learning.
We present a randomized batch algorithm that operates on a round-by-round basis to label the samples and achieves a query rate of $O(\frac{Nk}{2})$.
In addition, we present an adaptive greedy query scheme, which achieves an average rate of $\approx 0.2N$ queries per sample with triplet queries.
arXiv Detail & Related papers (2021-10-05T20:15:35Z) - A Provably Efficient Sample Collection Strategy for Reinforcement
Learning [123.69175280309226]
One of the challenges in online reinforcement learning (RL) is that the agent needs to trade off the exploration of the environment and the exploitation of the samples to optimize its behavior.
We propose to tackle the exploration-exploitation problem following a decoupled approach composed of: 1) An "objective-specific" algorithm that prescribes how many samples to collect at which states, as if it has access to a generative model (i.e., sparse simulator of the environment); 2) An "objective-agnostic" sample collection responsible for generating the prescribed samples as fast as possible.
arXiv Detail & Related papers (2020-07-13T15:17:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.