Overcoming Prior Misspecification in Online Learning to Rank
- URL: http://arxiv.org/abs/2301.10651v2
- Date: Thu, 26 Jan 2023 17:30:22 GMT
- Title: Overcoming Prior Misspecification in Online Learning to Rank
- Authors: Javad Azizi, Ofer Meshi, Masrour Zoghi, Maryam Karimzadehgan
- Abstract summary: We propose and analyze adaptive algorithms that remove the requirement that the prior used by the algorithm match the true prior.
We also consider scalar relevance feedback on top of click feedback.
We demonstrate the efficacy of our algorithms using both synthetic and real-world experiments.
- Score: 4.665041704405341
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent literature on online learning to rank (LTR) has established the
utility of prior knowledge to Bayesian ranking bandit algorithms. However, a
major limitation of existing work is the requirement for the prior used by the
algorithm to match the true prior. In this paper, we propose and analyze
adaptive algorithms that address this issue and additionally extend these
results to the linear and generalized linear models. We also consider scalar
relevance feedback on top of click feedback. Moreover, we demonstrate the
efficacy of our algorithms using both synthetic and real-world experiments.
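The core issue — that a Bayesian bandit algorithm run with a misspecified prior can explore poorly — can be illustrated with a minimal Gaussian Thompson sampling toy. This is a generic multi-armed bandit sketch, not the paper's ranking algorithm; the arm means, prior parameters, and horizon are all illustrative assumptions:

```python
import numpy as np

def gaussian_ts_regret(true_means, prior_mean, prior_var,
                       noise_var=1.0, horizon=2000, seed=0):
    """Cumulative regret of Gaussian Thompson sampling whose prior
    N(prior_mean, prior_var) may be misspecified for the true arm means."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    post_mean = np.full(k, float(prior_mean))  # per-arm posterior mean
    post_var = np.full(k, float(prior_var))    # per-arm posterior variance
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a mean for each arm from its posterior, play the argmax.
        arm = int(np.argmax(rng.normal(post_mean, np.sqrt(post_var))))
        reward = rng.normal(true_means[arm], np.sqrt(noise_var))
        # Conjugate Gaussian posterior update for the played arm.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm]
                          + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
        regret += best - true_means[arm]
    return regret

true_means = [0.0, 0.2, 0.5]
# Well-matched prior vs. a confidently wrong one, averaged over a few seeds.
r_good = np.mean([gaussian_ts_regret(true_means, 0.0, 1.0, seed=s)
                  for s in range(5)])
r_bad = np.mean([gaussian_ts_regret(true_means, 5.0, 0.01, seed=s)
                 for s in range(5)])
```

A tight prior centred far from the true arm means slows posterior learning and inflates regret, which is the kind of misspecification the adaptive algorithms in the paper are designed to overcome.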
Related papers
- A Simple Learning-Augmented Algorithm for Online Packing with Concave Objectives [4.9826534303287335]
We introduce and analyze a simple learning-augmented algorithm for online packing problems with linear constraints and concave objectives.
We further raise the problem of understanding necessary and sufficient conditions for when such simple black-box solutions may be optimal.
arXiv Detail & Related papers (2024-06-05T18:39:28Z)
- Learning-Augmented Algorithms with Explicit Predictors [67.02156211760415]
Recent advances in algorithmic design show how to utilize predictions obtained by machine learning models from past and present data.
Prior research in this context was focused on a paradigm where the predictor is pre-trained on past data and then used as a black box.
In this work, we unpack the predictor and integrate the learning problem it gives rise to within the algorithmic challenge.
arXiv Detail & Related papers (2024-03-12T08:40:21Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Adversarial Online Collaborative Filtering [20.931533714651376]
We investigate the problem of online collaborative filtering under no-repetition constraints.
We design and analyze an algorithm that works under biclustering assumptions on the user-item preference matrix.
We show that this algorithm exhibits an optimal regret guarantee, while being fully adaptive.
arXiv Detail & Related papers (2023-02-11T19:30:55Z)
- Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints [117.8317521974783]
We consider non-clairvoyant scheduling with online precedence constraints.
An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
arXiv Detail & Related papers (2023-01-30T13:17:15Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined via the population loss, that are better suited to active learning than the metric used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Tree-Based Adaptive Model Learning [62.997667081978825]
We extend the Kearns-Vazirani learning algorithm to handle systems that change over time.
We present a new learning algorithm that can reuse and update previously learned behavior, implement it in the LearnLib library, and evaluate it on large examples.
arXiv Detail & Related papers (2022-08-31T21:24:22Z)
- Non-Clairvoyant Scheduling with Predictions Revisited [77.86290991564829]
In non-clairvoyant scheduling, the task is to find an online strategy for scheduling jobs with a priori unknown processing requirements.
We revisit this well-studied problem in a recently popular learning-augmented setting that integrates (untrusted) predictions in algorithm design.
We show that these predictions have desired properties, admit a natural error measure as well as algorithms with strong performance guarantees.
arXiv Detail & Related papers (2022-02-21T13:18:11Z)
- Metalearning Linear Bandits by Prior Update [7.519872646378836]
Fully Bayesian approaches assume that problem parameters are generated from a known prior, while in practice, such information is often lacking.
This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance.
In this work we prove, in the context of linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of an algorithm that uses the true prior.
arXiv Detail & Related papers (2021-07-12T11:17:01Z)
- The Primal-Dual method for Learning Augmented Algorithms [10.2730668356857]
We extend the primal-dual method for online algorithms to incorporate predictions that advise the online algorithm about the next action to take.
We show that our algorithms outperform any online algorithm when the prediction is accurate while maintaining good guarantees when the prediction is misleading.
arXiv Detail & Related papers (2020-10-22T11:58:47Z)
- Delay-Adaptive Learning in Generalized Linear Contextual Bandits [18.68458152442088]
We study the performance of two well-known algorithms adapted to a delayed setting.
We describe modifications on how these two algorithms should be adapted to handle delays.
Our results contribute to the broad landscape of contextual bandits literature.
arXiv Detail & Related papers (2020-03-11T09:12:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.