Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
- URL: http://arxiv.org/abs/2309.13896v1
- Date: Mon, 25 Sep 2023 06:22:28 GMT
- Title: Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
- Authors: Chaoqi Wang, Ziyu Ye, Zhe Feng, Ashwinkumar Badanidiyuru, Haifeng Xu
- Abstract summary: We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
- Score: 31.33919659549256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The standard contextual bandit problem assumes that all relevant contexts are
observed before the algorithm chooses an arm. This modeling paradigm, while
useful, often falls short when dealing with problems in which valuable
additional context can only be observed after arm selection. For example, content
recommendation platforms like YouTube, Instagram, and TikTok also observe valuable
follow-up information pertinent to the user's reward after recommendation
(e.g., how long the user stayed, the user's watch speed, etc.). To
improve online learning efficiency in these applications, we study a novel
contextual bandit problem with post-serving contexts and design a new
algorithm, poLinUCB, that achieves tight regret under standard assumptions.
Core to our technical proof is a robustified and generalized version of the
well-known Elliptical Potential Lemma (EPL), which can accommodate noise in
data. Such robustification is necessary for tackling our problem, and we
believe it could also be of general interest. Extensive empirical tests on both
synthetic and real-world datasets demonstrate the significant benefit of
utilizing post-serving contexts as well as the superior performance of our
algorithm over the state-of-the-art approaches.
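The core idea of serving on partial context and learning from the full context can be sketched in code. The following is a minimal, hypothetical LinUCB-style learner (not the paper's exact poLinUCB algorithm; class and parameter names are illustrative): arms are scored using only the pre-serving context, while the ridge-regression update after the pull uses the concatenated pre- and post-serving features.

```python
import numpy as np

class PostServingLinUCB:
    """Hypothetical LinUCB-style learner with post-serving contexts:
    selects an arm from the pre-serving context alone, then refines its
    ridge-regression estimate with the follow-up features observed
    after the pull."""

    def __init__(self, n_arms, pre_dim, post_dim, alpha=1.0, lam=1.0):
        d = pre_dim + post_dim
        self.alpha, self.post_dim = alpha, post_dim
        # Per-arm ridge statistics: A = lam*I + sum(x x^T), b = sum(r x)
        self.A = [lam * np.eye(d) for _ in range(n_arms)]
        self.b = [np.zeros(d) for _ in range(n_arms)]

    def select(self, x_pre):
        # At decision time only the pre-serving block is observed;
        # the post-serving block is zero-filled.
        x = np.concatenate([x_pre, np.zeros(self.post_dim)])
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            # Optimistic score: estimated reward plus exploration bonus.
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm, x_pre, x_post, reward):
        # After serving, both context blocks are available for learning.
        x = np.concatenate([x_pre, x_post])
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

A short usage example: serve once, observe the follow-up context and reward, then update.

```python
bandit = PostServingLinUCB(n_arms=2, pre_dim=2, post_dim=1)
x_pre = np.array([1.0, 0.0])
arm = bandit.select(x_pre)                       # pick an arm from pre-serving context
bandit.update(arm, x_pre, np.array([0.5]), 1.0)  # learn from full context + reward
```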
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Large Language Models for Next Point-of-Interest Recommendation [53.93503291553005]
Location-Based Social Network (LBSN) data is often used for the next Point of Interest (POI) recommendation task.
One frequently disregarded challenge is how to effectively use the abundant contextual information present in LBSN data.
We propose a framework that uses pretrained Large Language Models (LLMs) to tackle this challenge.
arXiv Detail & Related papers (2024-04-19T13:28:36Z)
- Robust and Scalable Model Editing for Large Language Models [75.95623066605259]
We propose EREN (Edit models by REading Notes) to improve the scalability and robustness of LLM editing.
Unlike existing techniques, it can integrate knowledge from multiple edits, and correctly respond to syntactically similar but semantically unrelated inputs.
arXiv Detail & Related papers (2024-03-26T06:57:23Z)
- Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling [15.88678122212934]
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.
We introduce a novel non-stationary contextual bandit algorithm that addresses these concerns.
It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism.
arXiv Detail & Related papers (2023-10-11T18:15:55Z)
- Leveraging User-Triggered Supervision in Contextual Bandits [34.58466163463977]
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
We develop a new framework to leverage such signals, while being robust to their biased nature.
arXiv Detail & Related papers (2023-02-07T22:42:27Z)
- Improving Sequential Query Recommendation with Immediate User Feedback [6.925738064847176]
We propose an algorithm for next query recommendation in interactive data exploration settings.
We conduct a large-scale experimental study using log files from a popular online literature discovery service.
arXiv Detail & Related papers (2022-05-12T18:19:24Z)
- Measuring and Increasing Context Usage in Context-Aware Machine Translation [64.5726087590283]
We introduce a new metric, conditional cross-mutual information, to quantify the usage of context by machine translation models.
We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models.
arXiv Detail & Related papers (2021-05-07T19:55:35Z)
- Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
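The latent-bandit setting above can be illustrated with a minimal Thompson-sampling sketch (an assumption for illustration, not the paper's algorithm; a Bernoulli reward model and known per-state arm means are assumed, and all names are hypothetical): sample a latent state from the posterior belief, play the best arm for that state, then update the belief from the observed reward.

```python
import numpy as np

def latent_thompson_step(belief, reward_means, rng):
    """One Thompson-sampling step for a latent bandit.
    belief: posterior over latent states; reward_means[s, a] is the
    known mean reward of arm a in latent state s."""
    s = rng.choice(len(belief), p=belief)   # sample a latent state
    return int(np.argmax(reward_means[s]))  # play greedily for that state

def update_belief(belief, reward_means, arm, reward):
    """Bayesian belief update for a Bernoulli reward observation."""
    p = reward_means[:, arm]
    likelihood = p if reward == 1 else 1.0 - p  # per-state likelihood
    posterior = belief * likelihood
    return posterior / posterior.sum()
```

Under this model the belief concentrates on the true latent state as rewards accumulate, which is why regret can beat state-agnostic bandit policies when there are fewer latent states than arms.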
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
- Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles [112.89548995091182]
We provide the first universal and optimal reduction from contextual bandits to online regression.
Our algorithm requires no distributional assumptions beyond realizability, and works even when contexts are chosen adversarially.
arXiv Detail & Related papers (2020-02-12T11:33:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.