Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
- URL: http://arxiv.org/abs/2309.13896v1
- Date: Mon, 25 Sep 2023 06:22:28 GMT
- Title: Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
- Authors: Chaoqi Wang, Ziyu Ye, Zhe Feng, Ashwinkumar Badanidiyuru, Haifeng Xu
- Abstract summary: We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
- Score: 31.33919659549256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The standard contextual bandit problem assumes that all the
relevant contexts are observed before the algorithm chooses an arm. This
modeling paradigm, while useful, often falls short when dealing with problems
in which valuable additional context can be observed after arm selection. For
example, content recommendation platforms like YouTube, Instagram, and TikTok
also observe valuable follow-up information pertinent to the user's reward
after recommendation (e.g., how long the user stayed or the user's watch speed). To
improve online learning efficiency in these applications, we study a novel
contextual bandit problem with post-serving contexts and design a new
algorithm, poLinUCB, that achieves tight regret under standard assumptions.
Core to our technical proof is a robustified and generalized version of the
well-known Elliptical Potential Lemma (EPL), which can accommodate noise in
data. Such robustification is necessary for tackling our problem, and we
believe it could also be of general interest. Extensive empirical tests on both
synthetic and real-world datasets demonstrate the significant benefit of
utilizing post-serving contexts as well as the superior performance of our
algorithm over the state-of-the-art approaches.
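To make the technical core concrete, the classical Elliptical Potential Lemma that the paper robustifies can be stated in its standard textbook form as follows (the paper's generalized variant additionally accommodates noise in the data):

```latex
% Standard Elliptical Potential Lemma. For x_1, ..., x_T in R^d with
% \|x_t\|_2 \le L and V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top:
\sum_{t=1}^{T} \min\left(1, \|x_t\|_{V_{t-1}^{-1}}^{2}\right)
    \le 2 d \log\left(1 + \frac{T L^{2}}{\lambda d}\right)
```

Below is a minimal, hedged sketch of how a LinUCB-style learner might fold post-serving contexts into its least-squares update. This is an illustrative reconstruction from the abstract, not the authors' poLinUCB; the class name, the running-mean stand-in for the unseen follow-up, and parameters such as `alpha` and `lam` are all assumptions made for the example.

```python
import numpy as np

class PostServingLinUCB:
    """Sketch of a LinUCB-style learner whose reward model uses both the
    pre-serving context and a follow-up (post-serving) context revealed
    only after the arm is pulled. Illustrative assumption, not the
    paper's exact poLinUCB algorithm."""

    def __init__(self, d_pre, d_post, alpha=1.0, lam=1.0):
        d = d_pre + d_post
        self.alpha = alpha                 # exploration weight
        self.V = lam * np.eye(d)           # regularized Gram matrix
        self.b = np.zeros(d)               # response vector
        self.post_mean = np.zeros(d_post)  # running mean of post-serving ctx
        self.n = 0

    def choose(self, pre_contexts):
        """Score each arm with a UCB, plugging in the running mean of the
        post-serving context as a stand-in for the unseen follow-up."""
        V_inv = np.linalg.inv(self.V)
        theta = V_inv @ self.b             # ridge estimate of reward weights
        scores = []
        for pre in pre_contexts:
            x = np.concatenate([pre, self.post_mean])
            scores.append(x @ theta + self.alpha * np.sqrt(x @ V_inv @ x))
        return int(np.argmax(scores))

    def update(self, pre_ctx, post_ctx, reward):
        """Once the follow-up and reward arrive, update on the full context."""
        x = np.concatenate([pre_ctx, post_ctx])
        self.V += np.outer(x, x)
        self.b += reward * x
        self.n += 1
        self.post_mean += (post_ctx - self.post_mean) / self.n
```

At decision time only the pre-serving context is available, so this sketch substitutes a running mean for the follow-up features; the paper's own algorithm handles the unseen follow-up with its own machinery and derives tight regret via the robustified EPL.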
Related papers
- PageRank Bandits for Link Prediction [72.61386754332776]
Link prediction is a critical problem in graph learning with broad applications such as recommender systems and knowledge graph completion.
This paper reformulates link prediction as a sequential decision-making process in which link prediction interactions occur one at a time.
We propose a novel fusion algorithm, PRB (PageRank Bandits), which is the first to combine contextual bandits with PageRank for collaborative exploitation and exploration.
arXiv Detail & Related papers (2024-11-03T02:39:28Z) - Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance [68.56701216210617]
In principle, one would expect models to adapt better to the user context after instruction finetuning.
We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z) - Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experimental results show a significant performance improvement for our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Large Language Models for Next Point-of-Interest Recommendation [53.93503291553005]
Location-Based Social Network (LBSN) data is often used for the next Point of Interest (POI) recommendation task.
One frequently disregarded challenge is how to effectively use the abundant contextual information present in LBSN data.
We propose a framework that uses pretrained Large Language Models (LLMs) to tackle this challenge.
arXiv Detail & Related papers (2024-04-19T13:28:36Z) - Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling [15.88678122212934]
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.
We introduce a novel non-stationary contextual bandit algorithm that addresses these concerns.
It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism.
arXiv Detail & Related papers (2023-10-11T18:15:55Z) - Leveraging User-Triggered Supervision in Contextual Bandits [34.58466163463977]
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
We develop a new framework to leverage such signals, while being robust to their biased nature.
arXiv Detail & Related papers (2023-02-07T22:42:27Z) - Improving Sequential Query Recommendation with Immediate User Feedback [6.925738064847176]
We propose an algorithm for next query recommendation in interactive data exploration settings.
We conduct a large-scale experimental study using log files from a popular online literature discovery service.
arXiv Detail & Related papers (2022-05-12T18:19:24Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions (a minimal sketch of this setting appears after this list).
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
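As a small illustration of the latent-bandit setting from the final entry above, here is a hedged Thompson-sampling sketch over a discrete latent state with known per-state arm reward means. The variable names, the Gaussian noise model, and the posterior update are assumptions made for the example; this is not the paper's exact UCB or Thompson-sampling algorithm.

```python
import numpy as np

# Sketch of Thompson sampling for a latent bandit: the K x A matrix of
# arm reward means conditioned on each latent state is assumed known;
# only the discrete latent state itself is unknown.
rng = np.random.default_rng(0)
K, A, T, sigma = 3, 5, 2000, 0.1
mu = rng.uniform(0.0, 1.0, size=(K, A))   # known means mu[state, arm]
true_state = int(rng.integers(K))         # hidden latent state
log_post = np.zeros(K)                    # log posterior over latent states

for t in range(T):
    w = np.exp(log_post - log_post.max())
    s = rng.choice(K, p=w / w.sum())      # sample a latent state
    a = int(np.argmax(mu[s]))             # play the best arm for that state
    r = mu[true_state, a] + sigma * rng.standard_normal()  # noisy reward
    # Gaussian log-likelihood update of the posterior over latent states
    log_post -= (r - mu[:, a]) ** 2 / (2.0 * sigma ** 2)
```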
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.