Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
- URL: http://arxiv.org/abs/2309.13896v1
- Date: Mon, 25 Sep 2023 06:22:28 GMT
- Title: Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts
- Authors: Chaoqi Wang, Ziyu Ye, Zhe Feng, Ashwinkumar Badanidiyuru, Haifeng Xu
- Abstract summary: We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
- Score: 31.33919659549256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The standard contextual bandit problem assumes that all the
relevant contexts are observed before the algorithm chooses an arm. This
modeling paradigm, while useful, often falls short when dealing with problems
in which valuable additional context can be observed after arm selection. For
example, content recommendation platforms like YouTube, Instagram, and TikTok
also observe valuable follow-up information pertinent to the user's reward
after recommendation (e.g., how long the user stayed or the user's watch speed). To
improve online learning efficiency in these applications, we study a novel
contextual bandit problem with post-serving contexts and design a new
algorithm, poLinUCB, that achieves tight regret under standard assumptions.
Core to our technical proof is a robustified and generalized version of the
well-known Elliptical Potential Lemma (EPL), which can accommodate noise in
data. Such robustification is necessary for tackling our problem, and we
believe it could also be of general interest. Extensive empirical tests on both
synthetic and real-world datasets demonstrate the significant benefit of
utilizing post-serving contexts as well as the superior performance of our
algorithm over the state-of-the-art approaches.
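To make the technical core concrete, the classical Elliptical Potential Lemma that the paper robustifies can be stated in its standard textbook form as follows (the paper's generalized variant additionally accommodates noise in the data):

```latex
% Standard Elliptical Potential Lemma. For x_1, ..., x_T in R^d with
% \|x_t\|_2 \le L and V_t = \lambda I + \sum_{s=1}^{t} x_s x_s^\top:
\sum_{t=1}^{T} \min\left(1, \|x_t\|_{V_{t-1}^{-1}}^{2}\right)
    \le 2 d \log\left(1 + \frac{T L^{2}}{\lambda d}\right)
```

Below is a minimal, hedged sketch of how a LinUCB-style learner might fold post-serving contexts into its least-squares update. This is an illustrative reconstruction from the abstract, not the authors' poLinUCB; the class name, the running-mean stand-in for the unseen follow-up, and parameters such as `alpha` and `lam` are all assumptions made for the example.

```python
import numpy as np

class PostServingLinUCB:
    """Sketch of a LinUCB-style learner whose reward model uses both the
    pre-serving context and a follow-up (post-serving) context revealed
    only after the arm is pulled. Illustrative assumption, not the
    paper's exact poLinUCB algorithm."""

    def __init__(self, d_pre, d_post, alpha=1.0, lam=1.0):
        d = d_pre + d_post
        self.alpha = alpha                 # exploration weight
        self.V = lam * np.eye(d)           # regularized Gram matrix
        self.b = np.zeros(d)               # response vector
        self.post_mean = np.zeros(d_post)  # running mean of post-serving ctx
        self.n = 0

    def choose(self, pre_contexts):
        """Score each arm with a UCB, plugging in the running mean of the
        post-serving context as a stand-in for the unseen follow-up."""
        V_inv = np.linalg.inv(self.V)
        theta = V_inv @ self.b             # ridge estimate of reward weights
        scores = []
        for pre in pre_contexts:
            x = np.concatenate([pre, self.post_mean])
            scores.append(x @ theta + self.alpha * np.sqrt(x @ V_inv @ x))
        return int(np.argmax(scores))

    def update(self, pre_ctx, post_ctx, reward):
        """Once the follow-up and reward arrive, update on the full context."""
        x = np.concatenate([pre_ctx, post_ctx])
        self.V += np.outer(x, x)
        self.b += reward * x
        self.n += 1
        self.post_mean += (post_ctx - self.post_mean) / self.n
```

At decision time only the pre-serving context is available, so this sketch substitutes a running mean for the follow-up features; the paper's own algorithm handles the unseen follow-up with its own machinery and derives tight regret via the robustified EPL.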
Related papers
- PageRank Bandits for Link Prediction [72.61386754332776]
Link prediction is a critical problem in graph learning with broad applications such as recommender systems and knowledge graph completion.
This paper reformulates link prediction as a sequential decision-making process in which link prediction interactions occur one at a time.
We propose a novel fusion algorithm, PRB (PageRank Bandits), which is the first to combine contextual bandits with PageRank for collaborative exploitation and exploration.
arXiv Detail & Related papers (2024-11-03T02:39:28Z) - Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance [68.56701216210617]
In principle, one would expect models to adapt better to the user context after instruction finetuning.
We observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases.
arXiv Detail & Related papers (2024-10-14T17:57:09Z) - Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experimental results show a significant performance improvement for our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Large Language Models for Next Point-of-Interest Recommendation [53.93503291553005]
Location-Based Social Network (LBSN) data is often used for the next Point of Interest (POI) recommendation task.
One frequently disregarded challenge is how to effectively use the abundant contextual information present in LBSN data.
We propose a framework that uses pretrained Large Language Models (LLMs) to tackle this challenge.
arXiv Detail & Related papers (2024-04-19T13:28:36Z) - Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling [15.88678122212934]
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.
We introduce a novel non-stationary contextual bandit algorithm that addresses these concerns.
It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism.
arXiv Detail & Related papers (2023-10-11T18:15:55Z) - Leveraging User-Triggered Supervision in Contextual Bandits [34.58466163463977]
We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
We develop a new framework to leverage such signals, while being robust to their biased nature.
arXiv Detail & Related papers (2023-02-07T22:42:27Z) - Improving Sequential Query Recommendation with Immediate User Feedback [6.925738064847176]
We propose an algorithm for next query recommendation in interactive data exploration settings.
We conduct a large-scale experimental study using log files from a popular online literature discovery service.
arXiv Detail & Related papers (2022-05-12T18:19:24Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than the number of actions (a minimal sketch of this setting appears after this list).
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
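As a small illustration of the latent-bandit setting from the final entry above, here is a hedged Thompson-sampling sketch over a discrete latent state with known per-state arm reward means. The variable names, the Gaussian noise model, and the posterior update are assumptions made for the example; this is not the paper's exact UCB or Thompson-sampling algorithm.

```python
import numpy as np

# Sketch of Thompson sampling for a latent bandit: the K x A matrix of
# arm reward means conditioned on each latent state is assumed known;
# only the discrete latent state itself is unknown.
rng = np.random.default_rng(0)
K, A, T, sigma = 3, 5, 2000, 0.1
mu = rng.uniform(0.0, 1.0, size=(K, A))   # known means mu[state, arm]
true_state = int(rng.integers(K))         # hidden latent state
log_post = np.zeros(K)                    # log posterior over latent states

for t in range(T):
    w = np.exp(log_post - log_post.max())
    s = rng.choice(K, p=w / w.sum())      # sample a latent state
    a = int(np.argmax(mu[s]))             # play the best arm for that state
    r = mu[true_state, a] + sigma * rng.standard_normal()  # noisy reward
    # Gaussian log-likelihood update of the posterior over latent states
    log_post -= (r - mu[:, a]) ** 2 / (2.0 * sigma ** 2)
```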
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.