MultiScale Contextual Bandits for Long Term Objectives
- URL: http://arxiv.org/abs/2503.17674v2
- Date: Wed, 28 May 2025 07:12:09 GMT
- Title: MultiScale Contextual Bandits for Long Term Objectives
- Authors: Richa Rastogi, Yuta Saito, Thorsten Joachims
- Abstract summary: We introduce the framework of MultiScale Policy Learning to contextually reconcile that AI systems need to act and optimize feedback at multiple timescales. We show how the lower timescales with more plentiful data can provide a data-dependent hierarchical prior for faster learning at higher scales.
- Score: 36.85989221657821
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The feedback that AI systems (e.g., recommender systems, chatbots) collect from user interactions is a crucial source of training data. While short-term feedback (e.g., clicks, engagement) is widely used for training, there is ample evidence that optimizing short-term feedback does not necessarily achieve the desired long-term objectives. Unfortunately, directly optimizing for long-term objectives is challenging, and we identify the disconnect in the timescales of short-term interventions (e.g., rankings) and the long-term feedback (e.g., user retention) as one of the key obstacles. To overcome this disconnect, we introduce the framework of MultiScale Policy Learning to contextually reconcile that AI systems need to act and optimize feedback at multiple interdependent timescales. Following a PAC-Bayes motivation, we show how the lower timescales with more plentiful data can provide a data-dependent hierarchical prior for faster learning at higher scales, where data is more scarce. As a result, the policies at all levels effectively optimize for the long-term. We instantiate the framework with MultiScale Off-Policy Bandit Learning (MSBL) and demonstrate its effectiveness on three tasks relating to recommender and conversational systems.
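To make the two-timescale idea concrete, here is a minimal numpy sketch, not the paper's MSBL algorithm: both levels are learned with vanilla inverse-propensity-scored (IPS) off-policy learning over linear softmax policies, and the short-term solution serves as a quadratic (Gaussian-style) prior for the data-scarce long-term level. All function names, the synthetic data, and the prior strength `tau` are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_policy(X, A, R, P0, n_actions, prior_theta=None, tau=0.0,
               lr=0.1, epochs=200):
    """IPS off-policy learning of a linear softmax policy.
    X: (n, d) contexts; A: (n,) logged actions; R: (n,) rewards;
    P0: (n,) logging propensities. If prior_theta is given, the solution
    is shrunk toward it with strength tau (the hierarchical prior)."""
    n, d = X.shape
    theta = (np.zeros((d, n_actions)) if prior_theta is None
             else prior_theta.copy())
    onehot = np.eye(n_actions)
    for _ in range(epochs):
        pi = softmax(X @ theta)                      # (n, k) action probs
        w = pi[np.arange(n), A] / P0                 # importance weights
        grad = X.T @ ((w * R)[:, None] * (onehot[A] - pi)) / n
        if prior_theta is not None:
            grad -= tau * (theta - prior_theta)      # pull toward the prior
        theta += lr * grad
    return theta

rng = np.random.default_rng(0)
d, k = 8, 4
# Plentiful short-term logs (e.g., clicks) vs. scarce long-term logs
# (e.g., retention), both collected under a uniform logging policy.
X_st, A_st = rng.normal(size=(5000, d)), rng.integers(0, k, 5000)
R_st, P_st = rng.random(5000), np.full(5000, 1 / k)
X_lt, A_lt = rng.normal(size=(100, d)), rng.integers(0, k, 100)
R_lt, P_lt = rng.random(100), np.full(100, 1 / k)

theta_st = fit_policy(X_st, A_st, R_st, P_st, k)      # fast timescale
theta_lt = fit_policy(X_lt, A_lt, R_lt, P_lt, k,      # slow timescale,
                      prior_theta=theta_st, tau=0.5)  # anchored to theta_st
```

The `tau` term is where the hierarchy enters: with scarce long-term data, the slow-timescale policy stays near the short-term solution and deviates only where the long-term signal is strong.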
Related papers
- Reducing Distraction in Long-Context Language Models by Focused Learning [6.803882766744194]
We propose a novel training method that enhances Large Language Models' ability to discern relevant information.
During fine-tuning with long contexts, we employ a retriever to extract the most relevant segments.
We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned.
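As a rough illustration of what such an alignment objective can look like (the paper's exact loss is not reproduced here), a generic InfoNCE-shaped auxiliary term over pooled model outputs might be:

```python
import torch
import torch.nn.functional as F

def alignment_loss(h_full, h_sub, temperature=0.1):
    """InfoNCE-shaped auxiliary loss -- a hedged sketch, not the paper's
    exact formulation.

    h_full: (B, d) pooled outputs computed from the full long context.
    h_sub:  (B, d) pooled outputs computed from the retrieved sub-context.
    Each full-context output should match its own sub-context output
    (diagonal positives) and repel the other examples in the batch."""
    z_full = F.normalize(h_full, dim=-1)
    z_sub = F.normalize(h_sub, dim=-1)
    logits = z_full @ z_sub.T / temperature        # (B, B) similarity
    targets = torch.arange(h_full.shape[0])        # positives on diagonal
    return F.cross_entropy(logits, targets)

h_full, h_sub = torch.randn(8, 64), torch.randn(8, 64)
aux = alignment_loss(h_full, h_sub)   # added to the usual LM loss with a weight
```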
arXiv Detail & Related papers (2024-11-08T19:27:42Z)
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Experiments show significant performance improvements for our method compared with several well-known baselines.
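A hedged control-flow sketch of such a two-level loop, with toy stand-ins for both agents rather than mccHRL's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.normal(size=(50, 4))                 # candidate item features

def high_level_act(perception):
    """Toy high-level agent: map the user-perception state to a goal vector."""
    return perception[:4]                        # stand-in for a learned policy

def low_level_act(goal, k=5):
    """Toy low-level agent: build the slate greedily under the current goal."""
    return np.argsort(items @ goal)[-k:]         # top-k items for this goal

perception = rng.normal(size=8)                  # evolving user-perception state
for session in range(3):                         # slow, session-level timescale
    goal = high_level_act(perception)
    slate = low_level_act(goal)                  # fast, item-level timescale
    reward = rng.random()                        # user feedback on the slate
    perception = 0.9 * perception + 0.1 * reward # perception drifts slowly
```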
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
- Strike the Balance: On-the-Fly Uncertainty based User Interactions for Long-Term Video Object Segmentation [23.417370317522106]
We introduce a variant of video object segmentation (VOS) that bridges interactive and semi-automatic approaches.
We aim to maximize the tracking duration of an object of interest, while requiring minimal user corrections to maintain tracking over an extended period.
We evaluate our approach using the recently introduced LVOS dataset, which offers numerous long-term videos.
arXiv Detail & Related papers (2024-07-31T21:42:42Z)
- A federated large language model for long-term time series forecasting [4.696083734269233]
We propose FedTime, a federated large language model (LLM) tailored for long-range time series prediction.
We employ K-means clustering to partition edge devices or clients into distinct clusters.
We also incorporate channel independence and patching to better preserve local semantic information.
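A minimal sketch of the clustering and patching steps, assuming clients are summarized by simple per-series statistics (the paper's client representation, features, and cluster count may differ):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy stand-in: summarize each edge client by statistics of its local series.
n_clients = 40
series = rng.normal(size=(n_clients, 512))       # one series per client
feats = np.stack([series.mean(1), series.std(1),
                  np.abs(np.diff(series, axis=1)).mean(1)], axis=1)

# Partition clients into clusters; clients in the same cluster are
# grouped together for training/aggregation.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(feats)

# Patching: split each univariate channel into fixed-length, overlapping
# patches that serve as input tokens (channel independence means each
# channel is patched and modeled separately).
patch_len, stride = 16, 8
patches = np.lib.stride_tricks.sliding_window_view(series[0], patch_len)[::stride]
```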
arXiv Detail & Related papers (2024-07-30T02:38:27Z)
- Effective Long-Context Scaling of Foundation Models [90.57254298730923]
We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens.
Our models achieve consistent improvements on most regular tasks and significant improvements on long-context tasks over Llama 2.
arXiv Detail & Related papers (2023-09-27T21:41:49Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences in a single RNN pass within a training batch, by solving a greedy knapsack problem on the fly.
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
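The packing step can be illustrated with a simple first-fit-decreasing heuristic; this is a generic stand-in, since the paper's exact on-the-fly knapsack rule is not reproduced here:

```python
def pack_sequences(lengths, capacity):
    """Greedily pack short sequences into rows of at most `capacity` steps
    (first-fit decreasing). Each returned bin is a list of sequence indices
    whose total length fits one RNN pass, reducing wasted padding."""
    order = sorted(range(len(lengths)), key=lambda i: -lengths[i])
    bins, room = [], []                 # parallel lists: contents, space left
    for i in order:
        for b, r in enumerate(room):
            if lengths[i] <= r:         # first existing bin it fits in
                bins[b].append(i)
                room[b] -= lengths[i]
                break
        else:                           # no bin fits: open a new one
            bins.append([i])
            room.append(capacity - lengths[i])
    return bins

# Example: user sequences of varying length packed into 16-step RNN passes.
print(pack_sequences([12, 3, 7, 5, 9, 4, 8], capacity=16))
# -> [[0, 5], [4, 2], [6, 3, 1]]
```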
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
- Dynamic Memory based Attention Network for Sequential Recommendation [79.5901228623551]
We propose a novel long sequential recommendation model called Dynamic Memory-based Attention Network (DMAN).
It segments the overall long behavior sequence into a series of sub-sequences, then trains the model while maintaining a set of memory blocks to preserve users' long-term interests.
Based on the dynamic memory, the user's short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation.
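A toy numpy sketch of the segment-then-memorize idea (not DMAN's actual memory update rule); the segment length, block count, and mean-pooling are illustrative assumptions:

```python
import numpy as np

def segment_and_memorize(seq_emb, seg_len=20, n_blocks=4):
    """Cut a long behavior sequence into sub-sequences and keep a FIFO of
    pooled summaries as "memory blocks".

    seq_emb: (T, d) item-embedding sequence for one user.
    Returns (short_term, long_term) representations."""
    segments = [seq_emb[i:i + seg_len] for i in range(0, len(seq_emb), seg_len)]
    memory = []
    for seg in segments[:-1]:               # older sub-sequences feed memory
        memory.append(seg.mean(axis=0))     # mean-pool as a cheap summary
        memory = memory[-n_blocks:]         # fixed number of blocks (FIFO)
    short_term = segments[-1].mean(axis=0)  # current sub-sequence
    long_term = (np.mean(memory, axis=0) if memory
                 else np.zeros_like(short_term))
    return short_term, long_term

emb = np.random.default_rng(0).normal(size=(130, 32))
s, l = segment_and_memorize(emb)
user_vec = 0.5 * s + 0.5 * l                # naive short/long combination
```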
arXiv Detail & Related papers (2021-02-18T11:08:54Z)
- Dynamic Embeddings for Interaction Prediction [2.5758502140236024]
In recommender systems (RSs), predicting the next item that a user interacts with is critical for user retention.
Recent studies have shown the effectiveness of modeling the mutual interactions between users and items using separate user and item embeddings.
We propose a novel method called DeePRed that addresses some of their limitations.
arXiv Detail & Related papers (2020-11-10T16:04:46Z)
- Learning Long-term Visual Dynamics with Region Proposal Interaction Networks [75.06423516419862]
We build object representations that can capture inter-object and object-environment interactions over a long range.
Thanks to the simple yet effective object representation, our approach outperforms prior methods by a significant margin.
arXiv Detail & Related papers (2020-08-05T17:48:00Z)
- Modeling Long-Term and Short-Term Interests with Parallel Attentions for Session-based Recommendation [17.092823992007794]
Session-based recommenders typically explore the users' evolving interests.
Recent advances in attention mechanisms have led to state-of-the-art methods for solving this task.
We propose a novel Parallel Attention Network model (PAN) for Session-based Recommendation.
arXiv Detail & Related papers (2020-06-27T11:47:51Z)
- Sequential Recommender via Time-aware Attentive Memory Network [67.26862011527986]
We propose a temporal gating methodology to improve the attention mechanism and recurrent units.
We also propose a Multi-hop Time-aware Attentive Memory network to integrate long-term and short-term preferences.
Our approach is scalable for candidate retrieval tasks and can be viewed as a non-linear generalization of latent factorization for dot-product based Top-K recommendation.
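A generic sketch of temporal gating in attention, assuming a simple multiplicative gate that decays with elapsed time (the paper's formulation may differ):

```python
import torch

def time_gated_attention(q, K, V, dt):
    """One attention read with a multiplicative time gate.

    q: (d,) query; K, V: (T, d) keys/values; dt: (T,) elapsed time since
    each interaction (e.g., in hours). Recent items get higher gates."""
    scores = K @ q / K.shape[-1] ** 0.5          # scaled dot-product scores
    gate = torch.sigmoid(-torch.log1p(dt))       # decays with elapsed time
    # Adding log(gate) before softmax multiplies the weights by the gate.
    weights = torch.softmax(scores + torch.log(gate + 1e-8), dim=0)
    return weights @ V

q = torch.randn(16)
K, V = torch.randn(30, 16), torch.randn(30, 16)
dt = torch.rand(30) * 100.0                      # hours since each event
out = time_gated_attention(q, K, V, dt)
```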
arXiv Detail & Related papers (2020-05-18T11:29:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.