Related papers: Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

URL: http://arxiv.org/abs/2310.07786v2
Date: Sat, 14 Oct 2023 20:10:12 GMT
Title: Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling
Authors: Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy
Abstract summary: Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. We introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism.
Score: 15.88678122212934
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they excessively explore due to a lack of prioritization for information of enduring value, or are designed in ways that do not scale in modern applications with high-dimensional user-specific features and large action set, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms the state-of-the-art baselines.

Related papers

Certified Neural Approximations of Nonlinear Dynamics [52.79163248326912]
In safety-critical contexts, the use of neural approximations requires formal bounds on their closeness to the underlying system.<n>We propose a novel, adaptive, and parallelizable verification method based on certified first-order models.
arXiv Detail & Related papers (2025-05-21T13:22:20Z)
Demystifying Online Clustering of Bandits: Enhanced Exploration Under Stochastic and Smoothed Adversarial Contexts [27.62165569135504]
A line of research, known as online clustering of bandits, extends contextual MAB by grouping similar users into clusters. Existing algorithms, which rely on the upper confidence bound (UCB) strategy, struggle to gather adequate statistical information to accurately identify unknown user clusters. We propose two novel algorithms, UniCLUB and PhaseUniCLUB, which incorporate enhanced exploration mechanisms to accelerate cluster identification.
arXiv Detail & Related papers (2025-01-01T16:38:29Z)
Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation. Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy. Results observe significant performance improvement by our method, compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z)
Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts. Our algorithm, poLinUCB, achieves tight regret under standard assumptions. Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
arXiv Detail & Related papers (2023-09-25T06:22:28Z)
Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data. The first strategy performs a local neighborhood sampling to reduce the dataset size in each without violating neighborhood relationships. A second strategy leverages a novel Re-Ranking technique, which has a lower time upper bound complexity and reduces the memory complexity from O(n2) to O(kn) with k n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context. This setting is motivated by a wide range of applications where the true context for decision-making is unobserved. We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z)
Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision. Existing literature addresses this challenge by employing local-based representation approaches. This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
An Empirical Evaluation of Federated Contextual Bandit Algorithms [27.275089644378376]
Federated learning can be done using implicit signals generated as users interact with applications of interest. We develop variants of prominent contextual bandit algorithms from the centralized seting for the federated setting. Our experiments reveal the surprising effectiveness of the simple and commonly used softmax in balancing the well-know exploration-exploitation tradeoff.
arXiv Detail & Related papers (2023-03-17T19:22:30Z)
Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model. A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations. We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
Top-K Ranking Deep Contextual Bandits for Information Selection Systems [0.0]
We propose a novel approach to top-K rankings under the contextual multi-armed bandit framework. We model the reward function with a neural network to allow non-linear approximation to learn the relationship between rewards and contexts.
arXiv Detail & Related papers (2022-01-28T15:10:44Z)
Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time. We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex but irrelevant information to the planning task.
arXiv Detail & Related papers (2021-06-14T04:31:15Z)
Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits [9.877980800275507]
We propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling.
arXiv Detail & Related papers (2020-07-09T12:46:51Z)
Seismic horizon detection with neural networks [62.997667081978825]
This paper is an open-sourced research of applying binary segmentation approach to the task of horizon detection on multiple real seismic cubes with a focus on inter-cube generalization of the predictive model. The main contribution of this paper is an open-sourced research of applying binary segmentation approach to the task of horizon detection on multiple real seismic cubes with a focus on inter-cube generalization of the predictive model.
arXiv Detail & Related papers (2020-01-10T11:30:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.