Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble
Sampling
- URL: http://arxiv.org/abs/2310.07786v2
- Date: Sat, 14 Oct 2023 20:10:12 GMT
- Title: Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble
Sampling
- Authors: Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy
- Abstract summary: Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.
We introduce a novel non-stationary contextual bandit algorithm that addresses these concerns.
It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism.
- Score: 15.88678122212934
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world applications of contextual bandits often exhibit non-stationarity
due to seasonality, serendipity, and evolving social trends. While a number of
non-stationary contextual bandit learning algorithms have been proposed in the
literature, they excessively explore due to a lack of prioritization for
information of enduring value, or are designed in ways that do not scale in
modern applications with high-dimensional user-specific features and large
action sets, or both. In this paper, we introduce a novel non-stationary
contextual bandit algorithm that addresses these concerns. It combines a
scalable, deep-neural-network-based architecture with a carefully designed
exploration mechanism that strategically prioritizes collecting information
with the most lasting value in a non-stationary environment. Through empirical
evaluations on two real-world recommendation datasets, which exhibit pronounced
non-stationarity, we demonstrate that our approach significantly outperforms
the state-of-the-art baselines.
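As a rough illustration of the general idea of ensemble-based exploration under non-stationarity, the sketch below runs ensemble sampling with exponential forgetting on a toy bandit without contexts. It is not the paper's algorithm: there is no neural network, no context, and no prioritization of information with lasting value. The class name `DiscountedEnsembleSampler`, the `simulate` helper, and all parameters (`n_members`, `discount`, `noise_std`) are hypothetical choices made for this example.

```python
import random


class DiscountedEnsembleSampler:
    """Toy ensemble sampling with exponential forgetting (illustrative assumption)."""

    def __init__(self, n_actions, n_members=10, discount=0.95, noise_std=0.5, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
        self.discount = discount
        self.noise_std = noise_std
        # Each member keeps a discounted reward sum and a discounted weight per action.
        # Random prior means make the members disagree before any data arrives.
        self.sums = [[self.rng.gauss(0.0, 1.0) for _ in range(n_actions)]
                     for _ in range(n_members)]
        self.counts = [[1.0] * n_actions for _ in range(n_members)]

    def select_action(self):
        # Sample one ensemble member uniformly and act greedily on its estimates.
        m = self.rng.randrange(len(self.sums))
        estimates = [s / c for s, c in zip(self.sums[m], self.counts[m])]
        return max(range(self.n_actions), key=estimates.__getitem__)

    def update(self, action, reward):
        for sums, counts in zip(self.sums, self.counts):
            for a in range(self.n_actions):
                # Down-weight old evidence; stop once the weight is negligible
                # so repeated discounting cannot underflow to zero.
                if counts[a] > 1e-6:
                    sums[a] *= self.discount
                    counts[a] *= self.discount
            # Each member sees an independently perturbed copy of the reward.
            sums[action] += reward + self.rng.gauss(0.0, self.noise_std)
            counts[action] += 1.0


def simulate(steps=2000, switch_at=1000, seed=0):
    """Two-armed bandit whose best arm flips at `switch_at` (hypothetical setup).

    Returns the fraction of the last 200 steps on which arm 1 was chosen.
    """
    env_rng = random.Random(seed + 1)
    agent = DiscountedEnsembleSampler(n_actions=2, seed=seed)
    late_choices = []
    for t in range(steps):
        means = [0.9, 0.1] if t < switch_at else [0.1, 0.9]
        action = agent.select_action()
        agent.update(action, means[action] + env_rng.gauss(0.0, 0.1))
        if t >= steps - 200:
            late_choices.append(action)
    return sum(late_choices) / len(late_choices)
```

Calling `simulate()` gives a quick way to check whether the agent tracks the arm that becomes optimal after the switch. Disagreement among members drives exploration, and the discount lets recent rewards outweigh stale ones.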
Related papers
- Hierarchical Reinforcement Learning for Temporal Abstraction of Listwise Recommendation [51.06031200728449]
We propose a novel framework called mccHRL to provide different levels of temporal abstraction on listwise recommendation.
Within the hierarchical framework, the high-level agent studies the evolution of user perception, while the low-level agent produces the item selection policy.
Results show a significant performance improvement from our method compared with several well-known baselines.
arXiv Detail & Related papers (2024-09-11T17:01:06Z) - Follow-ups Also Matter: Improving Contextual Bandits via Post-serving
Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
arXiv Detail & Related papers (2023-09-25T06:22:28Z) - Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower upper bound on time complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z) - Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where at each time, the agent only has access to a noisy version of the context.
This setting is motivated by a wide range of applications where the true context for decision-making is unobserved.
We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z) - Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual
Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z) - An Empirical Evaluation of Federated Contextual Bandit Algorithms [27.275089644378376]
Federated learning can be done using implicit signals generated as users interact with applications of interest.
We develop variants of prominent contextual bandit algorithms from the centralized setting for the federated setting.
Our experiments reveal the surprising effectiveness of the simple and commonly used softmax in balancing the well-known exploration-exploitation tradeoff.
arXiv Detail & Related papers (2023-03-17T19:22:30Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - Top-K Ranking Deep Contextual Bandits for Information Selection Systems [0.0]
We propose a novel approach to top-K rankings under the contextual multi-armed bandit framework.
We model the reward function with a neural network to allow non-linear approximation to learn the relationship between rewards and contexts.
arXiv Detail & Related papers (2022-01-28T15:10:44Z) - Temporal Predictive Coding For Model-Based Planning In Latent Space [80.99554006174093]
We present an information-theoretic approach that employs temporal predictive coding to encode elements in the environment that can be predicted across time.
We evaluate our model on a challenging modification of standard DMControl tasks where the background is replaced with natural videos that contain complex information irrelevant to the planning task.
arXiv Detail & Related papers (2021-06-14T04:31:15Z) - Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual
Bandits [9.877980800275507]
We propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment.
This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling.
arXiv Detail & Related papers (2020-07-09T12:46:51Z) - Seismic horizon detection with neural networks [62.997667081978825]
The main contribution of this paper is open-source research applying a binary segmentation approach to horizon detection on multiple real seismic cubes, with a focus on inter-cube generalization of the predictive model.
arXiv Detail & Related papers (2020-01-10T11:30:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.