Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits
- URL: http://arxiv.org/abs/2007.04750v2
- Date: Fri, 3 Nov 2023 11:12:12 GMT
- Title: Recurrent Neural-Linear Posterior Sampling for Nonstationary Contextual Bandits
- Authors: Aditya Ramesh, Paulo Rauber, Michelangelo Conserva, Jürgen Schmidhuber
- Abstract summary: We propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment.
This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling.
- Score: 9.877980800275507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An agent in a nonstationary contextual bandit problem should balance between
exploration and the exploitation of (periodic or structured) patterns present
in its previous experiences. Handcrafting an appropriate historical context is
an attractive alternative to transform a nonstationary problem into a
stationary problem that can be solved efficiently. However, even a carefully
designed historical context may introduce spurious relationships or lack a
convenient representation of crucial information. In order to address these
issues, we propose an approach that learns to represent the relevant context
for a decision based solely on the raw history of interactions between the
agent and the environment. This approach relies on a combination of features
extracted by recurrent neural networks with a contextual linear bandit
algorithm based on posterior sampling. Our experiments on a diverse selection
of contextual and noncontextual nonstationary problems show that our recurrent
approach consistently outperforms its feedforward counterpart, which requires
handcrafted historical contexts, while being more widely applicable than
conventional nonstationary bandit algorithms. Although it is very difficult to
provide theoretical performance guarantees for our new approach, we also prove
a novel regret bound for linear posterior sampling with measurement error that
may serve as a foundation for future theoretical work.
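To make the recipe concrete, the following is a minimal sketch of the recurrent neural-linear idea: a GRU encodes the raw history of (context, action, reward) interactions into a feature vector, and per-arm Bayesian linear regression with Thompson sampling operates on those features. The class name, hyperparameters, and the use of an untrained encoder are illustrative assumptions; the paper additionally trains the recurrent network to predict rewards, which is omitted here for brevity.

    import numpy as np
    import torch
    import torch.nn as nn

    class RecurrentNeuralLinearTS:
        """Sketch: GRU features over raw interaction history feed per-arm
        Bayesian linear regression heads used for Thompson sampling."""

        def __init__(self, ctx_dim, n_arms, feat_dim=32, noise_var=1.0, prior_var=1.0):
            self.n_arms = n_arms
            # Each history step is (context, one-hot action, reward).
            self.gru = nn.GRU(ctx_dim + n_arms + 1, feat_dim, batch_first=True)
            self.noise_var = noise_var
            self.precision = [np.eye(feat_dim) / prior_var for _ in range(n_arms)]
            self.b = [np.zeros(feat_dim) for _ in range(n_arms)]
            self.history = []  # raw (context, action, reward) triples

        def _features(self, context):
            # Encode the full raw history plus the current context (no action yet).
            steps = self.history + [(context, None, 0.0)]
            rows = []
            for ctx, a, r in steps:
                onehot = np.zeros(self.n_arms)
                if a is not None:
                    onehot[a] = 1.0
                rows.append(np.concatenate([ctx, onehot, [r]]))
            x = torch.tensor(np.array(rows), dtype=torch.float32).unsqueeze(0)
            with torch.no_grad():
                _, h = self.gru(x)  # final hidden state summarizes the history
            return h.squeeze(0).squeeze(0).numpy()

        def act(self, context):
            phi = self._features(context)
            scores = []
            for a in range(self.n_arms):
                cov = np.linalg.inv(self.precision[a])
                mean = cov @ self.b[a] / self.noise_var
                theta = np.random.multivariate_normal(mean, cov)  # posterior sample
                scores.append(phi @ theta)
            return int(np.argmax(scores))

        def update(self, context, action, reward):
            phi = self._features(context)
            self.precision[action] += np.outer(phi, phi) / self.noise_var
            self.b[action] += reward * phi
            self.history.append((context, action, reward))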
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
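As a concrete illustration of the first ingredient named in the entry above, the sketch below fits a reward network to pairwise preference feedback with a Bradley-Terry likelihood. The architecture, names, and training step are assumptions for illustration, not the paper's exact model.

    import torch
    import torch.nn as nn

    class RewardNet(nn.Module):
        """Hypothetical reward network scoring a single arm's features."""
        def __init__(self, dim, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, x):
            return self.net(x).squeeze(-1)

    def preference_loss(model, x_winner, x_loser):
        # Bradley-Terry model: P(winner preferred) = sigmoid(r(winner) - r(loser)).
        logits = model(x_winner) - model(x_loser)
        return nn.functional.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))

    # One illustrative gradient step on a batch of placeholder preference pairs.
    model = RewardNet(dim=8)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x_win, x_lose = torch.randn(32, 8), torch.randn(32, 8)
    loss = preference_loss(model, x_win, x_lose)
    opt.zero_grad()
    loss.backward()
    opt.step()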
- Neural Contextual Bandits for Personalized Recommendation [49.85090929163639]
This tutorial investigates contextual bandits as a powerful framework for personalized recommendations.
We focus on the exploration perspective of contextual bandits to alleviate the "Matthew Effect" in recommender systems.
In addition to conventional linear contextual bandits, we also cover neural contextual bandits.
arXiv Detail & Related papers (2023-12-21T17:03:26Z)
- Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling [15.88678122212934]
Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends.
We introduce a novel non-stationary contextual bandit algorithm that addresses these concerns.
It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism.
arXiv Detail & Related papers (2023-10-11T18:15:55Z)
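A minimal sketch of the ensemble-sampling style of exploration described in the entry above: maintain several independently initialized reward networks and act greedily with respect to a randomly drawn member each round. Sizes, names, and the omitted per-member training are illustrative assumptions rather than the paper's algorithm.

    import numpy as np
    import torch
    import torch.nn as nn

    K, CTX_DIM, N_ARMS = 8, 10, 5  # illustrative sizes
    # K independently initialized reward networks; each member would be trained
    # on its own perturbed copy of the data stream (training loop omitted).
    ensemble = [nn.Sequential(nn.Linear(CTX_DIM, 32), nn.ReLU(), nn.Linear(32, N_ARMS))
                for _ in range(K)]

    def act(context: np.ndarray) -> int:
        member = ensemble[np.random.randint(K)]  # random member ~ posterior sample
        with torch.no_grad():
            q = member(torch.tensor(context, dtype=torch.float32))
        return int(torch.argmax(q))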
- Online learning in bandits with predicted context [8.257280652461159]
We consider the contextual bandit problem where, at each time, the agent only has access to a noisy version of the context.
This setting is motivated by a wide range of applications where the true context for decision-making is unobserved.
We propose the first online algorithm in this setting with sublinear regret guarantees under mild conditions.
arXiv Detail & Related papers (2023-07-26T02:33:54Z)
- Federated Learning for Heterogeneous Bandits with Unobserved Contexts [0.0]
We study the problem of federated multi-armed contextual bandits with unknown contexts.
We propose an elimination-based algorithm and prove the regret bound for linearly parametrized reward functions.
arXiv Detail & Related papers (2023-03-29T22:06:24Z)
- A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations [108.89353070722497]
We study the offline contextual bandit problem, where we aim to acquire an optimal policy using observational data.
We present a new algorithm called Causal-Adjusted Pessimistic (CAP) policy learning, which forms the reward function as the solution of an integral equation system.
arXiv Detail & Related papers (2023-03-20T15:17:31Z)
- From Contextual Data to Newsvendor Decisions: On the Actual Performance of Data-Driven Algorithms [2.9603743540540357]
We study how the relevance and quantity of past data affects the performance of a data-driven policy.
We consider a setting in which past demands observed under "close-by" contexts come from close-by distributions.
arXiv Detail & Related papers (2023-02-16T17:03:39Z)
- Hypothesis Transfer in Bandits by Weighted Models [8.759884299087835]
We consider the problem of contextual multi-armed bandits in the setting of hypothesis transfer learning.
We present a re-weighting scheme for which we show a reduction in regret compared to classic Linear UCB when transfer is desired.
We further extend this method to an arbitrary amount of source models, where the algorithm decides which model is preferred at each time step.
arXiv Detail & Related papers (2022-11-14T14:13:02Z)
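A hedged sketch of one way the re-weighting idea above can look: source models are scored by their prediction error on observed rewards and re-weighted multiplicatively. The function name and the specific exponential-weights rule are assumptions for illustration, not the paper's exact scheme.

    import numpy as np

    def update_weights(weights, model_preds, reward, eta=0.5):
        # Multiplicative-weights update from each source model's squared error
        # on the observed reward; better-predicting models gain weight.
        losses = (model_preds - reward) ** 2
        weights = weights * np.exp(-eta * losses)
        return weights / weights.sum()

    # Usage: three source models predicted 0.2, 0.8, 0.5 and we observed 0.75.
    w = np.ones(3) / 3
    w = update_weights(w, np.array([0.2, 0.8, 0.5]), 0.75)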
- Post-Contextual-Bandit Inference [57.88785630755165]
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-commerce, healthcare, and policymaking.
They can both improve outcomes for study participants and increase the chance of identifying good or even best policies.
To support credible inference on novel interventions at the end of the study, we still want to construct valid confidence intervals on average treatment effects, subgroup effects, or value of new policies.
arXiv Detail & Related papers (2021-06-01T12:01:51Z)
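For orientation on the entry above, the sketch below computes the baseline inverse-propensity-weighted (IPW) estimate of a policy's value with a normal-approximation confidence interval. The paper's contribution is estimators that remain valid under adaptive (bandit) data collection, which this naive baseline does not address.

    import numpy as np

    def ipw_value(rewards, actions, propensities, target_actions):
        # Importance weight: 1/propensity when the logged action matches the
        # action the evaluated policy would take, 0 otherwise.
        w = (actions == target_actions).astype(float) / propensities
        terms = w * rewards
        est = terms.mean()
        se = terms.std(ddof=1) / np.sqrt(len(terms))
        return est, (est - 1.96 * se, est + 1.96 * se)  # point estimate, 95% CI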
- Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms whose regret is nearly identical to that of their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z)
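A simplified stand-in for the batched setting these parallel algorithms target: draw one posterior sample of the reward parameter per parallel worker from a shared ridge-regression posterior, act on all samples, and only then update. Function and argument names are assumptions, not the paper's API.

    import numpy as np

    def batch_select(A, b, arm_features, n_workers, noise_var=1.0):
        # Shared ridge-regression posterior: theta ~ N(cov @ b / noise_var, cov).
        cov = np.linalg.inv(A)
        mean = cov @ b / noise_var
        actions = []
        for _ in range(n_workers):
            theta = np.random.multivariate_normal(mean, cov)  # one sample per worker
            actions.append(int(np.argmax(arm_features @ theta)))  # greedy arm for that sample
        return actions  # all n_workers actions are taken before any posterior update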
- Differentiable Causal Discovery from Interventional Data [141.41931444927184]
We propose a theoretically grounded method based on neural networks that can leverage interventional data.
We show that our approach compares favorably to the state of the art in a variety of settings.
arXiv Detail & Related papers (2020-07-03T15:19:17Z)