Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle
- URL: http://arxiv.org/abs/2301.12571v2
- Date: Thu, 29 Jun 2023 22:13:51 GMT
- Title: Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle
- Authors: Enoch Hyunwook Kang, P. R. Kumar
- Abstract summary: In online exploration systems where users with fixed preferences repeatedly arrive, bounded regret can be achieved.
This result may be of interest for recommender systems, where the popularity of their items is often short-lived.
In this work, we conduct a theoretical study to address all these issues while still achieving bounded regret.
- Score: 6.935657546323529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online exploration systems where users with fixed preferences repeatedly
arrive, it has recently been shown that O(1), i.e., bounded regret, can be
achieved when the system is modeled as a linear contextual bandit. This result
may be of interest for recommender systems, where the popularity of their items
is often short-lived, as the exploration itself may be completed quickly before
potential long-run non-stationarities come into play. However, in practice,
exact knowledge of the linear model is difficult to justify. Furthermore,
potential existence of unobservable covariates, uneven user arrival rates,
interpretation of the necessary rank condition, and users opting out of private
data tracking all need to be addressed for practical recommender system
applications. In this work, we conduct a theoretical study to address all these
issues while still achieving bounded regret. Aside from proof techniques, the
key differentiating assumption we make here is the presence of effective
Synthetic Control Methods (SCM), which are shown to be a practical relaxation
of the exact linear model knowledge assumption. We verify our theoretical
bounded regret result using a minimal simulation experiment.
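To make the bounded-regret phenomenon concrete, here is a minimal illustrative sketch, not the paper's actual algorithm: a toy linear bandit with a fixed feature vector per arm and an unknown parameter. Rewards are made noiseless purely so the effect is visible; coping with noise and imperfect model knowledge is what the paper's SCM-based approach addresses. All names and values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a linear bandit with one fixed feature vector
# per arm and an unknown parameter theta*. Noiseless for illustration only.
d, n_arms, T = 3, 5, 2000
theta_star = rng.normal(size=d)
features = rng.normal(size=(n_arms, d))
mean_rewards = features @ theta_star
opt = mean_rewards.max()

X, y = [], []
est = np.zeros(d)
regret = 0.0

for t in range(T):
    # Pull each arm once, then act greedily on the least-squares estimate.
    a = t if t < n_arms else int(np.argmax(features @ est))
    X.append(features[a])
    y.append(mean_rewards[a])
    est = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)[0]
    regret += opt - mean_rewards[a]

# Regret is incurred only during the n_arms exploration pulls, so the
# cumulative regret stays bounded (O(1)) no matter how large T grows.
print(f"cumulative regret after {T} rounds: {regret:.6f}")
```

In this idealized setting the least-squares estimate recovers the true parameter after the short exploration phase, so the greedy policy stops accumulating regret; the paper's contribution is obtaining a comparable guarantee without exact model knowledge, under noise, using synthetic controls.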
Related papers
- Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States [52.56827348431552]
gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data.
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
arXiv Detail & Related papers (2024-02-12T18:41:31Z)
- Can Learning Deteriorate Control? Analyzing Computational Delays in Gaussian Process-Based Event-Triggered Online Learning [7.697964930378468]
We propose a novel event trigger for GP-based online learning with computational delays.
We show that it offers advantages over offline-trained GP models for sufficiently small computation times.
arXiv Detail & Related papers (2023-05-14T14:37:33Z)
- Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to that of state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method always outperforms Behavioral Cloning in the low-data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z)
- Context Uncertainty in Contextual Bandits with Applications to Recommender Systems [16.597836265345634]
We propose a new type of recurrent neural networks, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space.
Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there exists uncertainty in the learned representations.
Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.
arXiv Detail & Related papers (2022-02-01T23:23:50Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
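The "advantage over the average case" described above can be sketched as follows; this is a hypothetical illustration with made-up Q-values, not the paper's exact estimator.

```python
import numpy as np

# Hypothetical sketch: given a Q-value for the one positive (observed) item
# and Q-values for sampled negative items, the advantage is the positive
# item's Q-value minus the mean Q-value over all sampled actions.
q_positive = 2.0
q_negatives = np.array([0.5, -0.3, 1.1, 0.2])  # sampled negative actions

q_all = np.append(q_negatives, q_positive)
advantage = q_positive - q_all.mean()
print(round(advantage, 3))
```

A positive advantage indicates the observed action scores above the average sampled action, which is the signal the actor-critic update weights by.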
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- On the Theory of Reinforcement Learning with Once-per-Episode Feedback [120.5537226120512]
We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
arXiv Detail & Related papers (2021-05-29T19:48:51Z)
- Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context [30.894950420437926]
We show how omitting temporal context when evaluating recommender system performance leads to false confidence.
We propose a training procedure to further embed the temporal context in existing models.
Results show that including our temporal objective can improve recall@20 by up to 20%.
arXiv Detail & Related papers (2020-09-19T19:36:43Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
- Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning [41.24484153212002]
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system.
It does not require the learned inverse model to be invertible at all instances of time.
A simulated example of a double pendulum demonstrates the utility of the proposed theory.
arXiv Detail & Related papers (2020-04-06T15:50:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.