Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle
- URL: http://arxiv.org/abs/2301.12571v2
- Date: Thu, 29 Jun 2023 22:13:51 GMT
- Title: Bounded (O(1)) Regret Recommendation Learning via Synthetic Controls Oracle
- Authors: Enoch Hyunwook Kang, P. R. Kumar
- Abstract summary: In online exploration systems where users with fixed preferences repeatedly arrive, bounded regret can be achieved.
This result may be of interest for recommender systems, where the popularity of their items is often short-lived.
In this work, we conduct a theoretical study to address all these issues while still achieving bounded regret.
- Score: 6.935657546323529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In online exploration systems where users with fixed preferences repeatedly
arrive, it has recently been shown that O(1), i.e., bounded regret, can be
achieved when the system is modeled as a linear contextual bandit. This result
may be of interest for recommender systems, where the popularity of their items
is often short-lived, as the exploration itself may be completed quickly before
potential long-run non-stationarities come into play. However, in practice,
exact knowledge of the linear model is difficult to justify. Furthermore,
potential existence of unobservable covariates, uneven user arrival rates,
interpretation of the necessary rank condition, and users opting out of private
data tracking all need to be addressed for practical recommender system
applications. In this work, we conduct a theoretical study to address all these
issues while still achieving bounded regret. Aside from proof techniques, the
key differentiating assumption we make here is the presence of effective
Synthetic Control Methods (SCM), which are shown to be a practical relaxation
of the exact linear model knowledge assumption. We verify our theoretical
bounded regret result using a minimal simulation experiment.
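To make the bounded-regret phenomenon concrete, here is a minimal illustrative sketch, not the paper's actual algorithm: a toy linear bandit with a fixed feature vector per arm and an unknown parameter. Rewards are made noiseless purely so the effect is visible; coping with noise and imperfect model knowledge is what the paper's SCM-based approach addresses. All names and values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a linear bandit with one fixed feature vector
# per arm and an unknown parameter theta*. Noiseless for illustration only.
d, n_arms, T = 3, 5, 2000
theta_star = rng.normal(size=d)
features = rng.normal(size=(n_arms, d))
mean_rewards = features @ theta_star
opt = mean_rewards.max()

X, y = [], []
est = np.zeros(d)
regret = 0.0

for t in range(T):
    # Pull each arm once, then act greedily on the least-squares estimate.
    a = t if t < n_arms else int(np.argmax(features @ est))
    X.append(features[a])
    y.append(mean_rewards[a])
    est = np.linalg.lstsq(np.array(X), np.array(y), rcond=None)[0]
    regret += opt - mean_rewards[a]

# Regret is incurred only during the n_arms exploration pulls, so the
# cumulative regret stays bounded (O(1)) no matter how large T grows.
print(f"cumulative regret after {T} rounds: {regret:.6f}")
```

In this idealized setting the least-squares estimate recovers the true parameter after the short exploration phase, so the greedy policy stops accumulating regret; the paper's contribution is obtaining a comparable guarantee without exact model knowledge, under noise, using synthetic controls.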
Related papers
- Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States [52.56827348431552]
gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data.
This paper theoretically studies the implicit bias of policy gradient in terms of extrapolation to unseen initial states.
arXiv Detail & Related papers (2024-02-12T18:41:31Z)
- Can Learning Deteriorate Control? Analyzing Computational Delays in Gaussian Process-Based Event-Triggered Online Learning [7.697964930378468]
We propose a novel event trigger for GP-based online learning with computational delays.
We show that it offers advantages over offline-trained GP models for sufficiently small computation times.
arXiv Detail & Related papers (2023-05-14T14:37:33Z)
- Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that models enhanced with our method can achieve performance exceeding or very close to that of state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Model-based Offline Imitation Learning with Non-expert Data [7.615595533111191]
We propose a scalable model-based offline imitation learning algorithmic framework that leverages datasets collected by both suboptimal and optimal policies.
We show that the proposed method always outperforms Behavioral Cloning in the low-data regime on simulated continuous control domains.
arXiv Detail & Related papers (2022-06-11T13:08:08Z)
- Context Uncertainty in Contextual Bandits with Applications to Recommender Systems [16.597836265345634]
We propose a new type of recurrent neural networks, dubbed recurrent exploration networks (REN), to jointly perform representation learning and effective exploration in the latent space.
Our theoretical analysis shows that REN can preserve the rate-optimal sublinear regret even when there exists uncertainty in the learned representations.
Our empirical study demonstrates that REN can achieve satisfactory long-term rewards on both synthetic and real-world recommendation datasets, outperforming state-of-the-art models.
arXiv Detail & Related papers (2022-02-01T23:23:50Z)
- Supervised Advantage Actor-Critic for Recommender Systems [76.7066594130961]
We propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning.
Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case.
We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets.
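The "advantage over the average case" described above can be sketched as follows; this is a hypothetical illustration with made-up Q-values, not the paper's exact estimator.

```python
import numpy as np

# Hypothetical sketch: given a Q-value for the one positive (observed) item
# and Q-values for sampled negative items, the advantage is the positive
# item's Q-value minus the mean Q-value over all sampled actions.
q_positive = 2.0
q_negatives = np.array([0.5, -0.3, 1.1, 0.2])  # sampled negative actions

q_all = np.append(q_negatives, q_positive)
advantage = q_positive - q_all.mean()
print(round(advantage, 3))
```

A positive advantage indicates the observed action scores above the average sampled action, which is the signal the actor-critic update weights by.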
arXiv Detail & Related papers (2021-11-05T12:51:15Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- On the Theory of Reinforcement Learning with Once-per-Episode Feedback [120.5537226120512]
We introduce a theory of reinforcement learning in which the learner receives feedback only once at the end of an episode.
This is arguably more representative of real-world applications than the traditional requirement that the learner receive feedback at every time step.
arXiv Detail & Related papers (2021-05-29T19:48:51Z)
- Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context [30.894950420437926]
We show how omitting temporal context when evaluating recommender system performance leads to false confidence.
We propose a training procedure to further embed the temporal context in existing models.
Results show that including our temporal objective can improve recall@20 by up to 20%.
arXiv Detail & Related papers (2020-09-19T19:36:43Z)
- Learning the Truth From Only One Side of the Story [58.65439277460011]
We focus on generalized linear models and show that without adjusting for this sampling bias, the model may converge suboptimally or even fail to converge to the optimal solution.
We propose an adaptive approach that comes with theoretical guarantees and show that it outperforms several existing methods empirically.
arXiv Detail & Related papers (2020-06-08T18:20:28Z)
- Technical Report: Adaptive Control for Linearizable Systems Using On-Policy Reinforcement Learning [41.24484153212002]
This paper proposes a framework for adaptively learning a feedback linearization-based tracking controller for an unknown system.
It does not require the learned inverse model to be invertible at all instances of time.
A simulated example of a double pendulum demonstrates the utility of the proposed theory.
arXiv Detail & Related papers (2020-04-06T15:50:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.