Related papers: HyperBandit: Contextual Bandit with Hypernewtork for Time-Varying User Preferences in Streaming Recommendation

URL: http://arxiv.org/abs/2308.08497v1
Date: Mon, 14 Aug 2023 14:04:57 GMT
Title: HyperBandit: Contextual Bandit with Hypernewtork for Time-Varying User Preferences in Streaming Recommendation
Authors: Chenglei Shen, Xiao Zhang, Wei Wei, Jun Xu
Abstract summary: Existing streaming recommender models only consider time as a timestamp. We propose a contextual bandit approach using hypernetwork, called HyperBandit. We show that the proposed HyperBandit consistently outperforms the state-of-the-art baselines in terms of accumulated rewards.
Score: 11.908362247624131
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In real-world streaming recommender systems, user preferences often dynamically change over time (e.g., a user may have different preferences during weekdays and weekends). Existing bandit-based streaming recommendation models only consider time as a timestamp, without explicitly modeling the relationship between time variables and time-varying user preferences. This leads to recommendation models that cannot quickly adapt to dynamic scenarios. To address this issue, we propose a contextual bandit approach using hypernetwork, called HyperBandit, which takes time features as input and dynamically adjusts the recommendation model for time-varying user preferences. Specifically, HyperBandit maintains a neural network capable of generating the parameters for estimating time-varying rewards, taking into account the correlation between time features and user preferences. Using the estimated time-varying rewards, a bandit policy is employed to make online recommendations by learning the latent item contexts. To meet the real-time requirements in streaming recommendation scenarios, we have verified the existence of a low-rank structure in the parameter matrix and utilize low-rank factorization for efficient training. Theoretically, we demonstrate a sublinear regret upper bound against the best policy. Extensive experiments on real-world datasets show that the proposed HyperBandit consistently outperforms the state-of-the-art baselines in terms of accumulated rewards.
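
The mechanism described in the abstract can be illustrated with a small sketch: a hypernetwork takes time features as input and emits low-rank factors of a parameter matrix, which is then used to estimate time-varying rewards inside a bandit policy. The code below is not the authors' implementation. The dimensions, the cyclic time encoding, the single hidden layer, the bilinear reward form, and the LinUCB-style exploration bonus are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: time features, user features, item contexts, rank, hidden units.
D_TIME, D_USER, D_ITEM, RANK, HIDDEN = 4, 6, 5, 2, 8

# Hypothetical hypernetwork weights: one hidden layer and two output heads.
W_h = rng.normal(scale=0.1, size=(HIDDEN, D_TIME))
W_u = rng.normal(scale=0.1, size=(D_USER * RANK, HIDDEN))
W_v = rng.normal(scale=0.1, size=(D_ITEM * RANK, HIDDEN))


def time_features(hour, weekday):
    """Encode hour-of-day and day-of-week cyclically (an assumed encoding)."""
    return np.array([
        np.sin(2 * np.pi * hour / 24), np.cos(2 * np.pi * hour / 24),
        np.sin(2 * np.pi * weekday / 7), np.cos(2 * np.pi * weekday / 7),
    ])


def generate_low_rank_params(s):
    """Map time features s to factors U (D_USER x RANK) and V (D_ITEM x RANK)."""
    h = np.tanh(W_h @ s)
    U = (W_u @ h).reshape(D_USER, RANK)
    V = (W_v @ h).reshape(D_ITEM, RANK)
    return U, V


def estimate_rewards(user, items, s):
    """Score every item as user^T (U V^T) item without forming U V^T."""
    U, V = generate_low_rank_params(s)
    return (items @ V) @ (U.T @ user)


def ucb_select(user, items, s, A_inv, alpha=0.5):
    """Pick the item with the largest estimate plus a LinUCB-style bonus."""
    est = estimate_rewards(user, items, s)
    bonus = alpha * np.sqrt(np.einsum("ij,jk,ik->i", items, A_inv, items))
    return int(np.argmax(est + bonus))


user = rng.normal(size=D_USER)
items = rng.normal(size=(10, D_ITEM))
A_inv = np.eye(D_ITEM)  # inverse design matrix before any feedback is observed

weekday_pick = ucb_select(user, items, time_features(9, 1), A_inv)
weekend_pick = ucb_select(user, items, time_features(21, 6), A_inv)
```

Because the parameter matrix is only ever used through its two factors, each reward estimate costs on the order of `RANK * (D_USER + D_ITEM)` multiplications per item instead of `D_USER * D_ITEM`, which is the kind of saving that motivates low-rank factorization in a real-time setting.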

Related papers

Test-Time Alignment for Tracking User Interest Shifts in Sequential Recommendation [47.827361176767944]
Sequential recommendation is essential in modern recommender systems, aiming to predict the next item a user may interact with. Real-world scenarios are often dynamic and subject to shifts in user interests. Recently, Test-Time Training (TTT) has emerged as a promising paradigm, enabling pre-trained models to dynamically adapt to test data. We propose T$^2$ARec, a novel model leveraging a state space model for TTT by introducing two Test-Time Alignment modules tailored for sequential recommendation.
arXiv Detail & Related papers (2025-04-02T08:42:30Z)
Sequential Recommendation on Temporal Proximities with Contrastive Learning and Self-Attention [3.7182810519704095]
Sequential recommender systems identify user preferences from their past interactions to predict subsequent items optimally. Recent models often neglect similarities in users' actions that occur implicitly among users during analogous timeframes. We propose a sequential recommendation model called TemProxRec, which includes contrastive learning and self-attention methods to consider temporal proximities.
arXiv Detail & Related papers (2024-02-15T08:33:16Z)
Attention Mixtures for Time-Aware Sequential Recommendation [10.017195276758454]
Transformers have emerged as powerful methods for sequential recommendation. We introduce MOJITO, an improved Transformer sequential recommender system. We demonstrate the relevance of our approach by empirically outperforming existing Transformers for sequential recommendation on several real-world datasets.
arXiv Detail & Related papers (2023-04-17T11:11:19Z)
Online Continuous Hyperparameter Optimization for Generalized Linear Contextual Bandits [55.03293214439741]
In contextual bandits, an agent sequentially makes actions from a time-dependent action set based on past experience. We propose the first online continuous hyperparameter tuning framework for contextual bandits. We show that it could achieve a sublinear regret in theory and performs consistently better than all existing methods on both synthetic and real datasets.
arXiv Detail & Related papers (2023-02-18T23:31:20Z)
Time-aware Hyperbolic Graph Attention Network for Session-based Recommendation [58.748215444851226]
Session-based Recommendation (SBR) aims to predict the next items a user will be interested in based on their previous browsing sessions. We propose Time-aware Hyperbolic Graph Attention Network (TA-HGAT) to build a session-based recommendation model considering temporal information.
arXiv Detail & Related papers (2023-01-10T04:16:09Z)
Latent User Intent Modeling for Sequential Recommenders [92.66888409973495]
Sequential recommender models learn to predict the next items a user is likely to interact with based on his/her interaction history on the platform. Most sequential recommenders however lack a higher-level understanding of user intents, which often drive user behaviors online. Intent modeling is thus critical for understanding users and optimizing long-term user experience.
arXiv Detail & Related papers (2022-11-17T19:00:24Z)
Modeling Dynamic User Preference via Dictionary Learning for Sequential Recommendation [133.8758914874593]
Capturing the dynamics in user preference is crucial to better predict users' future behaviors because user preferences often drift over time. Many existing recommendation algorithms -- including both shallow and deep ones -- often model such dynamics independently. This paper considers the problem of embedding a user's sequential behavior into the latent space of user preferences.
arXiv Detail & Related papers (2022-04-02T03:23:46Z)
Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms [74.55200180156906]
The contextual bandit problem models the trade-off between exploration and exploitation. We show our Syndicated Bandits framework can achieve the optimal regret upper bounds.
arXiv Detail & Related papers (2021-06-05T22:30:21Z)
Learning Heterogeneous Temporal Patterns of User Preference for Timely Recommendation [15.930016839929047]
We propose a novel recommender system for timely recommendations, called TimelyRec. In TimelyRec, a cascade of two encoders captures the temporal patterns of user preference using a proposed attention module for each encoder. Our experiments on a scenario for item recommendation and the proposed scenario for item-timing recommendation on real-world datasets demonstrate the superiority of TimelyRec.
arXiv Detail & Related papers (2021-04-29T08:37:30Z)
Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users. The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online. We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.