Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
- URL: http://arxiv.org/abs/2003.00359v1
- Date: Sat, 29 Feb 2020 22:59:15 GMT
- Title: Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
- Authors: Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li
- Abstract summary: A contextual bandit problem is studied in a non-stationary environment.
Users' preferences towards different items vary differently over time.
An efficient learning algorithm that is adaptive to abrupt reward changes is proposed.
- Score: 11.248271996987802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A contextual bandit problem is studied in a highly non-stationary
environment, which is ubiquitous in various recommender systems due to the
time-varying interests of users. Two models with disjoint and hybrid payoffs
are considered to characterize the phenomenon that users' preferences towards
different items vary differently over time. In the disjoint payoff model, the
reward of playing an arm is determined by an arm-specific preference vector,
which is piecewise-stationary with asynchronous and distinct changes across
different arms. An efficient learning algorithm that is adaptive to abrupt
reward changes is proposed and theoretical regret analysis is provided to show
that a sublinear scaling of regret in the time length $T$ is achieved. The
algorithm is further extended to a more general setting with hybrid payoffs
where the reward of playing an arm is determined by both an arm-specific
preference vector and a joint coefficient vector shared by all arms. Empirical
experiments are conducted on real-world datasets to verify the advantages of
the proposed learning algorithms against baseline ones in both settings.
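To make the disjoint payoff model concrete, the sketch below pairs a per-arm LinUCB-style ridge-regression estimator with a simple sliding-window change detector that resets an arm's statistics when its recent prediction errors grow. This is a minimal illustration of the idea of adapting to abrupt, asynchronous reward changes, not the authors' exact algorithm; the class name `DisjointDetectingLinUCB` and the `alpha`, `window`, and `threshold` parameters are illustrative assumptions.

```python
# Minimal sketch of the disjoint payoff model: the expected reward of arm a
# under context x_a is x_a^T theta_a, where theta_a is arm-specific and
# piecewise-stationary with asynchronous changes across arms. The reset rule
# and hyperparameters below are illustrative assumptions, not the paper's.
import numpy as np

class DisjointDetectingLinUCB:
    def __init__(self, n_arms, dim, alpha=1.0, window=50, threshold=0.5):
        self.n_arms, self.dim = n_arms, dim
        self.alpha, self.window, self.threshold = alpha, window, threshold
        # Per-arm ridge-regression statistics: A_a = I + sum x x^T, b_a = sum r x.
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]
        self.errors = [[] for _ in range(n_arms)]  # recent |r - x^T theta_a|

    def _reset_arm(self, a):
        # Forget stale statistics for one arm after a suspected change point;
        # other arms are untouched because changes are asynchronous.
        self.A[a] = np.eye(self.dim)
        self.b[a] = np.zeros(self.dim)
        self.errors[a] = []

    def select(self, contexts):
        # contexts: array of shape (n_arms, dim), one feature vector per arm.
        scores = []
        for a in range(self.n_arms):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]
            x = contexts[a]
            scores.append(x @ theta + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, a, x, reward):
        theta = np.linalg.solve(self.A[a], self.b[a])
        self.errors[a] = (self.errors[a] + [abs(reward - x @ theta)])[-self.window:]
        # Crude change detector: a sustained rise in prediction error on this
        # arm suggests theta_a has shifted, so restart its estimator.
        if len(self.errors[a]) == self.window and np.mean(self.errors[a]) > self.threshold:
            self._reset_arm(a)
        else:
            self.A[a] += np.outer(x, x)
            self.b[a] += reward * x
```

In the hybrid payoff model, the expected reward gains a joint coefficient vector shared by all arms in addition to the arm-specific vector; the same detect-and-reset idea would apply, with the shared component estimated jointly across arms.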
Related papers
- Preference-centric Bandits: Optimality of Mixtures and Regret-efficient Algorithms [34.876652087068734]
This paper introduces a framework for shifting from expectation-based evaluation to an alternative reward formulation, termed a preference metric (PM).
The PMs can place the desired emphasis on different reward realizations and can encode a richer modeling of preferences that incorporates risk aversion, robustness, or other desired attitudes toward uncertainty.
The paper formalizes the PM-centric framework and presents two algorithm classes that learn and track mixtures in a regret-efficient fashion.
arXiv Detail & Related papers (2025-04-29T15:46:59Z)
- Influential Bandits: Pulling an Arm May Change the Environment [44.71145269686588]
Real-world applications often involve non-stationary environments and interdependencies between arms.
We propose the influential bandit problem, which models inter-arm interactions through an unknown, symmetric, positive semi-definite interaction matrix.
We introduce a new algorithm based on a lower confidence bound (LCB) estimator tailored to the structure of the loss dynamics.
arXiv Detail & Related papers (2025-04-11T02:05:51Z)
- Semi-Parametric Batched Global Multi-Armed Bandits with Covariates [0.48342038441006807]
The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making.
We propose a novel semi-parametric framework for batched bandits with covariates and a shared parameter across arms.
Our algorithm, Batched single-Index Dynamic binning and Successive arm elimination (BIDS), employs a batched successive arm elimination strategy.
arXiv Detail & Related papers (2025-03-01T17:23:55Z)
- Online Clustering of Dueling Bandits [59.09590979404303]
We introduce the first "clustering of dueling bandit algorithms" to enable collaborative decision-making based on preference feedback.
We propose two novel algorithms: (1) Clustering of Linear Dueling Bandits (COLDB) which models the user reward functions as linear functions of the context vectors, and (2) Clustering of Neural Dueling Bandits (CONDB) which uses a neural network to model complex, non-linear user reward functions.
arXiv Detail & Related papers (2025-02-04T07:55:41Z)
- Contextual Bandits with Arm Request Costs and Delays [19.263086804406786]
We introduce a novel extension of the contextual bandit problem, where new sets of arms can be requested with time delays and associated costs.
In this setting, the learner can select multiple arms from a decision set, with each selection taking one unit of time.
We design algorithms that can effectively select arms and determine the appropriate time to request new arms, thereby minimizing their regret.
arXiv Detail & Related papers (2024-10-17T00:44:50Z)
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde O\big(d\sqrt{\sum_{t=1}^T \sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde O(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential decision-making framework where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z)
- Piecewise-Stationary Combinatorial Semi-Bandit with Causally Related Rewards [5.347237827669861]
We study the piecewise stationary semi-bandit problem with causally related rewards.
In our nonstationary environment, variations in the base arms' distributions, causal relationships between rewards, or both, change the reward generation process.
The problem becomes aggravated in the semi-bandit setting, where the decision-maker only observes the outcome of the selected bundle of arms.
arXiv Detail & Related papers (2023-07-26T12:06:13Z)
- Stability-penalty-adaptive follow-the-regularized-leader: Sparsity, game-dependency, and best-of-both-worlds [46.30750729936261]
Follow-the-regularized-leader (FTRL) has recently emerged as one of the most promising approaches for obtaining various types of adaptivity in bandit problems.
We establish several algorithms with three types of adaptivity: sparsity, game-dependency, and best-of-both-worlds (BOBW).
arXiv Detail & Related papers (2023-05-26T23:20:48Z)
- Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing [7.0997346625024]
We propose a generic upper confidence bound (UCB)-based algorithm with change detection to solve this problem.
We also formulate an energy-efficient waveform design problem in an integrated communication and sensing system as a toy example.
arXiv Detail & Related papers (2023-02-10T14:10:14Z)
- Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance.
Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets.
Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
- Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
- SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback [50.13745601531148]
We propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to accommodate the characteristics of implicit feedback in recommender systems.
Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons.
We also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$.
arXiv Detail & Related papers (2020-02-23T06:40:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.