Recycling History: Efficient Recommendations from Contextual Dueling Bandits
- URL: http://arxiv.org/abs/2508.18841v1
- Date: Tue, 26 Aug 2025 09:18:13 GMT
- Title: Recycling History: Efficient Recommendations from Contextual Dueling Bandits
- Authors: Suryanarayana Sankagiri, Jalal Etesami, Pouria Fatemi, Matthias Grossglauser
- Abstract summary: Motivated by the fact that users provide more reliable feedback after consuming items, we propose a new bandit model. In our model, this comparison item can be chosen without incurring any additional regret, potentially leading to better performance. We show that the algorithm can construct informative queries provided the history is rich, i.e., satisfies a certain diversity condition.
- Score: 7.802377730449526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The contextual dueling bandit problem models adaptive recommender systems, where the algorithm presents a set of items to the user, and the user's choice reveals their preference. This setup is well suited for implicit choices users make when navigating a content platform, but does not capture other possible comparison queries. Motivated by the fact that users provide more reliable feedback after consuming items, we propose a new bandit model that can be described as follows. The algorithm recommends one item per time step; after consuming that item, the user is asked to compare it with another item chosen from the user's consumption history. Importantly, in our model, this comparison item can be chosen without incurring any additional regret, potentially leading to better performance. However, the regret analysis is challenging because of the temporal dependency in the user's history. To overcome this challenge, we first show that the algorithm can construct informative queries provided the history is rich, i.e., satisfies a certain diversity condition. We then show that a short initial random exploration phase is sufficient for the algorithm to accumulate a rich history with high probability. This result, proven via matrix concentration bounds, yields $O(\sqrt{T})$ regret guarantees. Additionally, our simulations show that reusing past items for comparisons can lead to significantly lower regret than only comparing between simultaneously recommended items.
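The recommend-then-compare loop described in the abstract can be sketched as a small simulation. Everything concrete below is an illustrative assumption rather than the paper's exact algorithm: a linear utility model, logistic comparison feedback, a greedy recommendation rule after the random exploration phase, a crude linear surrogate for the maximum-likelihood update, and an informativeness score (Mahalanobis length of the feature difference under the inverse design matrix) for picking the comparison item from history.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, T0 = 5, 2000, 100              # dimension, horizon, exploration phase length
theta_star = rng.normal(size=d)      # hidden user preference vector
theta_star /= np.linalg.norm(theta_star)

V = np.eye(d)                        # regularized design matrix of comparison features
b = np.zeros(d)                      # accumulated (signed) feedback directions
history = []                         # feature vectors of previously consumed items

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for t in range(T):
    items = rng.normal(size=(10, d))                 # fresh context of 10 candidate items
    theta_hat = np.linalg.solve(V, b)                # current preference estimate
    if t < T0:
        x = items[rng.integers(10)]                  # initial random exploration
    else:
        x = items[np.argmax(items @ theta_hat)]      # greedy recommendation
    if history:
        # Pick the history item whose difference with x is most informative,
        # i.e., largest under the V^{-1} (inverse design) metric.
        Vinv = np.linalg.inv(V)
        diffs = x - np.array(history)
        scores = np.einsum('id,de,ie->i', diffs, Vinv, diffs)
        z = diffs[np.argmax(scores)]
        # User compares the consumed item x with the chosen history item
        # (logistic preference model, an assumption for this sketch).
        y = rng.random() < sigmoid(theta_star @ z)
        V += np.outer(z, z)
        b += (2.0 * float(y) - 1.0) * z              # linear surrogate for the MLE update
    history.append(x)

theta_hat = np.linalg.solve(V, b)
cos = float(theta_hat @ theta_star / np.linalg.norm(theta_hat))
print(round(cos, 2))                 # alignment of the estimate with the true preference
```

Because comparison items come from history, each query costs no extra recommendation regret; the sketch's informativeness score plays the role of the diversity condition in that it steers queries toward directions the design matrix has not yet covered.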
Related papers
- Preference Trajectory Modeling via Flow Matching for Sequential Recommendation [50.077447974294586]
Sequential recommendation predicts each user's next item based on their historical interaction sequence. FlowRec is a simple yet effective sequential recommendation framework. We construct a personalized behavior-based prior distribution to replace Gaussian noise and learn a vector field to model user preference trajectories.
arXiv Detail & Related papers (2025-08-25T02:55:42Z) - Churn-Aware Recommendation Planning under Aggregated Preference Feedback [6.261444979025644]
We study a sequential decision-making problem motivated by recent regulatory and technological shifts. We introduce the Rec-APC model, in which an anonymous user is drawn from a known prior over latent user types. We prove that optimal policies converge to pure exploitation in finite time and propose a branch-and-bound algorithm to efficiently compute them.
arXiv Detail & Related papers (2025-07-06T19:22:47Z) - ConsRec: Denoising Sequential Recommendation through User-Consistent Preference Modeling [33.281526528724335]
We propose the User-Consistent Preference-based Sequential Recommendation System (ConsRec). ConsRec captures stable user preferences and filters noisy items from interaction histories. Results show ConsRec achieves a 13% improvement over baseline recommendation models.
arXiv Detail & Related papers (2025-05-28T08:55:13Z) - Neural Dueling Bandits: Preference-Based Optimization with Human Feedback [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms. We also extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Misalignment, Learning, and Ranking: Harnessing Users Limited Attention [16.74322664734553]
We study the design of online algorithms that obtain vanishing regret against optimal benchmarks.
We show how standard algorithms for adversarial online linear optimization can be used to obtain a payoff-time algorithm with $O(\sqrt{T})$ regret.
arXiv Detail & Related papers (2024-02-21T18:52:20Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Online Recommendations for Agents with Discounted Adaptive Preferences [17.501559059079806]
We study a bandit recommendation problem in which an agent's preferences evolve as a function of past selections.
We present an algorithm that obtains efficient sublinear regret with respect to nearly the entire item simplex.
arXiv Detail & Related papers (2023-02-12T22:04:27Z) - Batch versus Sequential Active Learning for Recommender Systems [3.7796614675664397]
We show that sequential mode produces the most accurate recommendations for dense data sets.
For most active learners, the best predictor turned out to be FunkSVD in combination with sequential mode.
arXiv Detail & Related papers (2022-01-19T12:50:36Z) - Sequential Recommendation via Stochastic Self-Attention [68.52192964559829]
Transformer-based approaches embed items as vectors and use dot-product self-attention to measure the relationship between items.
We propose a novel STOchastic Self-Attention (STOSA) model to overcome these issues.
We devise a novel Wasserstein Self-Attention module to characterize item-item position-wise relationships in sequences.
arXiv Detail & Related papers (2022-01-16T12:38:45Z) - Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons [85.5955376526419]
In rank aggregation problems, users exhibit various accuracy levels when comparing pairs of items.
We propose an elimination-based active sampling strategy, which estimates the ranking of items via noisy pairwise comparisons.
We prove that our algorithm can return the true ranking of items with high probability.
arXiv Detail & Related papers (2021-10-08T13:51:55Z) - Regret in Online Recommendation Systems [73.58127515175127]
This paper proposes a theoretical analysis of recommendation systems in an online setting, where items are sequentially recommended to users over time.
In each round, a user, randomly picked from a population of $m$ users, requests a recommendation. The decision-maker observes the user and selects an item from a catalogue of $n$ items.
The performance of the recommendation algorithm is captured through its regret, considering as a reference an Oracle algorithm aware of these probabilities.
arXiv Detail & Related papers (2020-10-23T12:48:35Z) - Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.