BPL: Bias-adaptive Preference Distillation Learning for Recommender System
- URL: http://arxiv.org/abs/2510.16076v1
- Date: Fri, 17 Oct 2025 11:09:04 GMT
- Title: BPL: Bias-adaptive Preference Distillation Learning for Recommender System
- Authors: SeongKu Kang, Jianxun Lian, Dongha Lee, Wonbin Kweon, Sanghwan Jang, Jaehyun Lee, Jindong Wang, Xing Xie, Hwanjo Yu,
- Abstract summary: We introduce a new learning framework, called Bias-adaptive Preference distillation Learning (BPL), to gradually uncover user preferences. BPL retains accurate preference knowledge aligned with the collected feedback, leading to high performance in the factual test. Through self-distillation with reliability filtering, BPL iteratively refines its knowledge throughout the training process.
- Score: 61.916973366625285
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommender systems suffer from biases that cause the collected feedback to incompletely reveal user preferences. While debiasing has been studied extensively, prior work mostly focuses on a specialized (called counterfactual) test environment simulated by random exposure of items, significantly degrading accuracy in the typical (called factual) test environment based on actual user-item interactions. In fact, each test environment highlights a different benefit: the counterfactual test emphasizes user satisfaction in the long term, while the factual test focuses on predicting subsequent user behaviors on platforms. Therefore, it is desirable to have a model that performs well on both tests rather than only one. In this work, we introduce a new learning framework, called Bias-adaptive Preference distillation Learning (BPL), to gradually uncover user preferences with dual distillation strategies. These distillation strategies are designed to drive high performance in both the factual and counterfactual test environments. Employing a specialized form of teacher-student distillation from a biased model, BPL retains accurate preference knowledge aligned with the collected feedback, leading to high performance in the factual test. Furthermore, through self-distillation with reliability filtering, BPL iteratively refines its knowledge throughout the training process. This enables the model to produce more accurate predictions across a broader range of user-item combinations, thereby improving performance in the counterfactual test. Comprehensive experiments validate the effectiveness of BPL in both factual and counterfactual tests. Our implementation is accessible via: https://github.com/SeongKu-Kang/BPL.
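The abstract's two distillation signals can be sketched as a single loss: teacher distillation on observed (biased) feedback for the factual setting, plus self-distillation with a reliability filter on unobserved pairs for the counterfactual setting. The sketch below is a hypothetical minimal form for illustration, not the authors' implementation; the threshold `tau`, the weighting `alpha`, and the hard pseudo-labeling rule are all assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(pred, target, eps=1e-7):
    # binary cross-entropy with soft or hard targets
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def dual_distillation_loss(student_logits, teacher_logits, observed_mask,
                           tau=0.5, alpha=0.5):
    """Hypothetical sketch of a dual-distillation objective.

    (1) Teacher -> student distillation on observed pairs keeps the
        student aligned with the collected feedback (factual test).
    (2) Self-distillation on unobserved pairs reuses the student's own
        confident predictions as pseudo-labels after a reliability
        filter, broadening coverage (counterfactual test).
    """
    s = sigmoid(student_logits)
    t = sigmoid(teacher_logits)

    # (1) distill the biased teacher's scores on observed user-item pairs
    factual = bce(s[observed_mask], t[observed_mask])

    # (2) reliability filter: keep only unobserved predictions whose
    #     confidence |p - 0.5| exceeds tau / 2, then self-distill
    unobs = s[~observed_mask]
    reliable = np.abs(unobs - 0.5) > tau / 2
    if reliable.any():
        pseudo = (unobs[reliable] > 0.5).astype(float)
        counterfactual = bce(unobs[reliable], pseudo)
    else:
        counterfactual = 0.0

    return alpha * factual + (1 - alpha) * counterfactual
```

In this toy form, raising `tau` admits fewer but more confident pseudo-labels into the self-distillation term, mirroring the abstract's claim that reliability filtering lets the model refine its own knowledge iteratively.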
Related papers
- Selective Mixup for Debiasing Question Selection in Computerized Adaptive Testing [50.805231979748434]
Computerized Adaptive Testing (CAT) is a widely used technology for evaluating learners' proficiency in online education platforms. Selection bias arises because question selection is strongly influenced by the estimated proficiency. We propose a debiasing framework consisting of two key modules: Cross-Attribute Examinee Retrieval and Selective Mixup-based Regularization.
arXiv Detail & Related papers (2025-11-19T08:55:01Z) - A Systematic Examination of Preference Learning through the Lens of Instruction-Following [83.71180850955679]
We use a novel synthetic data generation pipeline to generate 48,000 unique instruction-following prompts. With our synthetic prompts, we use two preference dataset curation methods - rejection sampling (RS) and Monte Carlo Tree Search (MCTS). Experiments reveal that shared prefixes in preference pairs, as generated by MCTS, provide marginal but consistent improvements. High-contrast preference pairs generally outperform low-contrast pairs; however, combining both often yields the best performance.
arXiv Detail & Related papers (2024-12-18T15:38:39Z) - Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR. In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model. To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z) - BaFTA: Backprop-Free Test-Time Adaptation For Zero-Shot Vision-Language Models [20.88680592729709]
We propose a novel backpropagation-free algorithm BaFTA for test-time adaptation of vision-language models.
BaFTA directly estimates class centroids using online clustering within a projected embedding space.
We demonstrate that BaFTA consistently outperforms state-of-the-art test-time adaptation methods in both effectiveness and efficiency.
arXiv Detail & Related papers (2024-06-17T08:16:24Z) - Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction [56.17020601803071]
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction.
This paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias.
arXiv Detail & Related papers (2024-03-15T02:04:35Z) - TESSERACT: Eliminating Experimental Bias in Malware Classification across Space and Time (Extended Version) [17.397081564454293]
Malware detectors often experience performance decay due to constantly evolving operating systems and attack methods. This paper argues that commonly reported results are inflated due to two pervasive sources of experimental bias in the detection task.
arXiv Detail & Related papers (2024-02-02T12:27:32Z) - DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two unfavorable defects are concealed in the prevalent adaptation methodologies like test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are completely affected by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z) - Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias [3.1351527202068445]
We show that active learning bias can afflict Bayesian adaptive experimental design, depending on model misspecification.
We develop an information-theoretic measure of misspecification, and show that worse misspecification implies more severe active learning bias.
arXiv Detail & Related papers (2022-05-27T01:23:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.