Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning
- URL: http://arxiv.org/abs/2403.04875v1
- Date: Thu, 7 Mar 2024 19:47:48 GMT
- Title: Aligning GPTRec with Beyond-Accuracy Goals with Reinforcement Learning
- Authors: Aleksandr Petrov and Craig Macdonald
- Abstract summary: GPTRec generates recommendations item-by-item as an alternative to Top-K models.
Our experiments on two datasets show that GPTRec's Next-K generation approach offers a better tradeoff between accuracy and secondary metrics than classic greedy re-ranking techniques.
- Score: 67.71952251641545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptations of Transformer models, such as BERT4Rec and SASRec, achieve
state-of-the-art performance in the sequential recommendation task according to
accuracy-based metrics, such as NDCG. These models treat items as tokens and
then utilise a score-and-rank approach (Top-K strategy), where the model first
computes item scores and then ranks them according to this score. While this
approach works well for accuracy-based metrics, it is hard to use it for
optimising more complex beyond-accuracy metrics such as diversity. Recently,
the GPTRec model, which uses a different Next-K strategy, has been proposed as
an alternative to the Top-K models. In contrast with traditional Top-K
recommendations, Next-K generates recommendations item-by-item and, therefore,
can account for complex item-to-item interdependencies important for the
beyond-accuracy measures. However, the original GPTRec paper focused only on
accuracy in experiments and did not address how to optimise the model for
complex beyond-accuracy metrics. Indeed, training GPTRec for beyond-accuracy
goals is challenging because the interaction data available for
training recommender systems is typically not aligned with beyond-accuracy
recommendation goals. To solve this misalignment problem, we train GPTRec using
a 2-stage approach: in the first stage, we use a teacher-student approach to
train GPTRec, mimicking the behaviour of traditional Top-K models; in the
second stage, we use Reinforcement Learning to align the model for
beyond-accuracy goals. In particular, we experiment with increasing
recommendation diversity and reducing popularity bias. Our experiments on two
datasets show that in 3 out of 4 cases, GPTRec's Next-K generation approach
offers a better tradeoff between accuracy and secondary metrics than classic
greedy re-ranking techniques.
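The abstract describes two decoding strategies and a two-stage alignment scheme but includes no code. The sketch below illustrates the Top-K vs. Next-K contrast and a stage-2-style reward in plain Python; `score_fn`, `step_fn`, `category_of`, and the weighting `lam` are hypothetical stand-ins, not the authors' implementation.
```python
import math

def top_k_recommend(score_fn, history, catalog, k=10):
    """Classic Top-K: score every candidate item once, then rank."""
    return sorted(catalog, key=lambda i: score_fn(history, i), reverse=True)[:k]

def next_k_recommend(step_fn, history, catalog, k=10):
    """Next-K: build the slate item by item; each step conditions on the
    items already chosen, so item-to-item interdependencies (e.g.
    redundancy between similar items) can influence later picks."""
    slate = []
    for _ in range(k):
        remaining = [i for i in catalog if i not in slate]
        slate.append(max(remaining, key=lambda i: step_fn(history, slate, i)))
    return slate

def slate_reward(slate, relevant, category_of, lam=0.5):
    """Hypothetical stage-2 RL reward: a DCG-style accuracy term blended
    with a crude intra-list diversity term (fraction of distinct
    categories in the slate)."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(slate) if item in relevant)
    diversity = len({category_of[i] for i in slate}) / len(slate)
    return (1.0 - lam) * dcg + lam * diversity
```
In this reading, stage 1 fits `step_fn` to imitate a Top-K teacher's rankings, and stage 2 fine-tunes it with reinforcement learning (e.g. REINFORCE) to maximise `slate_reward`, with `lam` controlling the accuracy/diversity tradeoff.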
Related papers
- Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection [72.76113104079678]
We present the first theoretical treatments of two common questions in cross-validation based hyperparameter selection.
We show that generalizations of the standard practices can, respectively, perform at least as well as always retraining or never retraining.
arXiv Detail & Related papers (2023-01-12T16:37:12Z)
- Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Proposed approaches include uncertainty-based techniques, geometric methods, and implicit combinations of the two.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in the sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) a tunable trade-off between computational overhead and accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z)
- GROOT: Corrective Reward Optimization for Generative Sequential Labeling [10.306943706927004]
We propose GROOT -- a framework for Generative Reward Optimization Of Text sequences.
GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function.
As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics.
arXiv Detail & Related papers (2022-09-29T11:35:47Z)
- Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that the models enhanced with our method can achieve performance exceeding or very close to that of the state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z)
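The entry above names the objective but not its mechanics. As a purely hypothetical illustration of recency-based target sampling, the sketch below draws training targets from a user's sequence with exponentially recency-biased probabilities; the actual objective in the paper may differ.
```python
import random

def sample_recency_targets(sequence, n_targets=1, alpha=0.8):
    """Sample (prefix, target) training pairs from one user's interaction
    sequence, biased toward recent items: position k gets weight
    alpha ** (n - 1 - k), so the most recent items are the most likely
    prediction targets. Position 0 is excluded so every prefix is non-empty."""
    n = len(sequence)
    positions = range(1, n)
    weights = [alpha ** (n - 1 - k) for k in positions]
    chosen = random.choices(positions, weights=weights, k=n_targets)
    return [(sequence[:k], sequence[k]) for k in chosen]

# Toy example: item ids from one user's history.
print(sample_recency_targets([3, 7, 7, 1, 9, 4], n_targets=2))
```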
- Recommendation Systems with Distribution-Free Reliability Guarantees [83.80644194980042]
We show how to return a set of items rigorously guaranteed to contain mostly good items.
Our procedure endows any ranking model with rigorous finite-sample control of the false discovery rate.
We evaluate our methods on the Yahoo! Learning to Rank and MSMarco datasets.
arXiv Detail & Related papers (2022-07-04T17:49:25Z)
- Top-N Recommendation with Counterfactual User Preference Simulation [26.597102553608348]
Top-N recommendation, which aims to learn users' ranking-based preferences, has long been a fundamental problem in a wide range of applications.
In this paper, we propose to reformulate the recommendation task within the causal inference framework to handle the data scarcity problem.
arXiv Detail & Related papers (2021-09-02T14:28:46Z)
- Adaptive Consistency Regularization for Semi-Supervised Transfer Learning [31.66745229673066]
We consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm.
To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization.
Our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch.
arXiv Detail & Related papers (2021-03-03T05:46:39Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
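The mechanism the entry above evaluates is simple enough to sketch: at prediction time, normalise activations with the statistics of the incoming test batch rather than the running averages stored during training. A minimal PyTorch illustration (an assumption-laden sketch, not the paper's code):
```python
import torch
import torch.nn as nn

def predict_with_batch_stats(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Prediction-time batch normalization: the model stays in eval mode
    overall, but BatchNorm layers are flipped to train mode so they
    normalise with the current test batch's mean/variance instead of the
    running statistics accumulated during training."""
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.train()          # normalise with the test batch's statistics
            m.momentum = 0.0   # leave the stored running stats untouched
    with torch.no_grad():
        return model(x)
```
Because the normalisation statistics now track the (shifted) test distribution, accuracy and calibration can improve under covariate shift, which is the effect the entry reports.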
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
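To make the prototype update described in the last entry concrete, below is a minimal sketch of one confidence-weighted transductive refinement step, with a fixed distance-softmax standing in for the meta-learned confidence function; it is an illustration, not the paper's method.
```python
import torch

def refine_prototypes(prototypes, queries, temperature=1.0):
    """One step of transductive prototype refinement: compute a soft
    confidence for each query under each class prototype, then fold the
    confidence-weighted queries back into the prototypes. In the paper,
    the confidence function is itself meta-learned; here it is a fixed
    softmax over negative squared Euclidean distances."""
    # dists[q, c] = squared distance from query q to prototype c
    dists = torch.cdist(queries, prototypes) ** 2
    conf = torch.softmax(-dists / temperature, dim=1)   # (n_query, n_class)
    # Confidence-weighted sum of queries per class, blended with the
    # original prototypes (each prototype counts as one unit-weight point).
    weighted = conf.t() @ queries                       # (n_class, dim)
    mass = conf.sum(dim=0, keepdim=True).t()            # (n_class, 1)
    return (prototypes + weighted) / (1.0 + mass)
```
Applied for a few iterations, this yields refined prototypes; each query is then assigned to its nearest refined prototype.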
This list is automatically generated from the titles and abstracts of the papers on this site.