Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce
- URL: http://arxiv.org/abs/2406.14004v1
- Date: Thu, 20 Jun 2024 05:15:48 GMT
- Title: Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce
- Authors: Yuan Wang, Zhiyu Li, Changshuo Zhang, Sirui Chen, Xiao Zhang, Jun Xu, Quan Lin
- Abstract summary: We propose a novel extension of online learning methods for re-ranking modeling, which we term LAST.
It circumvents the requirement of user feedback by using a surrogate model to provide the instructional signal needed to steer model improvement.
LAST can be seamlessly integrated into existing online learning systems to create a more adaptive and responsive recommendation experience.
- Score: 16.316227411757797
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recommender systems have been widely used in e-commerce, and re-ranking models are playing an increasingly significant role in the domain: they leverage inter-item influence and determine the final recommendation lists. Online learning methods keep updating a deployed model with the latest available samples to capture the shifting of the underlying data distribution in e-commerce. However, they depend on the availability of real user feedback, such as item purchases, which may be delayed by hours or even days, leading to a lag in model enhancement. In this paper, we propose a novel extension of online learning methods for re-ranking modeling, which we term LAST, an acronym for Learning At Serving Time. It circumvents the requirement of user feedback by using a surrogate model to provide the instructional signal needed to steer model improvement. Upon receiving an online request, LAST finds and applies a model modification on the fly before generating a recommendation result for the request. The modification is request-specific and transient: it is tailored to, and only to, the current request, so as to capture that request's specific context. After a request, the modification is discarded, which helps to prevent error propagation and stabilizes the online learning procedure, since the predictions of the surrogate model may be inaccurate. Most importantly, as a complement to feedback-based online learning methods, LAST can be seamlessly integrated into existing online learning systems to create a more adaptive and responsive recommendation experience. Comprehensive experiments, both offline and online, affirm that LAST outperforms state-of-the-art re-ranking models.
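
To make the serve-time procedure concrete, below is a minimal, hypothetical PyTorch sketch of request-specific, transient adaptation guided by a surrogate model. The class names (`TinyReranker`, `TinySurrogate`), their interfaces, and the surrogate-utility loss are illustrative assumptions for exposition only, not the paper's actual architecture or objective.

```python
# Illustrative sketch only: a stand-in for LAST-style learning at serving time.
import copy
import torch
import torch.nn as nn

class TinyReranker(nn.Module):
    """Stand-in re-ranking model: scores each candidate item from its features."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, item_feats):
        # item_feats: (num_candidates, dim) -> scores: (num_candidates,)
        return self.scorer(item_feats).squeeze(-1)

class TinySurrogate(nn.Module):
    """Stand-in surrogate: estimates list utility from score-weighted item features."""
    def __init__(self, dim: int = 8):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, item_feats, weights):
        pooled = (weights.unsqueeze(-1) * item_feats).sum(dim=0)  # (dim,)
        return self.head(pooled).squeeze()

def serve_with_transient_update(base_model, surrogate, item_feats, steps=2, lr=1e-2):
    """Adapt a throwaway copy of the model for this single request, rank, then discard it."""
    # The modification is transient: work on a copy so the deployed weights stay untouched.
    model = copy.deepcopy(base_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)

    # A few gradient steps steered by the surrogate, which stands in for the
    # real user feedback that would otherwise arrive hours or days later.
    for _ in range(steps):
        scores = model(item_feats)
        utility = surrogate(item_feats, torch.softmax(scores, dim=-1))
        loss = -utility  # maximize the surrogate's estimated utility of the ranking
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Serve the ranking produced by the adapted copy; the copy is then discarded.
    with torch.no_grad():
        return torch.argsort(model(item_feats), descending=True)

# Example: one incoming request with 20 candidate items.
reranker, surrogate = TinyReranker(), TinySurrogate()
request_feats = torch.randn(20, 8)
ranking = serve_with_transient_update(reranker, surrogate, request_feats)
```

The design point mirrored here is that the deployed weights are never mutated: each request adapts a throwaway copy for a few steps under the surrogate's signal, serves the result, and discards the copy, so surrogate errors cannot propagate across requests.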
Related papers
- Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z) - Online Bandit Learning with Offline Preference Data [15.799929216215672]
We propose a posterior sampling algorithm for online learning that can be warm-started with an offline dataset with noisy preference feedback.
We show that by modeling the 'competence' of the expert that generated the offline data, we are able to use such a dataset most effectively.
arXiv Detail & Related papers (2024-06-13T20:25:52Z) - Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - Direct Language Model Alignment from Online AI Feedback [78.40436231613754]
Direct alignment from preferences (DAP) methods have recently emerged as efficient alternatives to reinforcement learning from human feedback (RLHF).
In this study, we posit that online feedback is key and improves DAP methods.
Our method, online AI feedback (OAIF), uses an LLM as annotator: in each training iteration, we sample two responses from the current model and prompt the LLM annotator to choose which one is preferred, thus providing online feedback (a hypothetical sketch of this loop appears after this list).
arXiv Detail & Related papers (2024-02-07T12:31:13Z) - Netflix and Forget: Efficient and Exact Machine Unlearning from Bi-linear Recommendations [15.789980605221672]
This paper focuses on simple but widely deployed bi-linear models for recommendations based on matrix completion.
We develop Unlearn-ALS by making a few key modifications to the fine-tuning procedure under Alternating Least Squares.
We show that Unlearn-ALS is consistent with retraining without any model degradation and exhibits rapid convergence.
arXiv Detail & Related papers (2023-02-13T20:27:45Z) - Effective and Efficient Training for Sequential Recommendation using Recency Sampling [91.02268704681124]
We propose a novel Recency-based Sampling of Sequences training objective.
We show that the models enhanced with our method can achieve performance exceeding or very close to that of the state-of-the-art BERT4Rec.
arXiv Detail & Related papers (2022-07-06T13:06:31Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with such an optimizer, referred to as Rep-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - New Insights on Reducing Abrupt Representation Change in Online Continual Learning [69.05515249097208]
We focus on the change in representations of observed data that arises when previously unobserved classes appear in the incoming data stream.
We show that applying Experience Replay causes the newly added classes' representations to overlap significantly with the previous classes.
We propose a new method which mitigates this issue by shielding the learned representations from drastic adaptation to accommodate new classes.
arXiv Detail & Related papers (2022-03-08T01:37:00Z) - Incremental Learning for Personalized Recommender Systems [8.020546404087922]
We present an incremental learning solution that provides both training efficiency and model quality.
The solution is deployed at LinkedIn and is directly applicable to industrial-scale recommender systems.
arXiv Detail & Related papers (2021-08-13T04:21:21Z) - WeiPS: a symmetric fusion model framework for large-scale online learning [6.88870384575896]
We propose a symmetric fusion online learning system framework called WeiPS, which integrates model training and model inference.
Specifically, WeiPS carries out second-level model deployment via a streaming update mechanism to satisfy the consistency requirement.
It uses multi-level fault tolerance and real-time domino degradation to meet the high-availability requirement.
arXiv Detail & Related papers (2020-11-24T09:25:39Z) - ADER: Adaptively Distilled Exemplar Replay Towards Continual Learning for Session-based Recommendation [28.22402119581332]
Session-based recommendation has received growing attention recently due to increasing privacy concerns.
We propose a method called Adaptively Distilled Exemplar Replay (ADER) by periodically replaying previous training samples.
ADER consistently outperforms other baselines, and it even outperforms the method using all historical data at every update cycle.
arXiv Detail & Related papers (2020-07-23T13:19:53Z)
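
For contrast with LAST's surrogate-guided signal, the OAIF entry above describes an online feedback loop; the following is a hypothetical, dependency-free Python sketch of that loop. The stub functions (`generate`, `annotate`) and the preference loss are illustrative stand-ins under assumed interfaces, not the paper's implementation.

```python
# Illustrative sketch of an online-AI-feedback step; all components are stand-ins.
import math
import random

def generate(policy, prompt):
    """Stand-in sampler: returns (response, log-probability) from the current policy."""
    response = random.choice(["answer A", "answer B", "answer C"])
    return response, math.log(1.0 / 3.0)

def annotate(prompt, first, second):
    """Stand-in LLM annotator: returns True if the first response is preferred."""
    return len(first) >= len(second)  # placeholder preference rule

def preference_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss on one online preference pair: -log sigmoid of the reward margin."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def oaif_step(policy, prompt, ref_logp=math.log(1.0 / 3.0)):
    # 1) Sample two responses from the *current* model.
    r1, logp1 = generate(policy, prompt)
    r2, logp2 = generate(policy, prompt)
    # 2) Ask the LLM annotator which response is preferred (the online feedback).
    chosen, rejected = (logp1, logp2) if annotate(prompt, r1, r2) else (logp2, logp1)
    # 3) A real system would backpropagate this loss through the policy parameters.
    return preference_loss(chosen, rejected, ref_logp, ref_logp)

print(oaif_step(policy=None, prompt="recommend a laptop"))
```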