MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in
Recommendation Systems
- URL: http://arxiv.org/abs/2401.06293v1
- Date: Thu, 11 Jan 2024 23:17:07 GMT
- Title: MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in
Recommendation Systems
- Authors: Qiang Charles Xiao, Ajith Muralidharan, Birjodh Tiwana, Johnson Jia,
Fedor Borisyuk, Aman Gupta, Dawn Woodard
- Abstract summary: We propose a generic model-based re-ranking framework, MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and freshness.
We have built a multi-slot re-ranking simulator based on OpenAI Gym integrated with the Ray framework.
- Score: 6.0232112783722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a generic model-based re-ranking framework,
MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and
freshness. Specifically, our Sequential Greedy Algorithm (SGA) is efficient
enough (linear time complexity) for large-scale production recommendation
engines. It achieved a lift of $+6\%$ to $ +10\%$ offline Area Under the
receiver operating characteristic Curve (AUC) which is mainly due to explicitly
modeling mutual influences among items of a list, and leveraging the second
pass ranking scores of multiple objectives. In addition, we have generalized
the offline replay theory to multi-slot re-ranking scenarios, with trade-offs
among multiple objectives. The offline replay results can be further improved
by Pareto Optimality. Moreover, we have built a multi-slot re-ranking simulator
based on OpenAI Gym integrated with the Ray framework. It can be easily
configured for different assumptions to quickly benchmark both reinforcement
learning and supervised learning algorithms.
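The abstract gives no pseudocode for the Sequential Greedy Algorithm, but a natural reading is a slot-by-slot greedy fill: each slot takes the remaining candidate whose score, conditioned on the items already placed, is highest. The sketch below is only an illustration under that assumption; the scoring function, objective weights, and candidate fields are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    item_id: str
    relevance: float            # assumed second-pass ranker score
    freshness: float            # assumed recency score in [0, 1]
    topics: set = field(default_factory=set)

def marginal_score(cand, chosen, w_rel=1.0, w_div=0.3, w_fresh=0.2):
    """Score a candidate conditioned on the items already placed.

    Diversity is modeled here as topic overlap with earlier slots; this is
    an illustrative choice, not the paper's exact formulation."""
    if chosen:
        overlap = max(
            len(cand.topics & c.topics) / max(len(cand.topics | c.topics), 1)
            for c in chosen
        )
    else:
        overlap = 0.0
    return w_rel * cand.relevance + w_fresh * cand.freshness - w_div * overlap

def sequential_greedy_rerank(candidates, num_slots):
    """Fill slots one at a time, each time taking the best remaining candidate.

    One pass over the remaining pool per slot keeps the cost linear in the
    candidate count for a fixed, small number of slots."""
    remaining = list(candidates)
    slate = []
    for _ in range(min(num_slots, len(remaining))):
        best = max(remaining, key=lambda c: marginal_score(c, slate))
        slate.append(best)
        remaining.remove(best)
    return slate

if __name__ == "__main__":
    pool = [
        Candidate("a", 0.9, 0.2, {"sports"}),
        Candidate("b", 0.8, 0.9, {"sports"}),
        Candidate("c", 0.7, 0.5, {"music"}),
    ]
    print([c.item_id for c in sequential_greedy_rerank(pool, 2)])
```

In this toy run the second slot skips the remaining "sports" item because of the diversity penalty, which is the mutual-influence effect the abstract attributes the AUC lift to.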
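The simulator itself is not included with the abstract; as a rough illustration of the setup it describes, the following is a minimal OpenAI Gym environment in which one step places one item into the slate, plus a Ray remote function for running rollouts in parallel. The features, reward mix, and episode length are all assumptions for illustration.

```python
import gym
import numpy as np
import ray
from gym import spaces

class SlateEnv(gym.Env):
    """Toy multi-slot re-ranking environment: one action places one candidate."""

    def __init__(self, num_candidates=20, num_slots=5, seed=0):
        super().__init__()
        self.num_candidates = num_candidates
        self.num_slots = num_slots
        self.rng = np.random.default_rng(seed)
        self.action_space = spaces.Discrete(num_candidates)
        # Observation: per-candidate [relevance, freshness, already_placed].
        self.observation_space = spaces.Box(
            0.0, 1.0, shape=(num_candidates, 3), dtype=np.float32
        )

    def reset(self):
        self.features = self.rng.random((self.num_candidates, 2)).astype(np.float32)
        self.placed = np.zeros(self.num_candidates, dtype=np.float32)
        return self._obs()

    def _obs(self):
        return np.concatenate([self.features, self.placed[:, None]], axis=1)

    def step(self, action):
        reward = 0.0
        if self.placed[action] == 0:
            relevance, freshness = self.features[action]
            reward = float(0.8 * relevance + 0.2 * freshness)  # assumed reward mix
            self.placed[action] = 1.0
        done = int(self.placed.sum()) >= self.num_slots
        return self._obs(), reward, done, {}

@ray.remote
def random_rollout(seed):
    """One episode with a random policy; returns the episode return."""
    env = SlateEnv(seed=seed)
    env.reset()
    done, total = False, 0.0
    while not done:
        _, r, done, _ = env.step(env.action_space.sample())
        total += r
    return total

if __name__ == "__main__":
    ray.init(ignore_reinit_error=True)
    print(ray.get([random_rollout.remote(s) for s in range(4)]))
```

Because the environment exposes the standard reset/step interface, the same harness can benchmark either a supervised scorer (called inside a policy) or an RL agent, which is the comparison the abstract highlights.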
Related papers
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation [27.243116376164906]
We introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec).
Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.
We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets.
arXiv Detail & Related papers (2024-09-25T05:12:07Z) - LLM-enhanced Reranking in Recommender Systems [49.969932092129305]
Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms.
We introduce a comprehensive reranking framework, designed to seamlessly integrate various reranking criteria.
A customizable input mechanism is also integrated, enabling the tuning of the language model's focus to meet specific reranking needs.
arXiv Detail & Related papers (2024-06-18T09:29:18Z) - ALaRM: Align Language Models via Hierarchical Rewards Modeling [41.79125107279527]
We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback.
The framework addresses the limitations of current alignment approaches, by integrating holistic rewards with aspect-specific rewards.
We validate our approach through applications in long-form question answering and machine translation tasks.
arXiv Detail & Related papers (2024-03-11T14:28:40Z) - Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS).
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Tightly Coupled Learning Strategy for Weakly Supervised Hierarchical
Place Recognition [0.09558392439655011]
We propose a tightly coupled learning (TCL) strategy to train triplet models.
It combines global and local descriptors for joint optimization.
Our lightweight unified model outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2022-02-14T03:20:39Z) - Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce [8.067201256886733]
We propose a new Learning-To-Ensemble framework RA-EGO, which replaces the ensemble model with a contextual Rank Aggregator.
RA-EGO has been deployed in our online system and has improved the revenue significantly.
arXiv Detail & Related papers (2021-07-19T03:24:06Z) - DORB: Dynamically Optimizing Multiple Rewards with Bandits [101.68525259222164]
Policy-based reinforcement learning has proven to be a promising approach for optimizing non-differentiable evaluation metrics for language generation tasks.
We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). A generic Exp3 sketch appears after this list.
We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks.
arXiv Detail & Related papers (2020-11-15T21:57:47Z)
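For reference, DORB's bandit component builds on the standard Exp3 algorithm. The snippet below is a generic Exp3 implementation over a small set of reward arms, not the DORB code; the arm payout probabilities in the toy usage are placeholders.

```python
import math
import random

class Exp3:
    """Standard Exp3 bandit: one weight per arm, arms sampled from a
    gamma-smoothed softmax over those weights."""

    def __init__(self, num_arms, gamma=0.1):
        self.num_arms = num_arms
        self.gamma = gamma
        self.weights = [1.0] * num_arms

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.num_arms
                for w in self.weights]

    def select_arm(self):
        return random.choices(range(self.num_arms), weights=self.probabilities())[0]

    def update(self, arm, reward):
        """Reward must be scaled to [0, 1] before calling this."""
        probs = self.probabilities()
        estimated = reward / probs[arm]  # importance-weighted reward estimate
        self.weights[arm] *= math.exp(self.gamma * estimated / self.num_arms)

# Toy usage: arm 1 pays off more often, so its probability should grow.
if __name__ == "__main__":
    bandit = Exp3(num_arms=3)
    for _ in range(2000):
        arm = bandit.select_arm()
        p_success = 0.7 if arm == 1 else 0.2
        reward = 1.0 if random.random() < p_success else 0.0
        bandit.update(arm, reward)
    print([round(p, 2) for p in bandit.probabilities()])
```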
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences.