MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in
Recommendation Systems
- URL: http://arxiv.org/abs/2401.06293v1
- Date: Thu, 11 Jan 2024 23:17:07 GMT
- Title: MultiSlot ReRanker: A Generic Model-based Re-Ranking Framework in
Recommendation Systems
- Authors: Qiang Charles Xiao, Ajith Muralidharan, Birjodh Tiwana, Johnson Jia,
Fedor Borisyuk, Aman Gupta, Dawn Woodard
- Abstract summary: We propose a generic model-based re-ranking framework, MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and freshness.
We have built a multi-slot re-ranking simulator based on OpenAI Gym integrated with the Ray framework.
- Score: 6.0232112783722
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a generic model-based re-ranking framework,
MultiSlot ReRanker, which simultaneously optimizes relevance, diversity, and
freshness. Specifically, our Sequential Greedy Algorithm (SGA) is efficient
enough (linear time complexity) for large-scale production recommendation
engines. It achieved a lift of $+6\%$ to $ +10\%$ offline Area Under the
receiver operating characteristic Curve (AUC) which is mainly due to explicitly
modeling mutual influences among items of a list, and leveraging the second
pass ranking scores of multiple objectives. In addition, we have generalized
the offline replay theory to multi-slot re-ranking scenarios, with trade-offs
among multiple objectives. The offline replay results can be further improved
by Pareto Optimality. Moreover, we have built a multi-slot re-ranking simulator
based on OpenAI Gym integrated with the Ray framework. It can be easily
configured for different assumptions to quickly benchmark both reinforcement
learning and supervised learning algorithms.
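The abstract gives no pseudocode for the Sequential Greedy Algorithm, but a natural reading is a slot-by-slot greedy fill: each slot takes the remaining candidate whose score, conditioned on the items already placed, is highest. The sketch below is only an illustration under that assumption; the scoring function, objective weights, and candidate fields are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    item_id: str
    relevance: float            # assumed second-pass ranker score
    freshness: float            # assumed recency score in [0, 1]
    topics: set = field(default_factory=set)

def marginal_score(cand, chosen, w_rel=1.0, w_div=0.3, w_fresh=0.2):
    """Score a candidate conditioned on the items already placed.

    Diversity is modeled here as topic overlap with earlier slots; this is
    an illustrative choice, not the paper's exact formulation."""
    if chosen:
        overlap = max(
            len(cand.topics & c.topics) / max(len(cand.topics | c.topics), 1)
            for c in chosen
        )
    else:
        overlap = 0.0
    return w_rel * cand.relevance + w_fresh * cand.freshness - w_div * overlap

def sequential_greedy_rerank(candidates, num_slots):
    """Fill slots one at a time, each time taking the best remaining candidate.

    One pass over the remaining pool per slot keeps the cost linear in the
    candidate count for a fixed, small number of slots."""
    remaining = list(candidates)
    slate = []
    for _ in range(min(num_slots, len(remaining))):
        best = max(remaining, key=lambda c: marginal_score(c, slate))
        slate.append(best)
        remaining.remove(best)
    return slate

if __name__ == "__main__":
    pool = [
        Candidate("a", 0.9, 0.2, {"sports"}),
        Candidate("b", 0.8, 0.9, {"sports"}),
        Candidate("c", 0.7, 0.5, {"music"}),
    ]
    print([c.item_id for c in sequential_greedy_rerank(pool, 2)])
```

In this toy run the second slot skips the remaining "sports" item because of the diversity penalty, which is the mutual-influence effect the abstract attributes the AUC lift to.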
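The simulator itself is not included with the abstract; as a rough illustration of the setup it describes, the following is a minimal OpenAI Gym environment in which one step places one item into the slate, plus a Ray remote function for running rollouts in parallel. The features, reward mix, and episode length are all assumptions for illustration.

```python
import gym
import numpy as np
import ray
from gym import spaces

class SlateEnv(gym.Env):
    """Toy multi-slot re-ranking environment: one action places one candidate."""

    def __init__(self, num_candidates=20, num_slots=5, seed=0):
        super().__init__()
        self.num_candidates = num_candidates
        self.num_slots = num_slots
        self.rng = np.random.default_rng(seed)
        self.action_space = spaces.Discrete(num_candidates)
        # Observation: per-candidate [relevance, freshness, already_placed].
        self.observation_space = spaces.Box(
            0.0, 1.0, shape=(num_candidates, 3), dtype=np.float32
        )

    def reset(self):
        self.features = self.rng.random((self.num_candidates, 2)).astype(np.float32)
        self.placed = np.zeros(self.num_candidates, dtype=np.float32)
        return self._obs()

    def _obs(self):
        return np.concatenate([self.features, self.placed[:, None]], axis=1)

    def step(self, action):
        reward = 0.0
        if self.placed[action] == 0:
            relevance, freshness = self.features[action]
            reward = float(0.8 * relevance + 0.2 * freshness)  # assumed reward mix
            self.placed[action] = 1.0
        done = int(self.placed.sum()) >= self.num_slots
        return self._obs(), reward, done, {}

@ray.remote
def random_rollout(seed):
    """One episode with a random policy; returns the episode return."""
    env = SlateEnv(seed=seed)
    env.reset()
    done, total = False, 0.0
    while not done:
        _, r, done, _ = env.step(env.action_space.sample())
        total += r
    return total

if __name__ == "__main__":
    ray.init(ignore_reinit_error=True)
    print(ray.get([random_rollout.remote(s) for s in range(4)]))
```

Because the environment exposes the standard reset/step interface, the same harness can benchmark either a supervised scorer (called inside a policy) or an RL agent, which is the comparison the abstract highlights.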
Related papers
- Enhancing Spectrum Efficiency in 6G Satellite Networks: A GAIL-Powered Policy Learning via Asynchronous Federated Inverse Reinforcement Learning [67.95280175998792]
A novel generative adversarial imitation learning (GAIL)-powered policy learning approach is proposed for optimizing beamforming, spectrum allocation, and remote user equipment (RUE) association.
We employ inverse RL (IRL) to automatically learn reward functions without manual tuning.
We show that the proposed MA-AL method outperforms traditional RL approaches, achieving a $14.6\%$ improvement in convergence and reward value.
arXiv Detail & Related papers (2024-09-27T13:05:02Z) - Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation [27.243116376164906]
We introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec).
Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions.
We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets.
arXiv Detail & Related papers (2024-09-25T05:12:07Z) - LLM-enhanced Reranking in Recommender Systems [49.969932092129305]
Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms.
We introduce a comprehensive reranking framework, designed to seamlessly integrate various reranking criteria.
A customizable input mechanism is also integrated, enabling the tuning of the language model's focus to meet specific reranking needs.
arXiv Detail & Related papers (2024-06-18T09:29:18Z) - ALaRM: Align Language Models via Hierarchical Rewards Modeling [41.79125107279527]
We introduce ALaRM, the first framework modeling hierarchical rewards in reinforcement learning from human feedback.
The framework addresses the limitations of current alignment approaches, by integrating holistic rewards with aspect-specific rewards.
We validate our approach through applications in long-form question answering and machine translation tasks.
arXiv Detail & Related papers (2024-03-11T14:28:40Z) - Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z) - TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS).
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z) - Dynamic Perceiver for Efficient Visual Recognition [87.08210214417309]
We propose Dynamic Perceiver (Dyn-Perceiver) to decouple the feature extraction procedure and the early classification task.
A feature branch serves to extract image features, while a classification branch processes a latent code assigned for classification tasks.
Early exits are placed exclusively within the classification branch, thus eliminating the need for linear separability in low-level features.
arXiv Detail & Related papers (2023-06-20T03:00:22Z) - Tightly Coupled Learning Strategy for Weakly Supervised Hierarchical
Place Recognition [0.09558392439655011]
We propose a tightly coupled learning (TCL) strategy to train triplet models.
It combines global and local descriptors for joint optimization.
Our lightweight unified model outperforms several state-of-the-art methods.
arXiv Detail & Related papers (2022-02-14T03:20:39Z) - Learning-To-Ensemble by Contextual Rank Aggregation in E-Commerce [8.067201256886733]
We propose a new Learning-To-Ensemble framework RA-EGO, which replaces the ensemble model with a contextual Rank Aggregator.
RA-EGO has been deployed in our online system and has improved the revenue significantly.
arXiv Detail & Related papers (2021-07-19T03:24:06Z) - DORB: Dynamically Optimizing Multiple Rewards with Bandits [101.68525259222164]
Policy-based reinforcement learning has proven to be a promising approach for optimizing non-differentiable evaluation metrics for language generation tasks.
We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). A generic Exp3 sketch appears after this list.
We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks.
arXiv Detail & Related papers (2020-11-15T21:57:47Z)
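For reference, DORB's bandit component builds on the standard Exp3 algorithm. The snippet below is a generic Exp3 implementation over a small set of reward arms, not the DORB code; the arm payout probabilities in the toy usage are placeholders.

```python
import math
import random

class Exp3:
    """Standard Exp3 bandit: one weight per arm, arms sampled from a
    gamma-smoothed softmax over those weights."""

    def __init__(self, num_arms, gamma=0.1):
        self.num_arms = num_arms
        self.gamma = gamma
        self.weights = [1.0] * num_arms

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.num_arms
                for w in self.weights]

    def select_arm(self):
        return random.choices(range(self.num_arms), weights=self.probabilities())[0]

    def update(self, arm, reward):
        """Reward must be scaled to [0, 1] before calling this."""
        probs = self.probabilities()
        estimated = reward / probs[arm]  # importance-weighted reward estimate
        self.weights[arm] *= math.exp(self.gamma * estimated / self.num_arms)

# Toy usage: arm 1 pays off more often, so its probability should grow.
if __name__ == "__main__":
    bandit = Exp3(num_arms=3)
    for _ in range(2000):
        arm = bandit.select_arm()
        p_success = 0.7 if arm == 1 else 0.2
        reward = 1.0 if random.random() < p_success else 0.0
        bandit.update(arm, reward)
    print([round(p, 2) for p in bandit.probabilities()])
```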
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences.