Generator and Critic: A Deep Reinforcement Learning Approach for Slate
Re-ranking in E-commerce
- URL: http://arxiv.org/abs/2005.12206v1
- Date: Mon, 25 May 2020 16:24:01 GMT
- Title: Generator and Critic: A Deep Reinforcement Learning Approach for Slate
Re-ranking in E-commerce
- Authors: Jianxiong Wei, Anxiang Zeng, Yueqiu Wu, Peng Guo, Qingsong Hua,
Qingpeng Cai
- Abstract summary: We present a novel Generator and Critic slate re-ranking approach, where the Critic evaluates the slate and the Generator ranks the items by the reinforcement learning approach.
For the Generator, to tackle the problem of large action space, we propose a new exploration reinforcement learning algorithm, called PPO-Exploration.
- Score: 17.712394984304336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The slate re-ranking problem considers the mutual influences between items to
improve user satisfaction in e-commerce, compared with the point-wise ranking.
Previous works either directly rank items by an end-to-end model, or rank items
by a score function that trades off the point-wise score and the diversity
between items. However, there are two main existing challenges that are not
well studied: (1) the evaluation of the slate is hard due to the complex mutual
influences between items of one slate; (2) even given the optimal evaluation,
searching the optimal slate is challenging as the action space is exponentially
large. In this paper, we present a novel Generator and Critic slate re-ranking
approach, where the Critic evaluates the slate and the Generator ranks the
items by the reinforcement learning approach. We propose a Full Slate Critic
(FSC) model that considers the real impressed items and avoids the impressed
bias of existing models. For the Generator, to tackle the problem of large
action space, we propose a new exploration reinforcement learning algorithm,
called PPO-Exploration. Experimental results show that the FSC model
significantly outperforms the state-of-the-art slate evaluation methods, and
the PPO-Exploration algorithm outperforms the existing reinforcement learning
methods substantially. The Generator and Critic approach improves both the
slate efficiency (4% GMV and 5% number of orders) and diversity in live
experiments on one of the largest e-commerce websites in the world.
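The abstract's two-part design (a Critic that scores a whole slate, a Generator trained by RL against that score) can be illustrated with a toy sketch. This is not the paper's FSC model or PPO-Exploration algorithm; it uses a hand-crafted quality-plus-diversity reward and plain REINFORCE with a moving-average baseline as hypothetical stand-ins, purely to show the generator/critic interaction.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ITEMS, SLATE_K, DIM = 20, 4, 8
item_feats = rng.normal(size=(N_ITEMS, DIM))

def critic_score(slate):
    """Toy slate critic: per-item quality plus pairwise diversity.
    (Hypothetical stand-in for the learned Full Slate Critic.)"""
    feats = item_feats[slate]
    quality = feats[:, 0].sum()
    diffs = feats[:, None, :] - feats[None, :, :]
    diversity = np.sqrt((diffs ** 2).sum(-1)).mean()
    return quality + diversity

def generate_slate(logits, temperature=1.0):
    """Generator: sample K distinct items sequentially from a softmax policy."""
    slate, mask = [], np.zeros(N_ITEMS, dtype=bool)
    for _ in range(SLATE_K):
        z = logits / temperature
        z[mask] = -np.inf                     # no repeated items in one slate
        p = np.exp(z - z[~mask].max())
        p /= p.sum()
        i = int(rng.choice(N_ITEMS, p=p))
        slate.append(i)
        mask[i] = True
    return slate

# REINFORCE-style loop: the generator is trained on the critic's reward
logits = np.zeros(N_ITEMS)
baseline = 0.0
for step in range(300):
    slate = generate_slate(logits)
    r = critic_score(slate)
    baseline = 0.9 * baseline + 0.1 * r       # moving-average baseline
    adv = r - baseline
    grad = np.zeros(N_ITEMS)
    grad[slate] += 1.0                        # crude score-function gradient
    logits += 0.05 * adv * (grad - SLATE_K / N_ITEMS)

best = generate_slate(logits, temperature=0.1)
```

The sequential sampling with a mask is one simple way to cope with the exponentially large slate action space the abstract mentions: the policy picks one item at a time instead of scoring every permutation.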
Related papers
- F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [111.46455901113976]
We propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
arXiv Detail & Related papers (2024-01-26T13:55:32Z) - Constructive Large Language Models Alignment with Diverse Feedback [76.9578950893839]
We introduce Constructive and Diverse Feedback (CDF) as a novel method to enhance large language models alignment.
We exploit critique feedback for easy problems, refinement feedback for medium problems, and preference feedback for hard problems.
By training our model with this diversified feedback, we achieve enhanced alignment performance while using less training data.
arXiv Detail & Related papers (2023-10-10T09:20:14Z) - Investigating the Robustness of Sequential Recommender Systems Against
Training Data Perturbations [9.463133630647569]
We introduce Finite Rank-Biased Overlap (FRBO), an enhanced similarity measure tailored explicitly for finite rankings.
We empirically investigate the impact of removing items at different positions within a temporally ordered sequence.
Our results demonstrate that removing items at the end of the sequence has a statistically significant impact on performance.
arXiv Detail & Related papers (2023-07-24T23:26:46Z) - SimOAP: Improve Coherence and Consistency in Persona-based Dialogue
Generation via Over-sampling and Post-evaluation [54.66399120084227]
Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue.
For the persona-based dialogue generation task, consistency and coherence are great challenges for language models.
A two-stage SimOAP strategy is proposed, i.e., over-sampling and post-evaluation.
arXiv Detail & Related papers (2023-05-18T17:23:00Z) - Pre-training Language Model as a Multi-perspective Course Learner [103.17674402415582]
This study proposes a multi-perspective course learning (MCL) method for sample-efficient pre-training.
In this study, three self-supervision courses are designed to alleviate inherent flaws of "tug-of-war" dynamics.
Our method significantly improves ELECTRA's average performance by 2.8% and 3.2% absolute points respectively on GLUE and SQuAD 2.0 benchmarks.
arXiv Detail & Related papers (2023-05-06T09:02:10Z) - PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework
in E-commerce [13.885695433738437]
Existing re-ranking methods directly take the initial ranking list as input, and generate the optimal permutation through a well-designed context-wise model.
However, evaluating all candidate permutations brings unacceptable computational costs in practice.
This paper presents a novel end-to-end re-ranking framework named PIER to tackle the above challenges.
arXiv Detail & Related papers (2023-02-06T09:17:52Z) - Multi-Objective Personalized Product Retrieval in Taobao Search [27.994166796745496]
We propose a novel Multi-Objective Personalized Product Retrieval (MOPPR) model with four hierarchical optimization objectives: relevance, exposure, click and purchase.
MOPPR achieves 0.96% transaction and 1.29% GMV improvements in a 28-day online A/B test.
Since the Double-11 shopping festival of 2021, MOPPR has been fully deployed in mobile Taobao search, replacing the previous MGDSPR.
arXiv Detail & Related papers (2022-10-09T05:18:42Z) - WSLRec: Weakly Supervised Learning for Neural Sequential Recommendation
Models [24.455665093145818]
We propose a novel model-agnostic training approach called WSLRec, which adopts a three-stage framework: pre-training, top-$k$ mining, intrinsic and fine-tuning.
WSLRec resolves the incompleteness problem by pre-training models on extra weak supervisions from model-free methods like BR and ItemCF, while resolving the inaccuracy problem by leveraging the top-$k$ mining to screen out reliable user-item relevance from weak supervisions for fine-tuning.
arXiv Detail & Related papers (2022-02-28T08:55:12Z) - Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
arXiv Detail & Related papers (2021-01-20T11:48:12Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z) - Learning Robust Models for e-Commerce Product Search [23.537201383165755]
Showing items that do not match search query intent degrades customer experience in e-commerce.
Mitigating the problem requires a large labeled dataset.
We develop a deep, end-to-end model that learns to effectively classify mismatches.
arXiv Detail & Related papers (2020-05-07T17:22:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.