Generator and Critic: A Deep Reinforcement Learning Approach for Slate
Re-ranking in E-commerce
- URL: http://arxiv.org/abs/2005.12206v1
- Date: Mon, 25 May 2020 16:24:01 GMT
- Title: Generator and Critic: A Deep Reinforcement Learning Approach for Slate
Re-ranking in E-commerce
- Authors: Jianxiong Wei, Anxiang Zeng, Yueqiu Wu, Peng Guo, Qingsong Hua,
Qingpeng Cai
- Abstract summary: We present a novel Generator and Critic slate re-ranking approach, where the Critic evaluates the slate and the Generator ranks the items by the reinforcement learning approach.
For the Generator, to tackle the problem of large action space, we propose a new exploration reinforcement learning algorithm, called PPO-Exploration.
- Score: 17.712394984304336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The slate re-ranking problem considers the mutual influences between items to
improve user satisfaction in e-commerce, compared with the point-wise ranking.
Previous works either directly rank items by an end-to-end model, or rank items
by a score function that trades off the point-wise score and the diversity
between items. However, there are two main existing challenges that are not
well studied: (1) the evaluation of the slate is hard due to the complex mutual
influences between items of one slate; (2) even given the optimal evaluation,
searching the optimal slate is challenging as the action space is exponentially
large. In this paper, we present a novel Generator and Critic slate re-ranking
approach, where the Critic evaluates the slate and the Generator ranks the
items by the reinforcement learning approach. We propose a Full Slate Critic
(FSC) model that considers the real impressed items and avoids the impressed
bias of existing models. For the Generator, to tackle the problem of large
action space, we propose a new exploration reinforcement learning algorithm,
called PPO-Exploration. Experimental results show that the FSC model
significantly outperforms the state-of-the-art slate evaluation methods, and
the PPO-Exploration algorithm outperforms the existing reinforcement learning
methods substantially. The Generator and Critic approach improves both the
slate efficiency (4% GMV and 5% number of orders) and diversity in live
experiments on one of the largest e-commerce websites in the world.
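The two challenges named in the abstract (scoring a whole slate, and searching an exponentially large space of slates) can be illustrated with a toy sketch. This is not the paper's FSC model or its PPO-Exploration algorithm: `toy_critic` is a hypothetical hand-written scorer, the candidate items are made up, and the generator is a plain greedy builder that is compared against exhaustive search.

```python
from itertools import permutations
from math import perm

# Hypothetical candidates: (item_id, point-wise score, category).
CANDIDATES = [
    ("a", 0.9, "shoes"),
    ("b", 0.8, "shoes"),
    ("c", 0.7, "bags"),
    ("d", 0.6, "hats"),
    ("e", 0.5, "bags"),
]


def toy_critic(slate, penalty=0.3):
    """Score a whole slate (assumed scorer, not the paper's FSC).

    Position-discounted point-wise scores, minus a penalty for each pair
    of adjacent items that share a category (a crude mutual influence).
    """
    relevance = sum(item[1] / (pos + 1) for pos, item in enumerate(slate))
    repeats = sum(1 for x, y in zip(slate, slate[1:]) if x[2] == y[2])
    return relevance - penalty * repeats


def num_slates(n, k):
    """Number of ordered slates of length k drawn from n candidates."""
    return perm(n, k)


def exhaustive_search(candidates, k):
    """Best slate under the critic by enumerating every ordered slate."""
    return max(permutations(candidates, k), key=toy_critic)


def greedy_generator(candidates, k):
    """Build a slate one position at a time, always adding the item
    that maximizes the critic's score of the partial slate."""
    slate, remaining = [], list(candidates)
    for _ in range(k):
        best = max(remaining, key=lambda item: toy_critic(slate + [item]))
        slate.append(best)
        remaining.remove(best)
    return tuple(slate)


if __name__ == "__main__":
    print("ordered slates, 5 choose 3:", num_slates(5, 3))
    print("ordered slates, 50 choose 10:", num_slates(50, 10))
    print("greedy:", [item[0] for item in greedy_generator(CANDIDATES, 3)])
    print("exhaustive:", [item[0] for item in exhaustive_search(CANDIDATES, 3)])
```

Exhaustive search is only feasible here because the candidate set is tiny. With realistic candidate counts the number of ordered slates grows far too quickly to enumerate, which is the motivation the abstract gives for a learned Generator guided by a Critic.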
Related papers
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution [74.41064280094064]
CompassJudger-1 is the first open-source all-in-one judge LLM.
CompassJudger-1 is a general-purpose LLM that demonstrates remarkable versatility.
JudgerBench is a new benchmark that encompasses various subjective evaluation tasks.
(2024-10-21T17:56:51Z)
- Diverging Preferences: When do Annotators Disagree and do Models Know? [92.24651142187989]
We develop a taxonomy of disagreement sources spanning 10 categories across four high-level classes.
We find that the majority of disagreements are in opposition with standard reward modeling approaches.
We develop methods for identifying diverging preferences to mitigate their influence on evaluation and training.
(2024-10-18T17:32:22Z)
- Language Model Preference Evaluation with Multiple Weak Evaluators [78.53743237977677]
GED (Preference Graph Ensemble and Denoise) is a novel approach that leverages multiple model-based evaluators to construct preference graphs.
We show that GED outperforms baseline methods in model ranking, response selection, and model alignment tasks.
(2024-10-14T01:57:25Z)
- Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling [51.38330727868982]
Bidirectional Decoding (BID) is a test-time inference algorithm that bridges action chunking with closed-loop operations.
We show that BID boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks.
(2024-08-30T15:39:34Z)
- F-Eval: Assessing Fundamental Abilities with Refined Evaluation Methods [102.98899881389211]
We propose F-Eval, a bilingual evaluation benchmark to evaluate the fundamental abilities, including expression, commonsense and logic.
For reference-free subjective tasks, we devise new evaluation methods, serving as alternatives to scoring by API models.
(2024-01-26T13:55:32Z)
- Investigating the Robustness of Sequential Recommender Systems Against Training Data Perturbations [9.463133630647569]
We introduce Finite Rank-Biased Overlap (FRBO), an enhanced similarity tailored explicitly for finite rankings.
We empirically investigate the impact of removing items at different positions within a temporally ordered sequence.
Our results demonstrate that removing items at the end of the sequence has a statistically significant impact on performance.
(2023-07-24T23:26:46Z)
- PIER: Permutation-Level Interest-Based End-to-End Re-ranking Framework in E-commerce [13.885695433738437]
Existing re-ranking methods directly take the initial ranking list as input, and generate the optimal permutation through a well-designed context-wise model.
Evaluating all candidate permutations brings unacceptable computational costs in practice.
This paper presents a novel end-to-end re-ranking framework named PIER to tackle the above challenges.
(2023-02-06T09:17:52Z)
- Multi-Objective Personalized Product Retrieval in Taobao Search [27.994166796745496]
We propose a novel Multi-Objective Personalized Product Retrieval (MOPPR) model with four hierarchical optimization objectives: relevance, exposure, click and purchase.
MOPPR achieves 0.96% transaction and 1.29% GMV improvements in a 28-day online A/B test.
Since the Double-11 shopping festival of 2021, MOPPR has been fully deployed in mobile Taobao search, replacing the previous MGDSPR.
(2022-10-09T05:18:42Z)
- Few-shot Action Recognition with Prototype-centered Attentive Learning [88.10852114988829]
We propose a Prototype-centered Attentive Learning (PAL) model composed of two novel components.
First, a prototype-centered contrastive learning loss is introduced to complement the conventional query-centered learning objective.
Second, PAL integrates an attentive hybrid learning mechanism that can minimize the negative impacts of outliers.
(2021-01-20T11:48:12Z)
- Learning Robust Models for e-Commerce Product Search [23.537201383165755]
Showing items that do not match search query intent degrades customer experience in e-commerce.
Mitigating the problem requires a large labeled dataset.
We develop a deep, end-to-end model that learns to effectively classify mismatches.
(2020-05-07T17:22:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.