ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via
Parameter Constraint
- URL: http://arxiv.org/abs/2307.09193v2
- Date: Sat, 29 Jul 2023 14:01:18 GMT
- Title: ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via
Parameter Constraint
- Authors: Zhenhao Jiang, Biao Zeng, Hao Feng, Jin Liu, Jicong Fan, Jie Zhang,
Jia Jia, Ning Hu, Xingyu Chen, Xuguang Lan
- Abstract summary: We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC).
We handle "exposure_click_in-shop action" and "in-shop action_purchase" separately in light of the characteristics of in-shop actions.
- Score: 38.561040267729105
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale online recommender systems are deployed across the Internet
and handle two basic tasks: Click-Through Rate (CTR) and Post-Click Conversion
Rate (CVR) estimation. However, traditional CVR estimators suffer from
well-known Sample Selection Bias and Data Sparsity issues. Entire space models
were proposed to address the two issues via tracing the decision-making path of
"exposure_click_purchase". Further, some researchers observed that there are
purchase-related behaviors between click and purchase, which can better draw
the user's decision-making intention and improve the recommendation
performance. Thus, the decision-making path has been extended to
"exposure_click_in-shop action_purchase" and can be modeled with conditional
probability approach. Nevertheless, we observe that the chain rule of
conditional probability does not always hold. We report the Probability Space
Confusion (PSC) issue and mathematically derive the difference between the
ground truth and the estimate. We propose a novel Entire Space Multi-Task Model
for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two
alternatives: Entire Space Multi-Task Model with Siamese Network (ESMS) and
Entire Space Multi-Task Model in Global Domain (ESMG) to address the PSC issue.
Specifically, we handle "exposure_click_in-shop action" and "in-shop
action_purchase" separately in light of the characteristics of in-shop actions.
The first path is still treated with conditional probability, while the second
one is treated with a parameter constraint strategy. Experiments in both offline
and online environments in a large-scale recommendation system illustrate the
superiority of our proposed methods over state-of-the-art models. The
real-world datasets will be released.
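As a rough illustration of the conditional-probability approach the abstract refers to, the following minimal sketch multiplies stage-wise conditional probabilities along the "exposure_click_purchase" path, so that the joint click-and-purchase probability is defined over all exposures. The function name `entire_space_probs` and the example numbers are hypothetical. This shows only the generic chain-rule decomposition used by entire space models, not the ESMC parameter constraint strategy, which the abstract does not detail.

```python
from typing import Dict


def entire_space_probs(p_ctr: float, p_cvr: float) -> Dict[str, float]:
    """Combine stage-wise conditional probabilities over the exposure space.

    p_ctr: P(click | exposure)
    p_cvr: P(purchase | click, exposure)
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    for name, p in (("p_ctr", p_ctr), ("p_cvr", p_cvr)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    # Chain rule:
    # P(click, purchase | exposure)
    #   = P(click | exposure) * P(purchase | click, exposure)
    p_ctcvr = p_ctr * p_cvr
    return {"ctr": p_ctr, "cvr": p_cvr, "ctcvr": p_ctcvr}


# Made-up probabilities, for illustration only.
probs = entire_space_probs(p_ctr=0.1, p_cvr=0.05)
print(probs["ctcvr"])
```

Because the product is defined for every exposure rather than only for clicked items, a model trained this way can be supervised on the full exposure log. The abstract's point is that extending this product through an intermediate in-shop action does not always remain valid.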
Related papers
- Near-Optimal Dynamic Regret for Adversarial Linear Mixture MDPs [63.47351876442425]
We study episodic linear mixture MDPs with the unknown transition and adversarial rewards under full-information feedback.
We propose a novel algorithm that combines the benefits of two popular methods: occupancy-measure-based and policy-based.
Our algorithm enjoys an $\widetilde{\mathcal{O}}(d\sqrt{H^3 K} + \sqrt{HK(H + \bar{P}_K)})$ dynamic regret, where $d$ is the feature dimension.
arXiv Detail & Related papers (2024-11-05T13:55:52Z) - MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
arXiv Detail & Related papers (2024-06-11T17:55:25Z) - Low-Rank Online Dynamic Assortment with Dual Contextual Information [12.373566593905792]
We introduce a new low-rank dynamic assortment model to transform this problem into a manageable scale.
We then propose an efficient algorithm that estimates the intrinsic subspaces and utilizes the upper confidence bound approach to address the exploration-exploitation trade-off in online decision making.
arXiv Detail & Related papers (2024-04-19T23:10:12Z) - Scaling Pareto-Efficient Decision Making Via Offline Multi-Objective RL [22.468486569700236]
The goal of multi-objective reinforcement learning (MORL) is to learn policies that simultaneously optimize multiple competing objectives.
We propose a new data-driven setup for offline MORL, where we wish to learn a preference-agnostic policy agent.
PEDA is a family of offline MORL algorithms that builds and extends Decision Transformers via a novel preference-and-return-conditioned policy.
arXiv Detail & Related papers (2023-04-30T20:15:26Z) - Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time.
We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance.
Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z) - Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints [13.069703665055084]
We propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints.
The presence of mixed quota and complementary preferences constraints can lead to instability in the matching process.
We formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling algorithm.
arXiv Detail & Related papers (2023-01-24T18:54:29Z) - ESCM$^2$: Entire Space Counterfactual Multi-Task Model for Post-Click
Conversion Rate Estimation [14.346868328637115]
Methods in Entire Space Multi-task Model (ESMM) family leverage sequential pattern of user actions to address data sparsity issue.
ESMM suffers from Inherent Estimation Bias (IEB) and Potential Independence Priority (PIP) issues.
We devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk minimizer as a regularizer.
arXiv Detail & Related papers (2022-04-03T08:12:27Z) - Markov Decision Process modeled with Bandits for Sequential Decision
Making in Linear-flow [73.1896399783641]
In membership/subscriber acquisition and retention, we sometimes need to recommend marketing content for multiple pages in sequence.
We propose to formulate the problem as an MDP with Bandits where Bandits are employed to model the transition probability matrix.
We observe the proposed MDP with Bandits algorithm outperforms Q-learning with $\epsilon$-greedy and decreasing $\epsilon$, independent Bandits, and interaction Bandits.
arXiv Detail & Related papers (2021-07-01T03:54:36Z) - Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
arXiv Detail & Related papers (2020-06-10T11:18:57Z) - Interpretable Deep Learning Model for Online Multi-touch Attribution [14.62385029537631]
We propose a novel model called DeepMTA, which combines a deep learning model and an additive feature explanation model for interpretable online multi-touch attribution.
As the first interpretable deep learning model for MTA, DeepMTA considers three important features in the customer journey.
Evaluation on a real dataset shows the proposed conversion prediction model achieves 91% accuracy.
arXiv Detail & Related papers (2020-03-26T23:21:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.