Entire Space Counterfactual Learning: Tuning, Analytical Properties and
Industrial Applications
- URL: http://arxiv.org/abs/2210.11039v1
- Date: Thu, 20 Oct 2022 06:19:50 GMT
- Title: Entire Space Counterfactual Learning: Tuning, Analytical Properties and
Industrial Applications
- Authors: Hao Wang, Zhichao Chen, Jiajun Fan, Yuxin Huang, Weiming Liu, Xinggao
Liu
- Abstract summary: Post-click conversion rate (CVR) estimation has long been plagued by sample selection bias and data sparsity issues.
This paper proposes a principled method named entire space counterfactual multi-task model (ESCM$^2$), which employs a counterfactual risk minimizer to handle both IEB and PIP issues at once.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As a basic research problem for building effective recommender systems,
post-click conversion rate (CVR) estimation has long been plagued by sample
selection bias and data sparsity issues. To address the data sparsity issue,
prevalent methods based on entire space multi-task model leverage the
sequential pattern of user actions, i.e. exposure $\rightarrow$ click
$\rightarrow$ conversion to construct auxiliary learning tasks. However, they
still fall short of guaranteeing the unbiasedness of CVR estimates. This paper
theoretically demonstrates two defects of these entire space multi-task models:
(1) inherent estimation bias (IEB) for CVR estimation, where the CVR estimate
is inherently higher than the ground truth; (2) potential independence priority
(PIP) for CTCVR estimation, where the causality from click to conversion might
be overlooked. This paper further proposes a principled method named entire
space counterfactual multi-task model (ESCM$^2$), which employs a
counterfactual risk minimizer to handle both IEB and PIP issues at once. To
demonstrate the effectiveness of the proposed method, this paper explores its
parameter tuning in practice, derives its analytic properties, and showcases
its effectiveness in industrial CVR estimation, where ESCM$^2$ can effectively
alleviate the intrinsic IEB and PIP issues and outperform baseline models.
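The selection bias described in the abstract can be illustrated with a small synthetic simulation. The sketch below is not the ESCM$^2$ model itself: it assumes inverse propensity weighting as one possible counterfactual estimator (the abstract does not say which one is used), and the data-generating process and all names (`simulate`, `naive_cvr`, `ips_cvr`) are hypothetical. It compares a CVR estimate computed only on clicked exposures with a propensity-weighted estimate over the entire exposure space.

```python
import numpy as np


def simulate(n=200_000, seed=0):
    """Synthetic exposure log (assumed toy model, not from the paper)."""
    rng = np.random.default_rng(seed)
    ctr = rng.uniform(0.05, 0.5, size=n)   # click propensity of each exposure
    cvr = 0.1 + 0.6 * ctr                  # true CVR, correlated with propensity
    click = rng.random(n) < ctr            # observed clicks
    would_convert = rng.random(n) < cvr    # potential outcome if clicked
    conv = click & would_convert           # conversions are seen only after a click
    return ctr, cvr, click, conv


def naive_cvr(click, conv):
    """Average conversion rate over the click space only."""
    return conv[click].mean()


def ips_cvr(ctr, click, conv):
    """Inverse-propensity-weighted average CVR over the entire exposure space.

    Unclicked exposures contribute zero because conv is False there.
    """
    return np.mean(conv / ctr)


if __name__ == "__main__":
    ctr, cvr, click, conv = simulate()
    print("true mean CVR over all exposures:", cvr.mean())
    print("naive click-space estimate:      ", naive_cvr(click, conv))
    print("IPS entire-space estimate:       ", ips_cvr(ctr, click, conv))
```

Because exposures that are more likely to be clicked are also more likely to convert in this toy setup, the click-space average overstates the entire-space CVR, while the propensity-weighted average stays close to it. The sketch uses the true propensities; a real system would have to estimate them with a CTR model, which adds estimation error and variance.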
Related papers
- Online non-parametric likelihood-ratio estimation by Pearson-divergence
functional minimization [55.98760097296213]
We introduce a new framework for online non-parametric LRE (OLRE) for the setting where pairs of iid observations $(x_t \sim p, x'_t \sim q)$ are observed over time.
We provide theoretical guarantees for the performance of the OLRE method along with empirical validation in synthetic experiments.
arXiv Detail & Related papers (2023-11-03T13:20:11Z) - On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics [0.7655800373514546]
Risk-aware Reinforcement Learning algorithms were shown to outperform risk-neutral counterparts in a variety of continuous-action tasks.
The theoretical basis for the pessimistic objectives these algorithms employ remains unestablished.
We propose Dual Actor-Critic (DAC) as a risk-aware, model-free algorithm that features two distinct actor networks.
arXiv Detail & Related papers (2023-10-30T13:28:06Z) - Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z) - Uncertainty-Aware Instance Reweighting for Off-Policy Learning [63.31923483172859]
We propose an Uncertainty-aware Inverse Propensity Score estimator (UIPS) for improved off-policy learning.
Experiment results on synthetic and three real-world recommendation datasets demonstrate the advantageous sample efficiency of the proposed UIPS estimator.
arXiv Detail & Related papers (2023-03-11T11:42:26Z) - A Generalized Doubly Robust Learning Framework for Debiasing Post-Click
Conversion Rate Prediction [23.340584290411208]
Post-click conversion rate (CVR) prediction is an essential task for discovering user interests and increasing platform revenues.
Currently, doubly robust (DR) learning approaches achieve the state-of-the-art performance for debiasing CVR prediction.
We propose two new DR methods, namely DR-BIAS and DR-MSE, which control the bias of DR loss and balance the bias and variance flexibly.
arXiv Detail & Related papers (2022-11-12T15:09:23Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning this imagined state with a real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - ESCM$^2$: Entire Space Counterfactual Multi-Task Model for Post-Click
Conversion Rate Estimation [14.346868328637115]
Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions to address the data sparsity issue.
ESMM suffers from Inherent Estimation Bias (IEB) and Potential Independence Priority (PIP) issues.
We devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk minimizer as a regularizer.
arXiv Detail & Related papers (2022-04-03T08:12:27Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Enhanced Doubly Robust Learning for Debiasing Post-click Conversion Rate
Estimation [29.27760413892272]
Post-click conversion, as a strong signal indicating the user preference, is salutary for building recommender systems.
Currently, most existing methods utilize counterfactual learning to debias recommender systems.
We propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into the general CVR estimation.
arXiv Detail & Related papers (2021-05-28T06:59:49Z) - Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z) - LT4REC: A Lottery Ticket Hypothesis Based Multi-task Practice for Video
Recommendation System [2.7174057828883504]
Click-through rate prediction (CTR) and post-click conversion rate prediction (CVR) play key roles across all industrial ranking systems.
In this paper, we model CVR in a brand-new method by adopting the lottery-ticket-hypothesis-based sparse sharing multi-task learning.
Experiments on the dataset gathered from traffic logs of Tencent video's recommendation system demonstrate that sparse sharing in the CVR model significantly outperforms competitive methods.
arXiv Detail & Related papers (2020-08-22T16:48:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.