Adversarial Learning for Incentive Optimization in Mobile Payment
Marketing
- URL: http://arxiv.org/abs/2112.15434v1
- Date: Tue, 28 Dec 2021 07:54:39 GMT
- Title: Adversarial Learning for Incentive Optimization in Mobile Payment
Marketing
- Authors: Xuanying Chen, Zhining Liu, Li Yu, Sen Li, Lihong Gu, Xiaodong Zeng,
Yize Tan and Jinjie Gu
- Abstract summary: Payment platforms hold large-scale marketing campaigns, which allocate incentives to encourage users to pay through their applications.
To maximize the return on investment, incentive allocations are commonly solved in a two-stage procedure.
We propose a bias correction adversarial network to overcome this obstacle.
- Score: 17.645000197183045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many payment platforms hold large-scale marketing campaigns, which allocate
incentives to encourage users to pay through their applications. To maximize
the return on investment, incentive allocations are commonly solved in a
two-stage procedure. After training a response estimation model to estimate the
users' mobile payment probabilities (MPP), a linear programming process is
applied to obtain the optimal incentive allocation. However, the large amount
of biased data in the training set, generated by the previous biased allocation
policy, causes a biased estimation. This bias deteriorates the performance of
the response model and misleads the linear programming process, dramatically
degrading the performance of the resulting allocation policy. To overcome this
obstacle, we propose a bias correction adversarial network. Our method
leverages the small set of unbiased data obtained under a full-randomized
allocation policy to train an unbiased model and then uses it to reduce the
bias with adversarial learning. Offline and online experimental results
demonstrate that our method outperforms state-of-the-art approaches and
significantly improves the performance of the resulting allocation policy in a
real-world marketing campaign.
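A minimal sketch of the two-stage pipeline the abstract describes, assuming a toy setting: a response model supplies mobile payment probabilities (MPP) for each user under each candidate incentive, and a linear program then allocates incentives under a budget. The incentive levels, costs, budget, and the random stand-in for the response model below are hypothetical placeholders, not values or details from the paper.

```python
"""Sketch of the two-stage incentive-allocation pipeline (toy, hypothetical setup)."""
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

n_users, n_levels = 200, 4                  # hypothetical campaign size
costs = np.array([0.0, 0.5, 1.0, 2.0])      # cost per incentive level (placeholder)
budget = 120.0                              # total incentive budget (placeholder)

# Stage 1 (stand-in): predicted MPP for every (user, incentive level) pair.
# In the paper this comes from a trained response model; here it is random
# but monotone in the incentive, so the LP has a real trade-off to make.
base = rng.uniform(0.05, 0.4, size=(n_users, 1))
lift = rng.uniform(0.0, 0.1, size=(n_users, 1))
mpp = np.clip(base + lift * np.arange(n_levels), 0.0, 1.0)   # shape (n_users, n_levels)

# Stage 2: LP relaxation. x[i, j] in [0, 1] is the (fractional) assignment of
# incentive level j to user i; maximize total expected payments subject to
# one incentive per user and the overall budget.
c = -mpp.ravel()                                  # linprog minimizes, so negate

# "Each user gets exactly one level" constraints.
A_eq = np.zeros((n_users, n_users * n_levels))
for i in range(n_users):
    A_eq[i, i * n_levels:(i + 1) * n_levels] = 1.0
b_eq = np.ones(n_users)

# Budget constraint: sum_ij cost_j * x_ij <= budget.
A_ub = np.tile(costs, n_users)[None, :]
b_ub = np.array([budget])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=(0.0, 1.0), method="highs")

# Naive rounding of the fractional solution; may slightly violate the budget.
assignment = res.x.reshape(n_users, n_levels).argmax(axis=1)
picked_value = mpp[np.arange(n_users), assignment].sum()
print(f"LP optimum: {-res.fun:.2f}, rounded value: {picked_value:.2f}, "
      f"rounded spend: {costs[assignment].sum():.2f} (budget {budget})")
```

The paper's contribution concerns the first stage: if the MPP estimates feeding this program are biased by the historical allocation policy, the optimizer is misled regardless of how the allocation step itself is solved.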
Related papers
- Reward-Augmented Data Enhances Direct Preference Alignment of LLMs [63.32585910975191]
We introduce reward-conditioned Large Language Models (LLMs) that learn from the entire spectrum of response quality within the dataset.
We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset.
arXiv Detail & Related papers (2024-10-10T16:01:51Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
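The entry above ("Provably Mitigating Overoptimization in RLHF") describes an objective that simply adds a supervised (SFT) loss to a preference-optimization loss. The sketch below illustrates one DPO-style instantiation of that combination; the log-probability tensors and the weight sft_coef are hypothetical placeholders rather than the paper's exact formulation.

```python
"""Sketch: preference-optimization loss plus a supervised (SFT) loss (hypothetical)."""
import torch
import torch.nn.functional as F

def preference_plus_sft_loss(logp_chosen, logp_rejected,
                             ref_logp_chosen, ref_logp_rejected,
                             beta=0.1, sft_coef=1.0):
    # DPO-style preference term: reward margin is the change in log-prob
    # relative to a frozen reference policy.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    # Supervised term: keep the likelihood of the preferred responses high.
    sft_loss = -logp_chosen.mean()
    return pref_loss + sft_coef * sft_loss

# Toy usage with per-example sequence log-probabilities under the policy
# and under the reference model (placeholder numbers).
logp_c = torch.tensor([-12.3, -8.1])
logp_r = torch.tensor([-11.0, -9.4])
ref_c = torch.tensor([-12.0, -8.5])
ref_r = torch.tensor([-10.5, -9.0])
print(preference_plus_sft_loss(logp_c, logp_r, ref_c, ref_r))
```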
- $i$REPO: $i$mplicit Reward Pairwise Difference based Empirical Preference Optimization [12.266207199002604]
Large Language Models (LLMs) can sometimes produce outputs that deviate from human expectations.
We propose a novel framework named $i$REPO, which utilizes implicit Reward pairwise difference regression for Empirical Preference Optimization.
We show that $i$REPO effectively achieves self-alignment using soft-label, self-generated responses and the logit of empirical AI annotators.
arXiv Detail & Related papers (2024-05-24T05:42:11Z)
- $\Delta\text{-}{\rm OPE}$: Off-Policy Estimation with Pairs of Policies [13.528097424046823]
We introduce $\Delta\text{-}{\rm OPE}$ methods based on the widely used Inverse Propensity Scoring estimator.
Simulated, offline, and online experiments show that our methods significantly improve performance for both evaluation and learning tasks.
arXiv Detail & Related papers (2024-05-16T12:04:55Z)
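The $\Delta\text{-}{\rm OPE}$ entry above builds on the standard Inverse Propensity Scoring estimator, applied to pairs of policies. Below is a minimal sketch of an IPS-style estimate of the value difference between two target policies from logs collected under a known logging policy; the synthetic bandit data and the two context-free policies are illustrative assumptions, not the paper's estimator.

```python
"""Sketch: IPS estimate of the value difference between two policies (synthetic data)."""
import numpy as np

rng = np.random.default_rng(1)
n, n_actions = 10_000, 2

# Logged data: context-free for simplicity; actions drawn from a known
# logging policy pi0, rewards observed only for the logged action.
pi0 = np.array([0.7, 0.3])
actions = rng.choice(n_actions, size=n, p=pi0)
true_mean_reward = np.array([0.10, 0.25])
rewards = rng.binomial(1, true_mean_reward[actions])

# Two target policies to compare.
pi_a = np.array([0.2, 0.8])
pi_b = np.array([0.6, 0.4])

# Per-sample IPS terms for both policies share the same logged rewards and
# propensities, so the value difference can be estimated directly.
w_a = pi_a[actions] / pi0[actions]
w_b = pi_b[actions] / pi0[actions]
delta_hat = np.mean((w_a - w_b) * rewards)

true_delta = (pi_a - pi_b) @ true_mean_reward
print(f"IPS estimate of V(pi_a) - V(pi_b): {delta_hat:.4f} (true {true_delta:.4f})")
```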
- Metalearners for Ranking Treatment Effects [1.469168639465869]
We show how learning to rank can maximize the area under a policy's incremental profit curve.
arXiv Detail & Related papers (2024-05-03T15:31:18Z)
- OptiGrad: A Fair and more Efficient Price Elasticity Optimization via a Gradient Based Learning [7.145413681946911]
This paper presents a novel approach to optimizing profit margins in non-life insurance markets through a gradient descent-based method.
It targets three key objectives: 1) maximizing profit margins, 2) ensuring conversion rates, and 3) enforcing fairness criteria such as demographic parity (DP).
arXiv Detail & Related papers (2024-04-16T04:21:59Z)
- Learning Fair Ranking Policies via Differentiable Optimization of Ordered Weighted Averages [55.04219793298687]
This paper shows how efficiently-solvable fair ranking models can be integrated into the training loop of Learning to Rank.
In particular, this paper is the first to show how to backpropagate through constrained optimizations of OWA objectives, enabling their use in integrated prediction and decision models.
arXiv Detail & Related papers (2024-02-07T20:53:53Z)
- Boosting Offline Reinforcement Learning with Action Preference Query [32.94932149345299]
Training practical agents usually involves offline and online reinforcement learning (RL) to balance the policy's performance and interaction costs.
Online fine-tuning has become a commonly used method to correct the erroneous estimates of out-of-distribution data learned in the offline training phase.
In this work, we introduce an interaction-free training scheme dubbed Offline-with-Action-Preferences (OAP).
arXiv Detail & Related papers (2023-06-06T02:29:40Z)
- Towards Equal Opportunity Fairness through Adversarial Learning [64.45845091719002]
Adversarial training is a common approach for bias mitigation in natural language processing.
We propose an augmented discriminator for adversarial training, which takes the target class as input to create richer features.
arXiv Detail & Related papers (2022-03-12T02:22:58Z)
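The entry above ("Towards Equal Opportunity Fairness through Adversarial Learning") augments the adversarial discriminator with the target class. The sketch below shows one way such a class-augmented discriminator can be wired up with a gradient-reversal layer; the architecture sizes, data, and training step are hypothetical placeholders, not the paper's implementation.

```python
"""Sketch: adversarial debiasing with a class-augmented discriminator (hypothetical)."""
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips (and scales) the gradient in backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class DebiasedClassifier(nn.Module):
    def __init__(self, d_in=32, d_hid=64, n_classes=2, n_protected=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_hid), nn.ReLU())
        self.classifier = nn.Linear(d_hid, n_classes)
        # Augmented discriminator: sees the representation concatenated with
        # the one-hot target class.
        self.discriminator = nn.Sequential(
            nn.Linear(d_hid + n_classes, d_hid), nn.ReLU(),
            nn.Linear(d_hid, n_protected))
        self.n_classes = n_classes

    def forward(self, x, y, lam=1.0):
        h = self.encoder(x)
        task_logits = self.classifier(h)
        y_onehot = F.one_hot(y, self.n_classes).float()
        disc_in = torch.cat([grad_reverse(h, lam), y_onehot], dim=-1)
        protected_logits = self.discriminator(disc_in)
        return task_logits, protected_logits

# Toy training step on random placeholder data.
model = DebiasedClassifier()
x = torch.randn(8, 32)
y = torch.randint(0, 2, (8,))
z = torch.randint(0, 2, (8,))                      # protected attribute
task_logits, prot_logits = model(x, y)
loss = F.cross_entropy(task_logits, y) + F.cross_entropy(prot_logits, z)
loss.backward()   # gradient reversal makes the encoder work against the discriminator
```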
- Off-policy Reinforcement Learning with Optimistic Exploration and Distribution Correction [73.77593805292194]
We train a separate exploration policy to maximize an approximate upper confidence bound of the critics in an off-policy actor-critic framework.
To mitigate the off-policy-ness, we adapt the recently introduced DICE framework to learn a distribution correction ratio for off-policy actor-critic training.
arXiv Detail & Related papers (2021-10-22T22:07:51Z)
- Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences arising from its use.