Related papers: Free Lunch! Retrospective Uplift Modeling for Dynamic Promotions Recommendation within ROI Constraints

URL: http://arxiv.org/abs/2008.06293v2
Date: Mon, 17 Aug 2020 06:31:44 GMT
Title: Free Lunch! Retrospective Uplift Modeling for Dynamic Promotions Recommendation within ROI Constraints
Authors: Dmitri Goldenberg, Javier Albert, Lucas Bernardi and Pablo Estevez
Abstract summary: For online travel platforms (OTPs), popular promotions include room upgrades, free meals and transportation services. Promotions usually incur a cost that, if uncontrolled, can become unsustainable. For a promotion to be viable, its associated costs must be balanced by incremental revenue within set financial constraints. This paper introduces a novel uplift modeling technique, relying on the Knapsack Problem formulation.
Score: 9.733174472837275
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Promotions and discounts have become key components of modern e-commerce platforms. For online travel platforms (OTPs), popular promotions include room upgrades, free meals and transportation services. By offering these promotions, customers can get more value for their money, while both the OTP and its travel partners may grow their loyal customer base. However, the promotions usually incur a cost that, if uncontrolled, can become unsustainable. Consequently, for a promotion to be viable, its associated costs must be balanced by incremental revenue within set financial constraints. Personalized treatment assignment can be used to satisfy such constraints. This paper introduces a novel uplift modeling technique, relying on the Knapsack Problem formulation, that dynamically optimizes the incremental treatment outcome subject to the required Return on Investment (ROI) constraints. The technique leverages Retrospective Estimation, a modeling approach that relies solely on data from positive outcome examples. The method also addresses training data bias, long-term effects, and seasonality challenges via online-dynamic calibration. This approach was tested via offline experiments and online randomized controlled trials at Booking.com, a leading OTP with millions of customers worldwide, resulting in a significant increase in the target outcome while staying within the required financial constraints and outperforming other approaches.
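The knapsack view of ROI-constrained treatment assignment described in the abstract can be illustrated with a small sketch. This is not the authors' method: it does not implement Retrospective Estimation or online calibration, and it assumes per-customer uplift and cost estimates are already available. The function name `greedy_roi_allocation`, its parameters, and the definition of ROI as total uplift divided by total cost are all assumptions made for illustration.

```python
from typing import List, Sequence


def greedy_roi_allocation(
    uplift: Sequence[float],
    cost: Sequence[float],
    min_roi: float,
) -> List[int]:
    """Pick customers to treat with a greedy knapsack-style heuristic.

    Assumption: ROI is defined here as (total uplift) / (total cost).
    Customers are ranked by estimated uplift per unit of cost and added
    one by one as long as the cumulative ROI stays at or above min_roi.
    """
    # Only customers with positive estimated uplift and positive cost are candidates.
    candidates = [i for i in range(len(uplift)) if uplift[i] > 0 and cost[i] > 0]
    # Highest uplift-per-cost first, as in the classic greedy knapsack heuristic.
    candidates.sort(key=lambda i: uplift[i] / cost[i], reverse=True)

    chosen: List[int] = []
    total_uplift = 0.0
    total_cost = 0.0
    for i in candidates:
        new_uplift = total_uplift + uplift[i]
        new_cost = total_cost + cost[i]
        if new_uplift / new_cost < min_roi:
            # Treating this customer would violate the ROI constraint;
            # skip it, since a cheaper customer later may still fit.
            continue
        chosen.append(i)
        total_uplift, total_cost = new_uplift, new_cost
    return chosen


# Hypothetical estimates for four customers.
selected = greedy_roi_allocation(
    uplift=[10.0, 6.0, 3.0, 1.0],
    cost=[2.0, 3.0, 3.0, 4.0],
    min_roi=2.0,
)
```

The greedy ranking is a heuristic and is not guaranteed to find the optimal assignment; it is shown only because it makes the trade-off between incremental outcome and cost explicit.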

Related papers

Generative Large-Scale Pre-trained Models for Automated Ad Bidding Optimization [5.460538555236247]
We propose GRAD (Generative Reward-driven Ad-bidding with Mixture-of-Experts), a scalable foundation model for auto-bidding. We show that GRAD significantly enhances platform revenue, highlighting its effectiveness in addressing the evolving and diverse requirements of modern advertisers.
arXiv Detail & Related papers (2025-08-04T02:46:18Z)
Your Offline Policy is Not Trustworthy: Bilevel Reinforcement Learning for Sequential Portfolio Optimization [82.03139922490796]
Reinforcement learning (RL) has shown significant promise for sequential portfolio optimization tasks, such as stock trading, where the objective is to maximize cumulative returns while minimizing risks using historical data. Traditional RL approaches often produce policies that merely memorize the optimal yet impractical buying and selling behaviors within the fixed dataset. Our approach frames portfolio optimization as a new type of partial-offline RL problem and makes two technical contributions.
arXiv Detail & Related papers (2025-05-19T06:37:25Z)
Self-Regulation and Requesting Interventions [63.5863047447313]
We propose an offline framework that trains a "helper" policy to request interventions. We score optimal intervention timing with PRMs and train the helper model on these labeled trajectories. This offline approach significantly reduces costly intervention calls during training.
arXiv Detail & Related papers (2025-02-07T00:06:17Z)
Large Language Model driven Policy Exploration for Recommender Systems [50.70228564385797]
Offline RL policies trained on static user data are vulnerable to distribution shift when deployed in dynamic online environments. Online RL-based RS also face challenges in production deployment due to the risks of exposing users to untrained or unstable policies. Large Language Models (LLMs) offer a promising solution to mimic user objectives and preferences for pre-training policies offline. We propose an Interaction-Augmented Learned Policy (iALP) that utilizes user preferences distilled from an LLM.
arXiv Detail & Related papers (2025-01-23T16:37:44Z)
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models [54.381650481255235]
We introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO). Our approach leverages a search-based optimization framework that allows LLMs to iteratively self-improve and craft the optimal alignment instructions. Empirical evaluations on eight recent LLMs, both open and closed-sourced, demonstrate that DRPO significantly enhances alignment performance.
arXiv Detail & Related papers (2024-11-13T16:15:38Z)
MetaTrading: An Immersion-Aware Model Trading Framework for Vehicular Metaverse Services [94.61039892220037]
We propose an immersion-aware model trading framework that facilitates data provision for services while ensuring privacy through federated learning (FL). We design an incentive mechanism to incentivize metaverse users (MUs) to contribute high-value models under resource constraints. We develop a fully distributed dynamic reward algorithm based on deep reinforcement learning, without accessing any private information about MUs and other MSPs.
arXiv Detail & Related papers (2024-10-25T16:20:46Z)
End-to-End Cost-Effective Incentive Recommendation under Budget Constraint with Uplift Modeling [12.160403526724476]
We propose a novel End-to-End Cost-Effective Incentive Recommendation (E3IR) model under budget constraints. Specifically, our methods consist of two modules, i.e., the uplift prediction module and the differentiable allocation module. Our E3IR improves allocation performance compared to existing two-stage approaches.
arXiv Detail & Related papers (2024-08-21T13:48:00Z)
A Primal-Dual Online Learning Approach for Dynamic Pricing of Sequentially Displayed Complementary Items under Sale Constraints [54.46126953873298]
We address the problem of dynamically pricing complementary items that are sequentially displayed to customers. Coherent pricing policies for complementary items are essential because optimizing the pricing of each item individually is ineffective. We empirically evaluate our approach using synthetic settings randomly generated from real-world data, and compare its performance in terms of constraints violation and regret.
arXiv Detail & Related papers (2024-07-08T09:55:31Z)
A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions. Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in Self-Refined Open-Source Models [53.859446823312126]
SoTA open source models of varying sizes from 7B - 65B, on average, improve 8.2% from their baseline performance. Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
A Meta-learning based Stacked Regression Approach for Customer Lifetime Value Prediction [3.6002910014361857]
Customer Lifetime Value (CLV) is the total monetary value of transactions/purchases made by a customer with the business over an intended period of time. CLV finds application in a number of distinct business domains such as Banking, Insurance, Online-entertainment, Gaming, and E-Commerce. We propose a system that is effective and comprehensive, yet simple and interpretable.
arXiv Detail & Related papers (2023-08-07T14:22:02Z)
Incremental Profit per Conversion: a Response Transformation for Uplift Modeling in E-Commerce Promotions [1.7640556247739623]
This paper focuses on promotions with response-dependent costs, where expenses are incurred only when a purchase is made. Existing uplift model approaches often necessitate training multiple models, like meta-learners, or encounter complications when estimating profit. We introduce Incremental Profit per Conversion (IPC), a novel uplift measure of promotional campaigns' efficiency in unit economics.
arXiv Detail & Related papers (2023-06-23T19:46:02Z)
Structured Dynamic Pricing: Optimal Regret in a Global Shrinkage Model [50.06663781566795]
We consider a dynamic model with the consumers' preferences as well as price sensitivity varying over time. We measure the performance of a dynamic pricing policy via regret, which is the expected revenue loss compared to a clairvoyant that knows the sequence of model parameters in advance. Our regret analysis results not only demonstrate optimality of the proposed policy but also show that for policy planning it is essential to incorporate available structural information.
arXiv Detail & Related papers (2023-03-28T00:23:23Z)
SaFormer: A Conditional Sequence Modeling Approach to Offline Safe Reinforcement Learning [64.33956692265419]
Offline safe RL is of great practical relevance for deploying agents in real-world applications. We present a novel offline safe RL approach referred to as SaFormer.
arXiv Detail & Related papers (2023-01-28T13:57:01Z)
BCRLSP: An Offline Reinforcement Learning Framework for Sequential Targeted Promotion [8.499811428928071]
We propose the Budget Constrained Reinforcement Learning for Sequential Promotion framework to determine the value of cash bonuses to be sent to users. We show that BCRLSP achieves a higher long-term customer retention rate and a lower cost than various baselines.
arXiv Detail & Related papers (2022-07-16T00:10:12Z)
E-Commerce Promotions Personalization via Online Multiple-Choice Knapsack with Uplift Modeling [1.027974860479791]
We study the Online Constrained Multiple-Choice Promotions Personalization Problem. Our work formalizes the problem as an Online Multiple Choice Knapsack Problem. We provide a real-time adaptive method that guarantees budget constraints compliance.
arXiv Detail & Related papers (2021-08-11T15:09:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.