Ad-load Balancing via Off-policy Learning in a Content Marketplace
- URL: http://arxiv.org/abs/2309.11518v2
- Date: Tue, 19 Dec 2023 07:40:45 GMT
- Title: Ad-load Balancing via Off-policy Learning in a Content Marketplace
- Authors: Hitesh Sagtani, Madan Jhawar, Rishabh Mehrotra, Olivier Jeunen
- Abstract summary: Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms.
Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors.
We present an approach that leverages off-policy learning and evaluation from logged bandit feedback.
- Score: 9.783697404304025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Ad-load balancing is a critical challenge in online advertising systems,
particularly in the context of social media platforms, where the goal is to
maximize user engagement and revenue while maintaining a satisfactory user
experience. This requires the optimization of conflicting objectives, such as
user satisfaction and ads revenue. Traditional approaches to ad-load balancing
rely on static allocation policies, which fail to adapt to changing user
preferences and contextual factors. In this paper, we present an approach that
leverages off-policy learning and evaluation from logged bandit feedback. We
start by presenting a motivating analysis of the ad-load balancing problem,
highlighting the conflicting objectives between user satisfaction and ads
revenue. We emphasize the nuances that arise due to user heterogeneity and the
dependence on the user's position within a session. Based on this analysis, we
define the problem as determining the optimal ad-load for a particular feed
fetch. To tackle this problem, we propose an off-policy learning framework that
leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and
Doubly Robust (DR) to learn and estimate the policy values using offline
collected stochastic data. We present insights from online A/B experiments
deployed at scale across over 80 million users generating over 200 million
sessions, where we find statistically significant improvements in both user
satisfaction metrics and ads revenue for the platform.
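The estimators named in the abstract have simple closed forms. As a minimal sketch (not the paper's implementation; the function names and toy data below are illustrative assumptions), the following Python snippet shows how IPS and DR estimate a target policy's value from logged bandit feedback:
```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    # Inverse Propensity Scoring: reweight each logged reward by the
    # ratio pi(a|x) / pi_0(a|x) between the target and logging policies.
    weights = target_probs / logging_probs
    return np.mean(weights * rewards)

def dr_estimate(rewards, logging_probs, target_probs,
                q_hat_logged, q_hat_target):
    # Doubly Robust: start from a reward model's estimate under the
    # target policy, then add an IPS-style correction on the model's
    # residuals at the logged actions.
    #   q_hat_logged: model-predicted reward for the action actually taken
    #   q_hat_target: model-predicted expected reward under the target policy
    weights = target_probs / logging_probs
    return np.mean(q_hat_target + weights * (rewards - q_hat_logged))

# Toy logged data: five rounds of bandit feedback (illustrative only).
rewards       = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
logging_probs = np.array([0.5, 0.2, 0.4, 0.3, 0.5])  # pi_0(a|x), must be > 0
target_probs  = np.array([0.6, 0.1, 0.5, 0.2, 0.7])  # pi(a|x)
q_hat_logged  = np.array([0.8, 0.1, 0.7, 0.2, 0.9])
q_hat_target  = np.array([0.7, 0.3, 0.6, 0.4, 0.8])

print(ips_estimate(rewards, logging_probs, target_probs))
print(dr_estimate(rewards, logging_probs, target_probs,
                  q_hat_logged, q_hat_target))
```
DR retains IPS's unbiasedness when the logged propensities are correct, while the reward model reduces variance; this is what makes both estimators suited to learning from offline-collected stochastic data.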
Related papers
- MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z) - Collaborative-Enhanced Prediction of Spending on Newly Downloaded Mobile Games under Consumption Uncertainty [49.431361908465036]
We propose a robust model training and evaluation framework to mitigate label variance and extremes.
Within this framework, we introduce a collaborative-enhanced model designed to predict user game spending without relying on user IDs.
Our approach demonstrates notable gains over production models, achieving a 17.11% improvement on offline data.
arXiv Detail & Related papers (2024-04-12T07:47:02Z) - Maximizing the Success Probability of Policy Allocations in Online Systems [5.485872703839928]
In this paper we consider the problem at the level of user timelines instead of individual bid requests.
To allocate policies to users optimally, typical multiple-treatment allocation methods solve knapsack-like problems.
We introduce the SuccessProMax algorithm that aims at finding the policy allocation which is the most likely to outperform a fixed policy.
arXiv Detail & Related papers (2023-12-26T10:55:33Z) - Off-Policy Evaluation for Large Action Spaces via Policy Convolution [60.6953713877886]
The Policy Convolution (PC) family of estimators uses latent structure within actions to strategically convolve the logging and target policies.
Experiments on synthetic and benchmark datasets demonstrate substantial mean squared error (MSE) improvements when using PC.
arXiv Detail & Related papers (2023-10-24T01:00:01Z) - Online Ad Procurement in Non-stationary Autobidding Worlds [10.871587311621974]
We introduce a primal-dual algorithm for online decision making with multi-dimensional decision variables, bandit feedback, and long-term uncertain constraints.
We show that our algorithm achieves low regret in many worlds when procurement outcomes are generated through procedures that are adversarial, adversarially corrupted, periodic, and ergodic.
arXiv Detail & Related papers (2023-07-10T00:41:08Z) - Interpolating Item and User Fairness in Multi-Sided Recommendations [13.635310806431198]
We introduce a novel fair recommendation framework, Problem (FAIR).
We propose a low-regret algorithm FORM that concurrently performs real-time learning and fair recommendations, two tasks that are often at odds.
We demonstrate the efficacy of our framework and method in maintaining platform revenue while ensuring desired levels of fairness for both items and users.
arXiv Detail & Related papers (2023-06-12T15:00:58Z) - Targeted Advertising on Social Networks Using Online Variational Tensor Regression [19.586412285513962]
We propose what we believe is the first contextual bandit framework for online targeted advertising.
The proposed framework is designed to accommodate any number of feature vectors in the form of a multi-mode tensor.
We empirically confirm that the proposed UCB algorithm achieves a significant improvement in influence tasks over the benchmarks.
arXiv Detail & Related papers (2022-08-22T22:10:45Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach to auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Adversarial Learning for Incentive Optimization in Mobile Payment Marketing [17.645000197183045]
Payment platforms run large-scale marketing campaigns that allocate incentives to encourage users to pay through their applications.
To maximize the return on investment, incentive allocations are commonly solved in a two-stage procedure.
We propose a bias-correction adversarial network to overcome the bias this two-stage procedure introduces.
arXiv Detail & Related papers (2021-12-28T07:54:39Z) - Personalized multi-faceted trust modeling to determine trust links in social media and its potential for misinformation management [61.88858330222619]
We present an approach for predicting trust links between peers in social media.
We propose a data-driven, multi-faceted trust model that incorporates many distinct features for a comprehensive analysis.
We evaluate the proposed framework on a trust-aware item recommendation task in the context of a large Yelp dataset.
arXiv Detail & Related papers (2021-11-11T19:40:51Z) - Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem.
We propose a theoretically guaranteed bilevel optimization framework that significantly reduces the solution space of the original problem.
To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
arXiv Detail & Related papers (2020-06-29T18:50:35Z)