Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning
- URL: http://arxiv.org/abs/2501.05591v1
- Date: Thu, 09 Jan 2025 21:53:03 GMT
- Title: Session-Level Dynamic Ad Load Optimization using Offline Robust Reinforcement Learning
- Authors: Tao Liu, Qi Xu, Wei Shi, Zhigang Hua, Shuang Yang,
- Abstract summary: Session-level dynamic ad load optimization aims to personalize the density and types of delivered advertisements in real time during a user's online session.
Traditional causal learning-based approaches struggle with key technical challenges.
We develop an offline deep Q-network (DQN)-based framework that effectively mitigates confounding bias in dynamic systems.
- Score: 14.410333601657172
- License:
- Abstract: Session-level dynamic ad load optimization aims to personalize the density and types of delivered advertisements in real time during a user's online session by dynamically balancing user experience quality and ad monetization. Traditional causal learning-based approaches struggle with key technical challenges, especially in handling confounding bias and distribution shifts. In this paper, we develop an offline deep Q-network (DQN)-based framework that effectively mitigates confounding bias in dynamic systems and demonstrates more than 80% offline gains compared to the best causal learning-based production baseline. Moreover, to improve the framework's robustness against unanticipated distribution shifts, we further enhance our framework with a novel offline robust dueling DQN approach. This approach achieves more stable rewards on multiple OpenAI-Gym datasets as perturbations increase, and provides an additional 5% offline gains on real-world ad delivery data. Deployed across multiple production systems, our approach has achieved outsized topline gains. Post-launch online A/B tests have shown double-digit improvements in the engagement-ad score trade-off efficiency, significantly enhancing our platform's capability to serve both consumers and advertisers.
Related papers
- Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z) - Incentive-Compatible Federated Learning with Stackelberg Game Modeling [11.863770989724959]
We introduce FLamma, a novel Federated Learning framework based on adaptive gamma-based Stackelberg game.
Our approach allows the server to act as the leader, dynamically adjusting a decay factor while clients, acting as followers, optimally select their number of local epochs to maximize their utility.
Over time, the server incrementally balances client influence, initially rewarding higher-contributing clients and gradually leveling their impact, driving the system toward a Stackelberg Equilibrium.
arXiv Detail & Related papers (2025-01-05T21:04:41Z) - Ads Supply Personalization via Doubly Robust Learning [13.392289135329833]
Ads supply personalization aims to balance the revenue and user engagement, two long-term objectives in social media ads, by tailoring the ad quantity and density.
In this paper, we present a streamlined framework for personalized ad supply.
It optimally utilizes information from data collection policies through the doubly robust learning. Consequently, it significantly improves the accuracy of long-term treatment effect estimates.
arXiv Detail & Related papers (2024-09-29T06:09:52Z) - Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks [60.54852710216738]
We introduce a novel digital twin-assisted optimization framework, called D-REC, to ensure reliable caching in nextG wireless networks.
By incorporating reliability modules into a constrained decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints.
arXiv Detail & Related papers (2024-06-29T02:40:28Z) - Neural Optimization with Adaptive Heuristics for Intelligent Marketing System [1.3079444139643954]
We propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (Noah) framework.
Noah is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels.
We describe key modules of the Noah framework, including prediction, optimization, and adaptive audiences, providing examples for bidding and content optimization.
arXiv Detail & Related papers (2024-05-17T01:44:30Z) - Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline
Pre-Training with Model Based Augmentation [59.899714450049494]
offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z) - ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z) - Efficient and Effective Augmentation Strategy for Adversarial Training [48.735220353660324]
Adversarial training of Deep Neural Networks is known to be significantly more data-hungry than standard training.
We propose Diverse Augmentation-based Joint Adversarial Training (DAJAT) to use data augmentations effectively in adversarial training.
arXiv Detail & Related papers (2022-10-27T10:59:55Z) - MOORe: Model-based Offline-to-Online Reinforcement Learning [26.10368749930102]
We propose a model-based Offline-to-Online Reinforcement learning (MOORe) algorithm.
Experiment results show that our algorithm smoothly transfers from offline to online stages while enabling sample-efficient online adaption.
arXiv Detail & Related papers (2022-01-25T03:14:57Z) - Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential
Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem.
We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space.
To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
arXiv Detail & Related papers (2020-06-29T18:50:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.