Related papers: Optimizing AD Pruning of Sponsored Search with Reinforcement Learning

Optimizing AD Pruning of Sponsored Search with Reinforcement Learning

URL: http://arxiv.org/abs/2008.02014v1
Date: Wed, 5 Aug 2020 09:19:10 GMT
Title: Optimizing AD Pruning of Sponsored Search with Reinforcement Learning
Authors: Yijiang Lian, Zhijie Chen, Xin Pei, Shuang Li, Yifei Wang, Yuefeng Qiu, Zhiheng Zhang, Zhipeng Tao, Liang Yuan, Hanju Guan, Kefeng Zhang, Zhigang Li, Xiaochun Liu
Abstract summary: Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. The problem we are going to address is: how to pick out the best $K$ items from $N$ candidates to maximize the system's revenue. We propose a novel model-free reinforcement learning approach to fixing this problem.
Score: 14.583308909225552
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Industrial sponsored search system (SSS) can be logically divided into three modules: keywords matching, ad retrieving, and ranking. During ad retrieving, the ad candidates grow exponentially. A query with high commercial value might retrieve a great deal of ad candidates such that the ranking module could not afford. Due to limited latency and computing resources, the candidates have to be pruned earlier. Suppose we set a pruning line to cut SSS into two parts: upstream and downstream. The problem we are going to address is: how to pick out the best $K$ items from $N$ candidates provided by the upstream to maximize the total system's revenue. Since the industrial downstream is very complicated and updated quickly, a crucial restriction in this problem is that the selection scheme should get adapted to the downstream. In this paper, we propose a novel model-free reinforcement learning approach to fixing this problem. Our approach considers downstream as a black-box environment, and the agent sequentially selects items and finally feeds into the downstream, where revenue would be estimated and used as a reward to improve the selection policy. To the best of our knowledge, this is first time to consider the system optimization from a downstream adaption view. It is also the first time to use reinforcement learning techniques to tackle this problem. The idea has been successfully realized in Baidu's sponsored search system, and online long time A/B test shows remarkable improvements on revenue.

Related papers

Procurement Auctions via Approximately Optimal Submodular Optimization [53.93943270902349]
We study procurement auctions, where an auctioneer seeks to acquire services from strategic sellers with private costs. Our goal is to design computationally efficient auctions that maximize the difference between the quality of the acquired services and the total cost of the sellers.
arXiv Detail & Related papers (2024-11-20T18:06:55Z)
Selling Joint Ads: A Regret Minimization Perspective [7.288063443108292]
Motivated by online retail, we consider the problem of selling one item (e.g., an ad slot) to two non-excludable buyers (say, a merchant and a brand) This problem captures, for example, situations where a merchant and a brand bid cooperatively in an auction to advertise a product, and both benefit from the ad being shown. A mechanism collects bids from the two and decides whether to allocate and which payments the two parties should make.
arXiv Detail & Related papers (2024-09-12T07:59:10Z)
MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search [27.752810150893552]
Mobius project aims to train the matching layer to consider CPM as an additional optimization objective besides the query-ad relevance. This paper will elaborate on how we adopt active learning to overcome the insufficiency of click history at the matching layer. We contribute the solutions to Mobius-V1 as the first version of our next generation query-ad matching system.
arXiv Detail & Related papers (2024-09-05T11:56:40Z)
Misalignment, Learning, and Ranking: Harnessing Users Limited Attention [16.74322664734553]
We study the design of online algorithms that obtain vanishing regret against optimal benchmarks. We show how standard algorithms for adversarial online linear optimization can be used to obtain a payoff-time algorithm with $O(sqrtT)$ regret.
arXiv Detail & Related papers (2024-02-21T18:52:20Z)
Incentive-Aware Recommender Systems in Two-Sided Markets [49.692453629365204]
We propose a novel recommender system that aligns with agents' incentives while achieving myopically optimal performance. Our framework models this incentive-aware system as a multi-agent bandit problem in two-sided markets. Both algorithms satisfy an ex-post fairness criterion, which protects agents from over-exploitation.
arXiv Detail & Related papers (2022-11-23T22:20:12Z)
Search and Score-Based Waterfall Auction Optimization [0.7734726150561088]
We learn a waterfall strategy from historical data by wisely searching in the space of possible waterfalls and selecting the one leading to the highest revenue. Our framework guarantees that the waterfall revenue improves between iterations until converging to a local optimum.
arXiv Detail & Related papers (2022-01-17T13:59:12Z)
Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with querying budget. We consider multi-armed bandits, linear bandits, and reinforcement learning problems. We show that CBM based algorithms perform well in the presence of adversity.
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
A novel auction system for selecting advertisements in Real-Time bidding [68.8204255655161]
Real-Time Bidding is a new Internet advertising system that has become very popular in recent years. We propose an alternative betting system with a new approach that not only considers the economic aspect but also other relevant factors for the functioning of the advertising system.
arXiv Detail & Related papers (2020-10-22T18:36:41Z)
Exploration in two-stage recommender systems [79.50534282841618]
Two-stage recommender systems are widely adopted in industry due to their scalability and maintainability. A key challenge of this setup is that optimal performance of each stage in isolation does not imply optimal global performance. We propose a method of synchronising the exploration strategies between the ranker and the nominators.
arXiv Detail & Related papers (2020-09-01T16:52:51Z)
Dynamic Knapsack Optimization Towards Efficient Multi-Channel Sequential Advertising [52.3825928886714]
We formulate the sequential advertising strategy optimization as a dynamic knapsack problem. We propose a theoretically guaranteed bilevel optimization framework, which significantly reduces the solution space of the original optimization space. To improve the exploration efficiency of reinforcement learning, we also devise an effective action space reduction approach.
arXiv Detail & Related papers (2020-06-29T18:50:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.