Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks
- URL: http://arxiv.org/abs/2204.00888v1
- Date: Sat, 2 Apr 2022 15:53:37 GMT
- Title: Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks
- Authors: Guogang Liao, Ze Wang, Xiaowen Shi, Xiaoxu Wu, Chuheng Zhang, Yongkang Wang, Xingxing Wang, Dong Wang
- Abstract summary: We propose a novel algorithm to learn a better representation by leveraging task-specific signals on the Meituan food delivery platform.
Specifically, we propose three different types of auxiliary tasks that are based on reconstruction, prediction, and contrastive learning respectively.
- Score: 14.9065245548275
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the recent prevalence of reinforcement learning (RL), there has been
tremendous interest in utilizing RL for ads allocation on recommendation
platforms (e.g., e-commerce and news feed sites). For better performance,
recent RL-based ads allocation agents make decisions based on representations
of list-wise item arrangements. This results in a high-dimensional state-action
space, which makes it difficult to learn an efficient and generalizable
list-wise representation. To address this problem, we propose a novel algorithm
that learns a better representation by leveraging task-specific signals on the
Meituan food delivery platform. Specifically, we propose three types of
auxiliary tasks, based on reconstruction, prediction, and contrastive learning
respectively. We conduct extensive offline experiments on the effectiveness of
these auxiliary tasks and test our method on a real-world food delivery
platform. The experimental results show that our method learns better list-wise
representations and achieves higher revenue for the platform.
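The paper itself publishes no code; as a minimal sketch of the contrastive-learning auxiliary task described in the abstract, the snippet below computes an InfoNCE-style loss between two augmented views of a batch of list-wise representations (for instance, views produced by dropping or reordering items within each list). The function name, the augmentation scheme, and the plain-NumPy implementation are our assumptions for illustration, not the authors' actual method.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE contrastive loss: each anchor's positive is the matching
    row in `positives`; all other rows in the batch act as negatives.
    (Illustrative sketch only -- not the paper's implementation.)"""
    # L2-normalize so dot products become cosine similarities
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature               # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # the correct (anchor, positive) pairings lie on the diagonal
    return -np.mean(np.diag(log_probs))

# Toy example: two lightly perturbed "views" of the same batch of
# list representations (hypothetical stand-ins for list augmentations).
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 16))
view_a = base + 0.01 * rng.normal(size=base.shape)
view_b = base + 0.01 * rng.normal(size=base.shape)
loss = info_nce_loss(view_a, view_b)
```

In such a setup the contrastive loss is typically added, with a weighting coefficient, to the RL objective, so that the encoder is pushed to keep representations of the same list consistent across augmentations while separating different lists.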
Related papers
- Offline Multitask Representation Learning for Reinforcement Learning [86.26066704016056]
We study offline multitask representation learning in reinforcement learning (RL).
We propose a new algorithm called MORL for offline multitask representation learning.
Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model.
arXiv Detail & Related papers (2024-03-18T08:50:30Z)
- Accelerating exploration and representation learning with offline pre-training [52.6912479800592]
We show that exploration and representation learning can be improved by separately learning two different models from a single offline dataset.
We show that learning a state representation using noise-contrastive estimation and a model of auxiliary reward can significantly improve the sample efficiency on the challenging NetHack benchmark.
arXiv Detail & Related papers (2023-03-31T18:03:30Z)
- e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce [9.46186546774799]
We propose a contrastive learning framework that aligns language and visual models using unlabeled raw product text and images.
We present techniques we used to train large-scale representation learning models and share solutions that address domain-specific challenges.
arXiv Detail & Related papers (2022-07-01T05:16:47Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Towards Universal Sequence Representation Learning for Recommender Systems [98.02154164251846]
We present a novel universal sequence representation learning approach, named UniSRec.
The proposed approach utilizes the associated description text of items to learn transferable representations across different recommendation scenarios.
Our approach can be effectively transferred to new recommendation domains or platforms in a parameter-efficient way.
arXiv Detail & Related papers (2022-06-13T07:21:56Z)
- Contrastive Learning from Demonstrations [0.0]
We show that these representations are applicable for imitating several robotic tasks, including pick and place.
We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information.
arXiv Detail & Related papers (2022-01-30T13:36:07Z)
- Learning Temporally-Consistent Representations for Data-Efficient Reinforcement Learning [3.308743964406687]
$k$-Step Latent (KSL) is a representation learning method that enforces temporal consistency of representations.
KSL produces encoders that generalize better to new tasks unseen during training.
arXiv Detail & Related papers (2021-10-11T00:16:43Z)
- Techniques Toward Optimizing Viewability in RTB Ad Campaigns Using Reinforcement Learning [0.0]
Reinforcement learning (RL) is an effective technique for training decision-making agents through interactions with their environment.
In digital advertising, real-time bidding (RTB) is a common method of allocating advertising inventory through real-time auctions.
arXiv Detail & Related papers (2021-05-21T21:56:12Z)
- Reinforcement Learning with Prototypical Representations [114.35801511501639]
Proto-RL is a self-supervised framework that ties representation learning with exploration through prototypical representations.
These prototypes simultaneously serve as a summarization of the exploratory experience of an agent as well as a basis for representing observations.
This enables state-of-the-art downstream policy learning on a set of difficult continuous control tasks.
arXiv Detail & Related papers (2021-02-22T18:56:34Z)
- Self-supervised Learning for Large-scale Item Recommendations [18.19202958502061]
Large scale recommender models find most relevant items from huge catalogs.
With millions to billions of items in the corpus, users tend to provide feedback for a very small set of them.
We propose a multi-task self-supervised learning framework for large-scale item recommendations.
arXiv Detail & Related papers (2020-07-25T06:21:43Z)
- Privileged Information Dropout in Reinforcement Learning [56.82218103971113]
Using privileged information during training can improve the sample efficiency and performance of machine learning systems.
In this work, we investigate Privileged Information Dropout (PID) for achieving the latter, which can be applied equally to value-based and policy-based reinforcement learning algorithms.
arXiv Detail & Related papers (2020-05-19T05:32:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.