MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed
- URL: http://arxiv.org/abs/2304.09087v1
- Date: Mon, 17 Apr 2023 07:25:58 GMT
- Title: MDDL: A Framework for Reinforcement Learning-based Position Allocation in Multi-Channel Feed
- Authors: Xiaowen Shi, Ze Wang, Yuanying Cai, Xiaoxu Wu, Fan Yang, Guogang Liao, Yongkang Wang, Xingxing Wang, Dong Wang
- Abstract summary: We propose a framework named Multi-Distribution Data Learning (MDDL) to address the challenge of effectively utilizing both strategy and random data for training RL models.
MDDL incorporates a novel imitation learning signal to mitigate overestimation problems in strategy data and maximizes the RL signal for random data to facilitate effective learning.
MDDL has been fully deployed on the Meituan food delivery platform and currently serves over 300 million users.
- Score: 14.8342816935259
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Nowadays, the mainstream approach in position allocation systems
is to use a reinforcement learning (RL) model to assign appropriate locations
to items from various channels and then mix them into the feed. Two types of
data are employed to train the RL model for position allocation: strategy
data and random data. Strategy data is collected from the current online
model; it suffers from an imbalanced distribution of state-action pairs,
resulting in severe overestimation problems during training. Random data, on
the other hand, offers a more uniform distribution of state-action pairs, but
is difficult to obtain in industrial settings because random exploration can
hurt platform revenue and user experience. Since the two types of data follow
different distributions, designing an effective strategy that leverages both
to improve RL model training is highly challenging. In this study, we propose
a framework named Multi-Distribution Data Learning (MDDL) to address the
challenge of effectively utilizing both strategy and random data for training
RL models on mixed multi-distribution data. Specifically, MDDL incorporates a
novel imitation learning signal to mitigate overestimation problems on
strategy data and maximizes the RL signal on random data to facilitate
effective learning. In our experiments, we evaluated the proposed MDDL
framework in a real-world position allocation system and demonstrated its
superior performance over the previous baseline. MDDL has been fully deployed
on the Meituan food delivery platform and currently serves over 300 million
users.
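The abstract names the two training signals but not their exact form. Below
is a minimal PyTorch sketch of the two-signal idea: a TD (RL) loss routed to
random data and an imitation loss routed to strategy data. The function and
batch-field names, the discrete-action Q-network setup, and the use of
cross-entropy toward the logged action as the imitation term are illustrative
assumptions, not the authors' implementation.

```python
# Sketch of a per-sample two-signal loss; assumptions noted above.
import torch
import torch.nn.functional as F

def mddl_style_loss(q_net, target_q_net, batch, gamma=0.99):
    q_all = q_net(batch["state"])                          # (B, num_actions)
    q_taken = q_all.gather(1, batch["action"].unsqueeze(1)).squeeze(1)

    with torch.no_grad():                                  # one-step TD target
        next_q = target_q_net(batch["next_state"]).max(dim=1).values
        td_target = batch["reward"] + gamma * (1.0 - batch["done"]) * next_q

    # RL signal: temporal-difference error, reliable on random data whose
    # state-action coverage is close to uniform.
    rl_loss = F.smooth_l1_loss(q_taken, td_target, reduction="none")

    # Imitation signal: push the network toward the logged online action,
    # which damps overestimation on strategy data with imbalanced coverage.
    imitation_loss = F.cross_entropy(q_all, batch["action"], reduction="none")

    # Route each sample to the signal suited to its data source; the actual
    # MDDL weighting scheme may differ.
    is_random = batch["is_random"].float()                 # 1.0 = random data
    loss = is_random * rl_loss + (1.0 - is_random) * imitation_loss
    return loss.mean()
```

In practice the two signals could also be mixed per sample with tunable
weights rather than routed exclusively by data source; the hard routing here
simply mirrors the abstract's description.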
Related papers
- Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality [41.79433449873368]
We propose a novel multi-modal federated learning method, Federated Multi-modal contrastiVe training with Pre-trained completion (FedMVP).
FedMVP integrates large-scale pre-trained models to enhance federated training.
We demonstrate that the model achieves superior performance on two real-world image-text classification datasets.
arXiv Detail & Related papers (2024-06-16T19:18:06Z)
- Secrets of RLHF in Large Language Models Part II: Reward Modeling [134.97964938009588]
We introduce a series of novel methods to mitigate the influence of incorrect and ambiguous preferences in the dataset.
We also introduce contrastive learning to enhance the ability of reward models to distinguish between chosen and rejected responses.
arXiv Detail & Related papers (2024-01-11T17:56:59Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- Universal Metric Learning with Parameter-Efficient Transfer Learning [40.85295050164728]
A common practice in metric learning is to train and test an embedding model for each dataset.
This dataset-specific approach fails to simulate real-world scenarios that involve multiple heterogeneous distributions of data.
We introduce a novel metric learning paradigm, called Universal Metric Learning (UML), which learns a unified metric capable of capturing relations across multiple data distributions.
arXiv Detail & Related papers (2023-09-16T10:34:01Z)
- Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
Data samples usually follow a long-tailed distribution in the real world, and FL on decentralized, long-tailed data yields a poorly behaved global model.
In this work, we integrate local real data with global gradient prototypes to form locally balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z)
- FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning [4.02923738318937]
Uneven distribution of local data across edge devices (clients) slows model training and reduces accuracy in federated learning.
This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew.
We propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor.
arXiv Detail & Related papers (2022-08-04T04:24:16Z)
- FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning [87.08902493524556]
Federated learning (FL) has recently attracted increasing attention from academia and industry.
We propose FedDM to build the global training objective from multiple local surrogate functions.
In detail, we construct synthetic sets of data on each client to locally match the loss landscape of the original data.
arXiv Detail & Related papers (2022-07-20T04:55:18Z)
- An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation [91.62129090006745]
This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
arXiv Detail & Related papers (2022-05-25T13:04:53Z)
- Latent-Variable Advantage-Weighted Policy Optimization for Offline RL [70.01851346635637]
Offline reinforcement learning methods hold the promise of learning policies from pre-collected datasets without the need to query the environment for new transitions.
In practice, offline datasets are often heterogeneous, i.e., collected in a variety of scenarios.
We propose to leverage latent-variable policies that can represent a broader class of policy distributions.
Our method improves on the average performance of the next best-performing offline reinforcement learning methods by 49% on heterogeneous datasets.
arXiv Detail & Related papers (2022-03-16T21:17:03Z)
- Federated Visual Classification with Real-World Data Distribution [9.564468846277366]
We characterize the effect real-world data distributions have on distributed learning, using the standard Federated Averaging (FedAvg) algorithm as a benchmark.
We introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits.
We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and training stability.
arXiv Detail & Related papers (2020-03-18T07:55:49Z)
- Diversity-inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in their predictions.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain the diversity in output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)