Learning Large-scale Universal User Representation with Sparse Mixture of Experts
- URL: http://arxiv.org/abs/2207.04648v1
- Date: Mon, 11 Jul 2022 06:19:03 GMT
- Title: Learning Large-scale Universal User Representation with Sparse Mixture of Experts
- Authors: Caigao Jiang, Siqiao Xue, James Zhang, Lingyue Liu, Zhibo Zhu, Hongyan Hao
- Abstract summary: We propose SUPERMOE, a generic framework to obtain high-quality user representations from multiple tasks.
Specifically, user behaviour sequences are encoded by an MoE transformer, which allows the model capacity to scale to billions of parameters.
To deal with the seesaw phenomenon when learning across multiple tasks, we design a new loss function with task indicators.
- Score: 1.2722697496405464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning user behaviour sequence embeddings is challenging due to the
complicated feature interactions over time and the high dimensionality of user
features. Recently emerging foundation models, e.g., BERT and its variants,
have encouraged a large body of researchers to investigate this field. However,
unlike in natural language processing (NLP) tasks, the parameters of a user
behaviour model come mostly from the user embedding layer, which causes most
existing works to fail at training a large-scale universal user embedding.
Furthermore, user representations are learned from multiple downstream tasks,
and past research does not address the seesaw phenomenon, where improving
performance on one task degrades performance on another. In this paper, we
propose SUPERMOE, a generic framework to obtain high-quality user
representations from multiple tasks. Specifically, user behaviour sequences are
encoded by an MoE transformer, which allows the model capacity to scale to
billions or even trillions of parameters. To deal with the seesaw phenomenon
when learning across multiple tasks, we design a new loss function with task
indicators. We perform extensive offline experiments on public datasets and
online experiments on private real-world business scenarios. Our approach
outperforms state-of-the-art models, and the results demonstrate the
effectiveness of our framework.
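The abstract gives no implementation details, so the following is a minimal, hypothetical sketch of the two ingredients it names: a sparsely gated MoE feed-forward layer with top-1 routing for encoding behaviour sequences, and a multi-task loss gated by a per-sample task indicator to limit seesaw interference. The module and function names, the top-1 gating, and the exact loss form are all illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Sparsely gated feed-forward layer: each token is routed to its top-1 expert,
    so capacity grows with the number of experts while per-token compute stays flat."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                         # x: (batch, seq, d_model)
        logits = self.gate(x)                     # (batch, seq, n_experts)
        weight, idx = logits.softmax(-1).max(-1)  # top-1 gate weight and expert index
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e                       # tokens routed to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(x[mask])
        return out

def task_indicator_loss(logits, labels, task_ids, active_task: int):
    """Hypothetical task-indicator loss: only samples carrying the active task's
    indicator contribute a gradient, shielding the other tasks' heads from
    seesaw interference. labels must be float tensors in [0, 1]."""
    mask = task_ids == active_task
    if not mask.any():
        return logits.new_zeros(())
    return F.binary_cross_entropy_with_logits(logits[mask], labels[mask])
```

Under this reading, the total training loss would be the sum of `task_indicator_loss` over all tasks present in a batch, with each sample updating only its own task's objective.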
Related papers
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models, improving model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- Task Relation-aware Continual User Representation Learning [26.514449669395297]
Previous efforts in user modeling mainly focus on learning a task-specific user representation that is designed for a single task.
Recent studies introduce the concept of universal user representation, which is a more generalized representation of a user relevant to a variety of tasks.
Despite their effectiveness, existing approaches for learning universal user representations are impractical in real-world applications.
We propose a novel continual user representation learning method, called TERACON, whose learning capability is not limited as the number of learned tasks increases.
arXiv Detail & Related papers (2023-06-01T08:10:03Z)
- Prototype-guided Cross-task Knowledge Distillation for Large-scale Models [103.04711721343278]
Cross-task knowledge distillation helps train a small student model to achieve competitive performance.
We propose a Prototype-guided Cross-task Knowledge Distillation (ProC-KD) approach to transfer the intrinsic local-level object knowledge of a large-scale teacher network to various task scenarios.
arXiv Detail & Related papers (2022-12-26T15:00:42Z)
- Sequential Search with Off-Policy Reinforcement Learning [48.88165680363482]
We propose a highly scalable hybrid learning model that consists of an RNN learning framework and an attention model.
As a novel optimization step, we fit multiple short user sequences into a single RNN pass within a training batch by solving a greedy knapsack problem on the fly (see the sketch below).
We also explore the use of off-policy reinforcement learning in multi-session personalized search ranking.
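The summary only names the packing trick, so here is a minimal, assumption-laden sketch of how several short user sequences might be greedily packed into fixed-capacity slots so that one RNN pass processes many users. The first-fit-decreasing strategy and the `capacity` parameter are illustrative choices, not the paper's exact algorithm.

```python
from typing import List

def pack_sequences(lengths: List[int], capacity: int) -> List[List[int]]:
    """Greedy first-fit-decreasing bin packing: place each sequence
    (longest first) into the first slot with enough remaining room."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    bins, room = [], []                 # per-bin member indices and free space
    for i in order:
        if lengths[i] > capacity:
            raise ValueError(f"sequence {i} longer than capacity {capacity}")
        for b, free in enumerate(room):
            if lengths[i] <= free:      # first slot that still fits sequence i
                bins[b].append(i)
                room[b] -= lengths[i]
                break
        else:                           # no slot fits: open a new one
            bins.append([i])
            room.append(capacity - lengths[i])
    return bins

# Example: pack user sequences of these lengths into RNN windows of 16 steps.
print(pack_sequences([5, 9, 3, 7, 4, 2], capacity=16))  # [[1, 3], [0, 4, 2, 5]]
```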
arXiv Detail & Related papers (2022-02-01T06:52:40Z)
- Scaling Law for Recommendation Models: Towards General-purpose User Representations [3.3073775218038883]
We explore the possibility of general-purpose user representation learning by training a universal user encoder at large scales.
We show that the scaling law holds in the user modeling area, where the training error scales as a power law with the amount of compute (illustrated in the sketch below).
We also investigate how performance changes with the scale-up factors, i.e., model capacity, sequence length, and batch size.
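As a quick illustration of the power-law claim (with synthetic numbers, not the paper's data), a scaling law of the form L(C) = a * C^(-b) can be recovered by ordinary least squares in log-log space:

```python
import numpy as np

# Synthetic (compute, training error) pairs; purely illustrative, not paper data.
compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])
error = 2.5 * compute ** -0.07 * np.exp(np.random.default_rng(0).normal(0, 0.01, 5))

# Fit log L = log a - b * log C with a degree-1 least-squares polynomial.
slope, intercept = np.polyfit(np.log(compute), np.log(error), deg=1)
a, b = np.exp(intercept), -slope
print(f"L(C) = {a:.3f} * C^(-{b:.4f})")  # recovers b close to 0.07
```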
arXiv Detail & Related papers (2021-11-15T10:39:29Z)
- Empowering General-purpose User Representation with Full-life Cycle Behavior Modeling [11.698166058448555]
We propose a novel framework called full-Life cycle User Representation Model (LURM) to tackle this challenge.
LURM consists of two cascaded sub-models: (I) Bag-of-Interests (BoI) encodes user behaviors in any time period into a sparse vector with super-high dimension (e.g., 10^5); a toy version is sketched below.
(II) Self-supervised Multi-anchor Encoder Network (SMEN) achieves almost lossless dimensionality reduction, benefiting from a novel multi-anchor module which can learn different aspects of user interests.
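A minimal sketch of what a Bag-of-Interests style encoding could look like: hash each (behaviour, item) event into one of ~10^5 buckets and count occurrences in a sparse vector. The hashing scheme, dimension, and function name are assumptions for illustration, not LURM's actual encoder.

```python
import hashlib
from scipy.sparse import dok_matrix

DIM = 100_000  # super-high-dimensional BoI space, e.g. 10^5 buckets

def bag_of_interests(events):
    """Count hashed (behaviour_type, item_id) events into a 1 x DIM sparse vector."""
    vec = dok_matrix((1, DIM), dtype=float)
    for behaviour, item in events:
        key = f"{behaviour}:{item}".encode()
        bucket = int(hashlib.md5(key).hexdigest(), 16) % DIM  # stable hash bucket
        vec[0, bucket] += 1.0
    return vec.tocsr()

v = bag_of_interests([("click", "sku_42"), ("buy", "sku_42"), ("click", "sku_7")])
print(v.nnz, "non-zero entries out of", DIM)
```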
arXiv Detail & Related papers (2021-10-20T08:24:44Z)
- Exploiting Behavioral Consistence for Universal User Representation [11.290137806288191]
We focus on developing a universal user representation model.
The obtained universal representations are expected to contain rich information.
We propose Self-supervised User Modeling Network (SUMN) to encode behavior data into the universal representation.
arXiv Detail & Related papers (2020-12-11T06:10:14Z)
- Learning Transferrable Parameters for Long-tailed Sequential User Behavior Modeling [70.64257515361972]
We argue that focusing on tail users could bring more benefits and address the long-tail issue.
Specifically, we propose a gradient alignment approach and adopt an adversarial training scheme to facilitate knowledge transfer from the head to the tail (see the sketch below).
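The blurb names gradient alignment without detail; one common formulation is to penalize disagreement between head-user and tail-user gradients via their cosine similarity. This is a plausible reading for illustration only, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def gradient_alignment_penalty(loss_head, loss_tail, params):
    """Penalize disagreement between head- and tail-user gradients:
    1 - cos(g_head, g_tail) over the flattened parameter gradients.
    create_graph=True keeps the penalty itself differentiable."""
    g_head = torch.autograd.grad(loss_head, params, retain_graph=True, create_graph=True)
    g_tail = torch.autograd.grad(loss_tail, params, retain_graph=True, create_graph=True)
    h = torch.cat([g.flatten() for g in g_head])
    t = torch.cat([g.flatten() for g in g_tail])
    return 1.0 - F.cosine_similarity(h, t, dim=0)

# Usage sketch: total = loss_tail + lam * gradient_alignment_penalty(
#     loss_head, loss_tail, list(model.parameters()))
```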
arXiv Detail & Related papers (2020-10-22T03:12:02Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks (one plausible form is sketched below).
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
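Without the paper's formulation at hand, here is one plausible reading of a block-diagonal structure regularizer: penalize entries of a task-by-feature weight matrix that fall outside matching (task-group, feature-group) blocks, encouraging tasks and features to form collaborative groups. The fixed group assignments and L1 form are illustrative assumptions.

```python
import torch

def block_diagonal_penalty(W, task_groups, feature_groups):
    """L1 penalty on weights outside matching (task group, feature group) blocks.

    W:              (n_tasks, n_features) weight matrix
    task_groups:    (n_tasks,) integer group id per task
    feature_groups: (n_features,) integer group id per feature
    """
    # off_block[i, j] is True when task i and feature j belong to different groups
    off_block = task_groups.unsqueeze(1) != feature_groups.unsqueeze(0)
    return W[off_block].abs().sum()

W = torch.randn(4, 6, requires_grad=True)
penalty = block_diagonal_penalty(
    W, torch.tensor([0, 0, 1, 1]), torch.tensor([0, 0, 0, 1, 1, 1]))
```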
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
- Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks (an adapter-style sketch follows below).
We perform extensive ablation experiments to show the effectiveness of the learned user representation in five downstream tasks.
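PeterRec's idea, as summarized, is to freeze the pre-trained weights and learn small injected networks; the sketch below shows a generic adapter-style "model patch" in that spirit. The bottleneck size, residual placement, and class name are assumptions, not PeterRec's exact architecture.

```python
import torch
import torch.nn as nn

class ModelPatch(nn.Module):
    """Small bottleneck network injected beside a frozen pre-trained layer;
    only the patch is updated during fine-tuning (adapter-style)."""
    def __init__(self, d_model: int, d_bottleneck: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, d_bottleneck)
        self.up = nn.Linear(d_bottleneck, d_model)

    def forward(self, h):
        # Residual connection keeps the pre-trained path intact.
        return h + self.up(torch.relu(self.down(h)))

pretrained = nn.Linear(64, 64)
for p in pretrained.parameters():
    p.requires_grad = False      # pre-trained parameters stay unaltered
patch = ModelPatch(64)
x = torch.randn(8, 64)
y = patch(pretrained(x))         # only patch parameters receive gradients
```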
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
- Meta Adaptation using Importance Weighted Demonstrations [19.37671674146514]
In some cases, the distribution shifts so much that it is difficult for an agent to infer the new task.
We propose a novel algorithm to generalize on any related task by leveraging prior knowledge on a set of specific tasks.
We show experiments where the robot is trained from a diversity of environmental tasks and is also able to adapt to an unseen environment.
arXiv Detail & Related papers (2019-11-23T07:22:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.