Scaling Law for Recommendation Models: Towards General-purpose User
Representations
- URL: http://arxiv.org/abs/2111.11294v1
- Date: Mon, 15 Nov 2021 10:39:29 GMT
- Title: Scaling Law for Recommendation Models: Towards General-purpose User
Representations
- Authors: Kyuyong Shin, Hanock Kwak, Kyung-Min Kim, Su Young Kim, Max Nihlen
Ramstrom
- Abstract summary: We explore the possibility of general-purpose user representation learning by training a universal user encoder at large scales.
We show that the scaling law holds in user modeling, where the training error scales as a power law with the amount of compute.
We also investigate how the performance changes according to the scale-up factors, i.e., model capacity, sequence length and batch size.
- Score: 3.3073775218038883
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent trend shows that a general class of models, e.g., BERT, GPT-3, and CLIP, trained on broad data at scale exhibits a wide variety of capabilities within a single learning architecture. In this work, we explore the possibility of general-purpose user representation learning by training a universal user encoder at large scale. We demonstrate that the scaling law holds in user modeling, where the training error scales as a power law with the amount of compute. Our Contrastive Learning User Encoder (CLUE) optimizes task-agnostic objectives, and the resulting user embeddings stretch our expectations of what is possible in various downstream tasks. CLUE also shows great transferability to other domains and systems: in an online experiment, it yields significant improvements in online Click-Through Rate (CTR). Furthermore, we investigate how performance changes with the scale-up factors, i.e., model capacity, sequence length, and batch size.
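A note on the two technical claims above, as a hedged reading of the abstract: the power-law statement can be written as training error L(C) ≈ a · C^(−b) for compute C, with constants a, b > 0 that this summary does not report. The task-agnostic objective is likewise not spelled out here; the Python sketch below shows an InfoNCE-style contrastive loss over two views of the same user's behavior history, which is one common way such an encoder could be trained. The function name, pairing scheme, and temperature are illustrative assumptions, not details taken from the paper.

    # Hypothetical sketch, not the authors' code: an InfoNCE-style contrastive
    # objective of the kind a task-agnostic user encoder might optimize.
    import torch
    import torch.nn.functional as F

    def contrastive_user_loss(emb_a: torch.Tensor,
                              emb_b: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
        # emb_a, emb_b: [batch, dim] embeddings of two views of the same users
        # (e.g., two behavior sub-sequences); row i of each tensor is a positive
        # pair, and all other rows in the batch act as in-batch negatives.
        a = F.normalize(emb_a, dim=-1)
        b = F.normalize(emb_b, dim=-1)
        logits = a @ b.t() / temperature  # [batch, batch] cosine similarities
        targets = torch.arange(a.size(0), device=a.device)
        # Symmetric cross-entropy: match a -> b and b -> a along the diagonal.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

In a training loop one would encode two views of each user's behavior sequence with the same encoder and minimize this loss; under this in-batch-negative design, batch size directly affects the number of negatives, which may be one reason it appears among the scale-up factors studied.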
Related papers
- Scaling Sequential Recommendation Models with Transformers [0.0]
We take inspiration from the scaling laws observed in training large language models, and explore similar principles for sequential recommendation.
Compute-optimal training is possible but requires a careful analysis of the compute-performance trade-offs specific to the application.
We also show that performance scaling translates to downstream tasks by fine-tuning larger pre-trained models on smaller task-specific domains.
arXiv Detail & Related papers (2024-12-10T15:20:56Z) - DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters.
We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL.
We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models and improve model performance.
arXiv Detail & Related papers (2024-05-22T15:52:52Z) - Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z) - Mixtures of Experts Unlock Parameter Scaling for Deep RL [54.26191237981469]
In this paper, we demonstrate that incorporating Mixture-of-Experts (MoE) modules into value-based networks results in more parameter-scalable models.
This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
arXiv Detail & Related papers (2024-02-13T17:18:56Z) - ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z) - Reproducible scaling laws for contrastive language-image learning [42.354402731615444]
We investigate scaling laws for contrastive language-image pre-training (CLIP) with the public LAION dataset and the open-source OpenCLIP repository.
Our large-scale experiments involve models trained on up to two billion image-text pairs and identify power law scaling for multiple downstream tasks.
We find that the training distribution plays a key role in scaling laws as the OpenAI and OpenCLIP models exhibit different scaling behavior despite identical model architectures.
arXiv Detail & Related papers (2022-12-14T10:24:50Z) - Leveraging Angular Information Between Feature and Classifier for
Long-tailed Learning: A Prediction Reformulation Approach [90.77858044524544]
We reformulate the recognition probabilities through included angles without re-balancing the classifier weights.
Inspired by the performance improvement of the predictive form reformulation, we explore the different properties of this angular prediction.
Our method obtains the best performance among peer methods on CIFAR10/100-LT and ImageNet-LT without pretraining.
arXiv Detail & Related papers (2022-12-03T07:52:48Z) - Learning Large-scale Universal User Representation with Sparse Mixture
of Experts [1.2722697496405464]
We propose SUPERMOE, a generic framework to obtain high quality user representation from multiple tasks.
Specifically, the user behaviour sequences are encoded by an MoE transformer, which allows us to increase the model capacity to billions of parameters.
To deal with the seesaw phenomenon when learning across multiple tasks, we design a new loss function with task indicators.
arXiv Detail & Related papers (2022-07-11T06:19:03Z) - Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z) - Evaluating CLIP: Towards Characterization of Broader Capabilities and
Downstream Implications [8.15254368157658]
We analyze CLIP and highlight some of the challenges such models pose.
We find that CLIP can inherit biases found in prior computer vision systems.
These results add evidence to the growing body of work calling for a change in the notion of a 'better' model.
arXiv Detail & Related papers (2021-08-05T19:05:57Z) - Exploiting Behavioral Consistence for Universal User Representation [11.290137806288191]
We focus on developing a universal user representation model.
The obtained universal representations are expected to contain rich information.
We propose Self-supervised User Modeling Network (SUMN) to encode behavior data into the universal representation.
arXiv Detail & Related papers (2020-12-11T06:10:14Z)