Training Microsoft News Recommenders with Pretrained Language Models in
the Loop
- URL: http://arxiv.org/abs/2102.09268v1
- Date: Thu, 18 Feb 2021 11:08:38 GMT
- Title: Training Microsoft News Recommenders with Pretrained Language Models in
the Loop
- Authors: Shitao Xiao, Zheng Liu, Yingxia Shao, Tao Di and Xing Xie
- Abstract summary: We propose a novel framework, SpeedyFeed, which efficiently trains PLMs-based news recommenders of superior quality.
SpeedyFeed is highlighted for its lightweight encoding pipeline, which removes most of the repetitive but redundant encoding operations.
The PLMs-based model significantly outperforms state-of-the-art news recommenders in comprehensive offline experiments.
- Score: 22.96193782709208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: News recommendation calls for deep insight into news articles' underlying
semantics. Therefore, pretrained language models (PLMs), like BERT and RoBERTa,
may substantially contribute to the recommendation quality. However, it is
extremely challenging to train news recommenders together with such big models:
the learning of news recommenders requires intensive news encoding operations,
whose cost is prohibitive if PLMs are used as the news encoder. In this paper,
we propose a novel framework, SpeedyFeed, which efficiently trains PLMs-based
news recommenders of superior quality. SpeedyFeed is highlighted for its
lightweight encoding pipeline, which gives rise to three major advantages.
First, it makes the intermediate results fully reusable across the training
workflow, which removes most of the repetitive but redundant encoding
operations. Second, it improves the data efficiency of the training workflow,
as non-informative data can be excluded from encoding. Third, it further
reduces cost by leveraging simplified news encoding and compact news
representation.
SpeedyFeed leads to more than 100$\times$ acceleration of the training
process, which enables big models to be trained efficiently and effectively
over massive user data. The well-trained PLMs-based model significantly
outperforms state-of-the-art news recommenders in comprehensive offline
experiments. It is applied to Microsoft News to empower the training of
large-scale production models, which demonstrate highly competitive online
performance. SpeedyFeed is also a model-agnostic framework and is thus
potentially applicable to a wide spectrum of content-based recommender systems.
We have made the source code publicly available to facilitate research and
applications in related areas.
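As a concrete illustration of the lightweight encoding pipeline described in the abstract, the sketch below caches news embeddings so that repeated occurrences of the same article within recent training steps reuse the stored vector instead of re-running the PLM encoder. This is a minimal, hypothetical PyTorch sketch under our own assumptions, not the actual SpeedyFeed implementation; the names (CachedNewsEncoder, max_age) are illustrative and do not come from the released code.

```python
# Minimal sketch (not the official SpeedyFeed code): cache news embeddings so
# that repeated occurrences of the same article within recent training steps
# reuse the stored vector instead of re-running the expensive PLM encoder.
# All names below (CachedNewsEncoder, max_age) are illustrative assumptions.
import torch
import torch.nn as nn


class CachedNewsEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, max_age: int = 3):
        super().__init__()
        self.encoder = encoder      # stand-in for a PLM-based news encoder
        self.max_age = max_age      # steps for which a cached vector stays valid
        self.cache = {}             # news_id -> (step_written, embedding)
        self.step = 0

    def forward(self, news_ids, news_tokens):
        """news_ids: list[int]; news_tokens: LongTensor of shape [batch, seq_len]."""
        out = [None] * len(news_ids)
        misses = []
        for i, nid in enumerate(news_ids):
            hit = self.cache.get(nid)
            if hit is not None and self.step - hit[0] <= self.max_age:
                out[i] = hit[1]     # cache hit: skip the redundant encoding
            else:
                misses.append(i)
        if misses:                  # encode only the cache misses
            fresh = self.encoder(news_tokens[misses])
            for j, i in enumerate(misses):
                out[i] = fresh[j]
                # cached copies are detached: they carry no gradient
                self.cache[news_ids[i]] = (self.step, fresh[j].detach())
        self.step += 1
        return torch.stack(out)


# Toy usage with a trivial encoder standing in for the PLM.
if __name__ == "__main__":
    toy_encoder = nn.Sequential(
        nn.Embedding(1000, 64), nn.Flatten(1), nn.Linear(64 * 8, 128)
    )
    enc = CachedNewsEncoder(toy_encoder)
    tokens = torch.randint(0, 1000, (3, 8))
    vecs = enc([11, 42, 11], tokens)   # first step: all three are cache misses
    vecs2 = enc([11, 42, 99], tokens)  # next step: 11 and 42 hit the cache
    print(vecs.shape, vecs2.shape)
```

Cached vectors are detached, so gradients only flow through freshly encoded items; the max_age window is one simple way to bound the staleness that such reuse introduces.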
Related papers
- Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs [56.040198387038025]
We present a novel prompt-guided visual perception framework (abbreviated as Free Video-LLM) for efficient inference of training-free video LLMs.
Our method effectively reduces the number of visual tokens while maintaining high performance across multiple video question-answering benchmarks.
arXiv Detail & Related papers (2024-10-14T12:35:12Z)
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Efficient Multimodal Learning from Data-centric Perspective [21.35857180519653]
We introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning.
Experiments show that our Bunny-4B/8B outperforms the state-of-the-art large MLLMs on multiple benchmarks.
arXiv Detail & Related papers (2024-02-18T10:09:10Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models [40.54353850357839]
We show how we can employ submodular optimization to select highly representative subsets of the training corpora.
We show that the resulting models achieve up to $\sim99\%$ of the performance of the fully-trained models.
arXiv Detail & Related papers (2023-05-11T09:24:41Z)
- Clinical Prompt Learning with Frozen Language Models [4.077071350659386]
Large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models.
We investigated the viability of prompt learning on clinically meaningful decision tasks.
Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning.
arXiv Detail & Related papers (2022-05-11T14:25:13Z)
- NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application [56.1830016521422]
We propose NewsBERT, which can distill pre-trained language models for efficient and effective news intelligence.
In our approach, we design a teacher-student joint learning and distillation framework to collaboratively learn both teacher and student models.
In our experiments, NewsBERT can effectively improve the model performance in various intelligent news applications with much smaller models.
arXiv Detail & Related papers (2021-02-09T15:41:12Z)
- Teaching with Commentaries [108.62722733649542]
We propose a flexible teaching framework using commentaries and learned meta-information.
We find that commentaries can improve training speed and/or performance.
Commentaries can be reused when training new models to obtain performance benefits.
arXiv Detail & Related papers (2020-11-05T18:52:46Z)
- COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization [60.07531696857743]
We propose the COLAM framework, which co-learns DNNs and soft labels through alternating minimization of two objectives.
arXiv Detail & Related papers (2020-04-26T17:50:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.