Training Microsoft News Recommenders with Pretrained Language Models in
the Loop
- URL: http://arxiv.org/abs/2102.09268v1
- Date: Thu, 18 Feb 2021 11:08:38 GMT
- Title: Training Microsoft News Recommenders with Pretrained Language Models in
the Loop
- Authors: Shitao Xiao, Zheng Liu, Yingxia Shao, Tao Di and Xing Xie
- Abstract summary: We propose a novel framework, SpeedyFeed, which efficiently trains PLMs-based news recommenders of superior quality.
SpeedyFeed is highlighted for its lightweight encoding pipeline, which removes most of the repetitive but redundant encoding operations.
The PLMs-based model significantly outperforms state-of-the-art news recommenders in comprehensive offline experiments.
- Score: 22.96193782709208
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: News recommendation calls for deep insight into news articles' underlying
semantics. Therefore, pretrained language models (PLMs), like BERT and RoBERTa,
may substantially contribute to the recommendation quality. However, it is
extremely challenging to train news recommenders together with such big models:
the learning of news recommenders requires intensive news encoding operations,
whose cost is prohibitive if PLMs are used as the news encoder. In this paper,
we propose a novel framework, SpeedyFeed, which efficiently trains PLMs-based
news recommenders of superior quality. SpeedyFeed is highlighted for its
lightweight encoding pipeline, which gives rise to three major advantages.
First, it makes the intermediate results fully reusable across the training
workflow, which removes most of the repetitive but redundant encoding
operations. Second, it improves the data efficiency of the training workflow,
as non-informative data can be excluded from encoding. Third, it further
reduces cost by leveraging simplified news encoding and compact news
representation.
SpeedyFeed leads to more than 100$\times$ acceleration of the training
process, which enables big models to be trained efficiently and effectively
over massive user data. The well-trained PLMs-based model significantly
outperforms state-of-the-art news recommenders in comprehensive offline
experiments. It is applied to Microsoft News to empower the training of
large-scale production models, which demonstrate highly competitive online
performance. SpeedyFeed is also a model-agnostic framework and is thus
potentially applicable to a wide spectrum of content-based recommender systems.
We have made the source code publicly available to facilitate research and
applications in related areas.
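As a concrete illustration of the lightweight encoding pipeline described in the abstract, the sketch below caches news embeddings so that repeated occurrences of the same article within recent training steps reuse the stored vector instead of re-running the PLM encoder. This is a minimal, hypothetical PyTorch sketch under our own assumptions, not the actual SpeedyFeed implementation; the names (CachedNewsEncoder, max_age) are illustrative and do not come from the released code.

```python
# Minimal sketch (not the official SpeedyFeed code): cache news embeddings so
# that repeated occurrences of the same article within recent training steps
# reuse the stored vector instead of re-running the expensive PLM encoder.
# All names below (CachedNewsEncoder, max_age) are illustrative assumptions.
import torch
import torch.nn as nn


class CachedNewsEncoder(nn.Module):
    def __init__(self, encoder: nn.Module, max_age: int = 3):
        super().__init__()
        self.encoder = encoder      # stand-in for a PLM-based news encoder
        self.max_age = max_age      # steps for which a cached vector stays valid
        self.cache = {}             # news_id -> (step_written, embedding)
        self.step = 0

    def forward(self, news_ids, news_tokens):
        """news_ids: list[int]; news_tokens: LongTensor of shape [batch, seq_len]."""
        out = [None] * len(news_ids)
        misses = []
        for i, nid in enumerate(news_ids):
            hit = self.cache.get(nid)
            if hit is not None and self.step - hit[0] <= self.max_age:
                out[i] = hit[1]     # cache hit: skip the redundant encoding
            else:
                misses.append(i)
        if misses:                  # encode only the cache misses
            fresh = self.encoder(news_tokens[misses])
            for j, i in enumerate(misses):
                out[i] = fresh[j]
                # cached copies are detached: they carry no gradient
                self.cache[news_ids[i]] = (self.step, fresh[j].detach())
        self.step += 1
        return torch.stack(out)


# Toy usage with a trivial encoder standing in for the PLM.
if __name__ == "__main__":
    toy_encoder = nn.Sequential(
        nn.Embedding(1000, 64), nn.Flatten(1), nn.Linear(64 * 8, 128)
    )
    enc = CachedNewsEncoder(toy_encoder)
    tokens = torch.randint(0, 1000, (3, 8))
    vecs = enc([11, 42, 11], tokens)   # first step: all three are cache misses
    vecs2 = enc([11, 42, 99], tokens)  # next step: 11 and 42 hit the cache
    print(vecs.shape, vecs2.shape)
```

Cached vectors are detached, so gradients only flow through freshly encoded items; the max_age window is one simple way to bound the staleness that such reuse introduces.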
Related papers
- Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs [56.040198387038025]
We present a novel prompt-guided visual perception framework (abbreviated as Free Video-LLM) for efficient inference of training-free video LLMs.
Our method effectively reduces the number of visual tokens while maintaining high performance across multiple video question-answering benchmarks.
arXiv Detail & Related papers (2024-10-14T12:35:12Z)
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z)
- Efficient Multimodal Learning from Data-centric Perspective [21.35857180519653]
We introduce Bunny, a family of lightweight MLLMs with flexible vision and language backbones for efficient multimodal learning.
Experiments show that our Bunny-4B/8B outperforms the state-of-the-art large MLLMs on multiple benchmarks.
arXiv Detail & Related papers (2024-02-18T10:09:10Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Uncertainty-aware Parameter-Efficient Self-training for Semi-supervised Language Understanding [38.11411155621616]
We study self-training as one of the predominant semi-supervised learning approaches.
We present UPET, a novel Uncertainty-aware self-Training framework.
We show that UPET achieves a substantial improvement in terms of performance and efficiency.
arXiv Detail & Related papers (2023-10-19T02:18:29Z)
- INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models [40.54353850357839]
We show how we can employ submodular optimization to select highly representative subsets of the training corpora.
We show that the resulting models achieve up to $\sim99\%$ of the performance of the fully-trained models.
arXiv Detail & Related papers (2023-05-11T09:24:41Z)
- Clinical Prompt Learning with Frozen Language Models [4.077071350659386]
Large but frozen pre-trained language models (PLMs) with prompt learning outperform smaller but fine-tuned models.
We investigated the viability of prompt learning on clinically meaningful decision tasks.
Results are partially in line with the prompt learning literature, with prompt learning able to match or improve on traditional fine-tuning.
arXiv Detail & Related papers (2022-05-11T14:25:13Z)
- NewsBERT: Distilling Pre-trained Language Model for Intelligent News Application [56.1830016521422]
We propose NewsBERT, which can distill pre-trained language models for efficient and effective news intelligence.
In our approach, we design a teacher-student joint learning and distillation framework to collaboratively learn both teacher and student models.
In our experiments, NewsBERT can effectively improve the model performance in various intelligent news applications with much smaller models.
arXiv Detail & Related papers (2021-02-09T15:41:12Z)
- Teaching with Commentaries [108.62722733649542]
We propose a flexible teaching framework using commentaries and learned meta-information.
We find that commentaries can improve training speed and/or performance.
Commentaries can be reused when training new models to obtain performance benefits.
arXiv Detail & Related papers (2020-11-05T18:52:46Z)
- COLAM: Co-Learning of Deep Neural Networks and Soft Labels via Alternating Minimization [60.07531696857743]
We propose the COLAM framework, which co-learns DNNs and soft labels through alternating minimization of two objectives.
arXiv Detail & Related papers (2020-04-26T17:50:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.