External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation
- URL: http://arxiv.org/abs/2502.17494v6
- Date: Wed, 23 Apr 2025 06:47:38 GMT
- Title: External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation
- Authors: Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li, Shali Jiang, Jiyan Yang, Xiaozhen Xia, Fan Yang, Yasmine Badr, Ellie Wen, Shuyu Xu, Hansey Chen, Zhengyu Zhang, Jade Nie, Chunzhi Yang, Zhichen Zeng, Weilin Zhang, Xingliang Huang, Qianru Li, Shiquan Wang, Evelyn Lyu, Wenjing Lu, Rui Zhang, Wenjun Wang, Jason Rudy, Mengyue Hang, Kai Wang, Yinbin Ma, Shuaiwen Wang, Sihan Zeng, Tongyi Tang, Xiaohan Wei, Longhao Jin, Jamey Zhang, Marcus Chen, Jiayi Xu, Angie Huang, Xihuan Zeng, Chi Zhang, Zhengli Zhao, Jared Yang, Qiang Jin, Xian Chen, Amit Anand Amlesahwaram, Lexi Song, Liang Luo, Yuchen Hao, Nan Xiao, Yavuz Yetim, Luoshang Pan, Gaoxiang Liu, Yuxi Hu, Yuzhen Huang, Jackie Xu, Rich Zhu, Xin Zhang, Yiqun Liu, Hang Yin, Yuxin Chen, Buyun Zhang, Xiaoyi Liu, Xingyuan Wang, Wenguang Mao, Zhijing Li, Zhehui Zhou, Feifan Gu, Qin Huang, Chonglin Sun, Nancy Yu, Shuo Gu, Shupin Mao, Benjamin Au, Jingzheng Qin, Peggy Yao, Jae-Woo Choi, Bin Gao, Ernest Wang, Lei Zhang, Wen-Yen Chen, Ted Lee, Jay Zha, Yi Meng, Alex Gong, Edison Gao, Alireza Vahdatpour, Yiping Han, Yantao Yao, Toshinari Kureha, Shuo Chang, Musharaf Sultan, John Bocharov, Sagar Chordia, Xiaorui Gan, Peng Sun, Rocky Liu, Bo Long, Wenlin Chen, Santanu Kolay, Huayu Li,
- Abstract summary: Ads recommendation is a prominent service of online advertising systems and has been actively studied.<n>Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement.<n>However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in industrial-scale applications.
- Score: 58.194356020695906
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling-up and advanced design of the recommendation model can bring significant performance improvement. However, with a larger model scale, such prior studies have a significantly increasing gap from industry as they often neglect two fundamental challenges in industrial-scale applications. First, training and inference budgets are restricted for the model to be served, exceeding which may incur latency and impair user experience. Second, large-volume data arrive in a streaming mode with data distributions dynamically shifting, as new users/ads join and existing users/ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address the overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training/inference while maintaining high performance. We design the teacher in a way like a foundation model (FM) that can serve multiple students as vertical models (VMs) to amortize its building cost. We propose Auxiliary Head and Student Adapter to mitigate the data distribution gap between FM and VMs caused by the streaming data issue. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gain by ExFM.
Related papers
- A Unified Knowledge-Distillation and Semi-Supervised Learning Framework to Improve Industrial Ads Delivery Systems [19.0143243243314]
Industrial ads ranking systems conventionally rely on labeled impression data, which leads to challenges such as overfitting, slower incremental gain from model scaling, and biases due to discrepancies between training and serving data.<n>We propose a Unified framework for Knowledge-Distillation and Semi-supervised Learning (UK) for ads ranking, empowering the training of models on a significantly larger and more diverse datasets.
arXiv Detail & Related papers (2025-02-05T23:14:07Z) - AdaF^2M^2: Comprehensive Learning and Responsive Leveraging Features in Recommendation System [16.364341783911414]
We propose a model-agnostic framework AdaF2M2, short for Adaptive Feature Modeling with Feature Mask.<n>By arming base models with AdaF2M2, we conduct online A/B tests on multiple recommendation scenarios, obtaining +1.37% and +1.89% cumulative improvements on user active days and app duration respectively.
arXiv Detail & Related papers (2025-01-27T06:49:27Z) - Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z) - OReole-FM: successes and challenges toward billion-parameter foundation models for high-resolution satellite imagery [0.3926357402982764]
Scaling models to billions of parameters has been shown to yield unprecedented benefits including emergent abilities.
We pair high-performance computing resources including Frontier supercomputer, America's first exascale system, and high-resolution optical RS data to pretrain billion-scale FMs.
arXiv Detail & Related papers (2024-10-25T20:55:12Z) - Synergizing Foundation Models and Federated Learning: A Survey [23.416321895575507]
This paper discusses the potentials and challenges of synergizing Federated Learning (FL) and Foundation Models (FM)
FL is a collaborative learning paradigm that breaks the barrier of data availability from different participants.
It provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy.
arXiv Detail & Related papers (2024-06-18T17:58:09Z) - Pretraining Billion-scale Geospatial Foundational Models on Frontier [0.16492989697868893]
Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning.
We investigate billion scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data.
Our larger 3B parameter size model achieves up to 30% improvement in top1 scene classification accuracy.
arXiv Detail & Related papers (2024-04-17T19:16:32Z) - Learning from models beyond fine-tuning [78.20895343699658]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FM) based on the model interface.<n>The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing.<n>This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z) - Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This brings a critical challenge for the real-world application of foundation models: one has to transfer the knowledge of a foundation model to the downstream task.
arXiv Detail & Related papers (2023-04-05T07:28:33Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - VFed-SSD: Towards Practical Vertical Federated Advertising [53.08038962443853]
We propose a semi-supervised split distillation framework VFed-SSD to alleviate the two limitations.
Specifically, we develop a self-supervised task MatchedPair Detection (MPD) to exploit the vertically partitioned unlabeled data.
Our framework provides an efficient federation-enhanced solution for real-time display advertising with minimal deploying cost and significant performance lift.
arXiv Detail & Related papers (2022-05-31T17:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.