PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform
- URL: http://arxiv.org/abs/2507.12704v3
- Date: Wed, 20 Aug 2025 23:15:51 GMT
- Title: PinFM: Foundation Model for User Activity Sequences at a Billion-scale Visual Discovery Platform
- Authors: Xiangyi Chen, Kousik Rajesh, Matthew Lawhon, Zelun Wang, Hanyu Li, Haomiao Li, Saurabh Vishwas Joshi, Pong Eksombatchai, Jaewon Yang, Yi-Ping Hsu, Jiajing Xu, Charles Rosenberg
- Abstract summary: We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications. Our infrastructure and algorithmic optimizations, such as the Deduplicated Cross-Attention Transformer (DCAT), improved our throughput by 600% on Pinterest.
- Score: 9.628316811614566
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: User activity sequences have emerged as one of the most important signals in recommender systems. We present a foundational model, PinFM, for understanding user activity sequences across multiple applications at a billion-scale visual discovery platform. We pretrain a transformer model with 20B+ parameters using extensive user activity data, then fine-tune it for specific applications, efficiently coupling it with existing models. While this pretraining-and-fine-tuning approach has been popular in other domains, such as Vision and NLP, its application in industrial recommender systems presents numerous challenges. The foundational model must be scalable enough to score millions of items every second while meeting tight cost and latency constraints imposed by these systems. Additionally, it should capture the interactions between user activities and other features and handle new items that were not present during the pretraining stage. We developed innovative techniques to address these challenges. Our infrastructure and algorithmic optimizations, such as the Deduplicated Cross-Attention Transformer (DCAT), improved our throughput by 600% on Pinterest internal data. We demonstrate that PinFM can learn interactions between user sequences and candidate items by altering input sequences, leading to a 20% increase in engagement with new items. PinFM is now deployed to help improve the experience of more than half a billion users across various applications.
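The abstract does not detail DCAT's mechanics, but the motivating observation it names (scoring millions of items per second against one user's activity sequence) suggests the standard deduplication opportunity in cross-attention: every candidate in a request attends over the same user sequence, so the sequence-side projections can be computed once per request rather than once per (user, candidate) pair. A minimal illustrative sketch, with all function and variable names hypothetical rather than taken from PinFM:

```python
# Hypothetical sketch of deduplicated cross-attention. Names are illustrative,
# not PinFM's actual API. The key idea shown: sequence-side keys/values are
# projected ONCE per request and shared by every candidate item's query.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dedup_cross_attention(candidates, user_seq, w_q, w_k, w_v):
    """candidates: (num_items, d) item embeddings; user_seq: (seq_len, d)."""
    # Sequence-side projections computed once per request (the deduplication).
    k = user_seq @ w_k                     # (seq_len, d)
    v = user_seq @ w_v                     # (seq_len, d)
    # Candidate-side queries; all candidates attend over the shared k, v.
    q = candidates @ w_q                   # (num_items, d)
    scores = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (num_items, seq_len)
    return scores @ v                      # (num_items, d)

rng = np.random.default_rng(0)
d, seq_len, n_items = 16, 32, 8
out = dedup_cross_attention(rng.normal(size=(n_items, d)),
                            rng.normal(size=(seq_len, d)),
                            *(rng.normal(size=(d, d), scale=0.1) for _ in range(3)))
print(out.shape)  # (8, 16)
```

Under this framing, the per-request cost of the sequence encoder amortizes across all candidates, which is one plausible route to the reported 600% throughput gain.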
Related papers
- Multimodal Generative Retrieval Model with Staged Pretraining for Food Delivery on Meituan [30.893121144130664]
Multimodal retrieval models are increasingly important in scenarios such as food delivery. We propose a staged pretraining strategy, which guides the model to focus on specialized tasks at each stage. To better utilize the semantic IDs that compress high-dimensional multimodal embeddings, we design both generative and discriminative tasks.
arXiv Detail & Related papers (2026-02-06T12:29:13Z) - Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders [11.073761978382398]
We propose a novel two-stage modeling framework, namely VIrtual Sequential Target Attention (VISTA). VISTA decomposes traditional target attention from a candidate item to user history items into two distinct stages. Our approach achieves significant improvements in offline and online metrics and has been successfully deployed on an industry-leading recommendation platform.
arXiv Detail & Related papers (2025-10-24T22:17:49Z) - Modeling Long-term User Behaviors with Diffusion-driven Multi-interest Network for CTR Prediction [18.302602011055775]
We propose DiffuMIN (Diffusion-driven Multi-Interest Network) to model long-term user behaviors. We show that DiffuMIN increased CTR by 1.52% and CPM by 1.10% in online A/B testing.
arXiv Detail & Related papers (2025-08-21T07:10:01Z) - FuXi-$α$: Scaling Recommendation Model with Feature Interaction Enhanced Transformer [81.12174905444229]
Recent advancements have shown that expanding sequential recommendation models to large-scale recommendation models can be an effective strategy. We propose a new model called FuXi-$\alpha$ to address these issues. Our model outperforms existing models, with its performance continuously improving as the model size increases.
arXiv Detail & Related papers (2025-02-05T09:46:54Z) - Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models [64.28420991770382]
Data-Juicer 2.0 is a data processing system backed by data processing operators spanning text, image, video, and audio modalities. It supports more critical tasks including data analysis, annotation, and foundation model post-training. It has been widely adopted in diverse research fields and real-world products such as Alibaba Cloud PAI.
arXiv Detail & Related papers (2024-12-23T08:29:57Z) - Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms. Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction. We propose the Multi-granularity Interest Retrieval and Refinement Network (MIRRN).
arXiv Detail & Related papers (2024-11-22T15:29:05Z) - Retrieval Augmentation via User Interest Clustering [57.63883506013693]
Industrial recommender systems are sensitive to the patterns of user-item engagement.
We propose a novel approach that efficiently constructs user interest representations and enables inference at low computational cost.
Our approach has been deployed in multiple products at Meta, facilitating short-form video related recommendation.
arXiv Detail & Related papers (2024-08-07T16:35:10Z) - LiMAML: Personalization of Deep Recommender Models via Meta Learning [13.69036196446634]
We introduce an innovative meta-learning solution tailored to the personalization of models for individual members and other entities.
We leverage the Model-Agnostic Meta Learning (MAML) algorithm to adapt per-task sub-networks using recent user interaction data.
Our approach has enabled the deployment of a range of highly personalized AI models across diverse LinkedIn applications.
arXiv Detail & Related papers (2024-02-23T22:06:36Z) - MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z) - CAM2: Conformity-Aware Multi-Task Ranking Model for Large-Scale Recommender Systems [0.0]
We introduce CAM2, a conformity-aware multi-task ranking model to serve relevant items to users on one of the largest industrial recommendation platforms.
We show through online experiments that the CAM2 model results in a significant 0.50% increase in aggregated user engagement.
arXiv Detail & Related papers (2023-04-17T19:00:55Z) - Multi-Behavior Sequential Recommendation with Temporal Graph Transformer [66.10169268762014]
We tackle the dynamic user-item relation learning with the awareness of multi-behavior interactive patterns.
We propose a new Temporal Graph Transformer (TGT) recommendation framework to jointly capture dynamic short-term and long-range user-item interactive patterns.
arXiv Detail & Related papers (2022-06-06T15:42:54Z) - Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding [0.0]
We propose a flexible model for the task which exploits all available data.
The task involves complex relations; to avoid using a large model specifically for video processing, we propose the use of behaviour encoding.
arXiv Detail & Related papers (2021-12-22T19:14:55Z) - Search-based User Interest Modeling with Lifelong Sequential Behavior Data for Click-Through Rate Prediction [23.460147230576855]
We propose a new modeling paradigm, which we name the Search-based Interest Model (SIM).
SIM extracts user interests with two cascaded search units.
Since 2019, SIM has been deployed in the display advertising system at Alibaba, bringing a 7.1% CTR lift and a 4.4% RPM lift.
arXiv Detail & Related papers (2020-06-10T03:41:15Z)
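In the SIM paper, the two cascaded search units are a cheap hard-search stage that filters the lifelong sequence (e.g., by the candidate item's category) and an exact-search stage that applies attention over the short filtered subsequence. An illustrative sketch of that cascade, with all names hypothetical and the category filter standing in for the paper's general search unit:

```python
# Illustrative sketch (not the paper's code) of SIM-style cascaded search over
# a lifelong behavior sequence: stage 1 is a hard filter on category, stage 2
# is attention of the candidate over the surviving behaviors.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cascaded_search(seq_embs, seq_cats, cand_emb, cand_cat):
    """seq_embs: (n, d) lifelong behavior embeddings; seq_cats: (n,) category ids."""
    # Stage 1 (general search): cheap hard filter keeps same-category behaviors,
    # shrinking the sequence from n to m << n before any attention is computed.
    sub = seq_embs[seq_cats == cand_cat]       # (m, d)
    if sub.shape[0] == 0:
        return np.zeros_like(cand_emb)
    # Stage 2 (exact search): attention of the candidate over the subsequence.
    weights = softmax(sub @ cand_emb)          # (m,)
    return weights @ sub                       # (d,) user-interest vector

rng = np.random.default_rng(1)
n, d = 10_000, 8
interest = cascaded_search(rng.normal(size=(n, d)),
                           rng.integers(0, 50, size=n),
                           rng.normal(size=d), cand_cat=7)
print(interest.shape)  # (8,)
```

The cascade is what makes lifelong sequences tractable for CTR serving: the expensive attention only ever sees the small filtered subsequence.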
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.