Related papers: PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework

PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework

URL: http://arxiv.org/abs/2601.16815v1
Date: Fri, 23 Jan 2026 15:10:39 GMT
Title: PI2I: A Personalized Item-Based Collaborative Filtering Retrieval Framework
Authors: Shaoqing Wang, Yingcai Ma, Kairui Fu, Ziyang Wang, Dunxian Huang, Yuliang Yan, Jian Wu,
Abstract summary: We propose a novel two-stage retrieval framework that enhances the personalization capabilities of item-to-item collaborative filtering (CF)<n>In the first Indexer Building Stage (IBS), we optimize the retrieval pool by relaxing truncation thresholds to maximize Hit Rate.<n>In the second Personalized Retrieval Stage (PRS), we introduce an interactive scoring model to overcome the limitations of inner product calculations.<n> offline experiments on large-scale real-world datasets demonstrate that PI2I outperforms traditional CF methods and rivals Two-Tower models.
Score: 15.34118278015945
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Efficiently selecting relevant content from vast candidate pools is a critical challenge in modern recommender systems. Traditional methods, such as item-to-item collaborative filtering (CF) and two-tower models, often fall short in capturing the complex user-item interactions due to uniform truncation strategies and overdue user-item crossing. To address these limitations, we propose Personalized Item-to-Item (PI2I), a novel two-stage retrieval framework that enhances the personalization capabilities of CF. In the first Indexer Building Stage (IBS), we optimize the retrieval pool by relaxing truncation thresholds to maximize Hit Rate, thereby temporarily retaining more items users might be interested in. In the second Personalized Retrieval Stage (PRS), we introduce an interactive scoring model to overcome the limitations of inner product calculations, allowing for richer modeling of intricate user-item interactions. Additionally, we construct negative samples based on the trigger-target (item-to-item) relationship, ensuring consistency between offline training and online inference. Offline experiments on large-scale real-world datasets demonstrate that PI2I outperforms traditional CF methods and rivals Two-Tower models. Deployed in the "Guess You Like" section on Taobao, PI2I achieved a 1.05% increase in online transaction rates. In addition, we have released a large-scale recommendation dataset collected from Taobao, containing 130 million real-world user interactions used in the experiments of this paper. The dataset is publicly available at https://huggingface.co/datasets/PI2I/PI2I, which could serve as a valuable benchmark for the research community.

Related papers

Improving E-commerce Search with Category-Aligned Retrieval [0.0]
Category-Aligned Retrieval System (CARS) improves search relevance by first predicting the product category from a user's query and then boosting products within that category.<n>We introduce a novel method for creating "Trainable Category Prototypes" from query embeddings.
arXiv Detail & Related papers (2025-09-03T20:43:52Z)
Leveraging Generative Models for Real-Time Query-Driven Text Summarization in Large-Scale Web Search [54.987957691350665]
Query-Driven Text Summarization (QDTS) aims to generate concise and informative summaries from textual documents based on a given query.<n>Traditional extractive summarization models, based primarily on ranking candidate summary segments, have been the dominant approach in industrial applications.<n>We propose a novel framework to pioneer the application of generative models to address real-time QDTS in industrial web search.
arXiv Detail & Related papers (2025-08-28T08:51:51Z)
ENCODE: Breaking the Trade-Off Between Performance and Efficiency in Long-Term User Behavior Modeling [12.963611514800656]
We propose an efficient two-stage long-term sequence modeling approach, named as EfficieNt Clustering based twO-stage interest moDEling (ENCODE)<n>In the offline extraction stage, ENCODE clusters the entire behavior sequence and extracts accurate interests.<n>While in the online inference stage, ENCODE takes the off-the-shelf user interests to predict the associations with target items.
arXiv Detail & Related papers (2025-08-19T06:58:21Z)
User Long-Term Multi-Interest Retrieval Model for Recommendation [20.2928687653132]
We propose a new framework named User Long-term Multi-Interest Retrieval Model(ULIM), which enables thousand-scale behavior modeling in retrieval stages.<n>We show that ULIM achieves substantial improvement over state-of-the-art methods, and brings 5.54% clicks, 11.01% orders and 4.03% GMV lift for Taobaomiaosha, a notable mini-app of Taobao.
arXiv Detail & Related papers (2025-07-14T09:32:26Z)
Unleashing the Potential of Two-Tower Models: Diffusion-Based Cross-Interaction for Large-Scale Matching [25.672699790866726]
Two-tower models are widely adopted in the industrial-scale matching stage across a broad range of application domains.<n>We propose a "cross-interaction decoupling architecture" within our matching paradigm.
arXiv Detail & Related papers (2025-02-28T03:40:37Z)
Multi-granularity Interest Retrieval and Refinement Network for Long-Term User Behavior Modeling in CTR Prediction [68.90783662117936]
Click-through Rate (CTR) prediction is crucial for online personalization platforms.<n>Recent advancements have shown that modeling rich user behaviors can significantly improve the performance of CTR prediction.<n>We propose Multi-granularity Interest Retrieval and Refinement Network (MIRRN)
arXiv Detail & Related papers (2024-11-22T15:29:05Z)
Beyond Two-Tower Matching: Learning Sparse Retrievable Cross-Interactions for Recommendation [80.19762472699814]
Two-tower models are a prevalent matching framework for recommendation, which have been widely deployed in industrial applications. It suffers two main challenges, including limited feature interaction capability and reduced accuracy in online serving. We propose a new matching paradigm named SparCode, which supports not only sophisticated feature interactions but also efficient retrieval.
arXiv Detail & Related papers (2023-11-30T03:13:36Z)
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories. We first contribute the Product1M datasets, and define two real practical instance-level retrieval tasks. We exploit to train a more effective cross-modal model which is adaptively capable of incorporating key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach [56.12815715932561]
We propose a new broad recommender system called Broad Collaborative Filtering (BroadCF) Instead of Deep Neural Networks (DNNs), Broad Learning System (BLS) is used as a mapping function to learn the complex nonlinear relationships between users and items. Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
arXiv Detail & Related papers (2022-04-20T01:25:08Z)
Dual-embedding based Neural Collaborative Filtering for Recommender Systems [0.7949579654743338]
We propose a general collaborative filtering framework named DNCF, short for Dual-embedding based Neural Collaborative Filtering. In addition to learning the primitive embedding for a user (an item), we introduce an additional embedding from the perspective of the interacted items (users) to augment the user (item) representation.
arXiv Detail & Related papers (2021-02-04T11:32:11Z)
Dynamic Graph Collaborative Filtering [64.87765663208927]
Dynamic recommendation is essential for recommender systems to provide real-time predictions based on sequential data. Here we propose Dynamic Graph Collaborative Filtering (DGCF), a novel framework leveraging dynamic graphs to capture collaborative and sequential relations. Our approach achieves higher performance when the dataset contains less action repetition, indicating the effectiveness of integrating dynamic collaborative information.
arXiv Detail & Related papers (2021-01-08T04:16:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.