RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
- URL: http://arxiv.org/abs/2509.03131v1
- Date: Wed, 03 Sep 2025 08:33:43 GMT
- Title: RecBase: Generative Foundation Model Pretraining for Zero-Shot Recommendation
- Authors: Sashuai Zhou, Weinan Gan, Qijiong Liu, Ke Lei, Jieming Zhu, Hai Huang, Yan Xia, Ruiming Tang, Zhenhua Dong, Zhou Zhao,
- Abstract summary: RecBase is a domain-agnostic foundational model pretrained with a recommendation-oriented objective. We introduce a unified item tokenizer that encodes items into hierarchical concept identifiers. Our model matches or surpasses the performance of LLM baselines of up to 7B parameters in zero-shot and cross-domain recommendation tasks.
- Score: 78.01030342481246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in LLM-based recommendation have shown promise, yet their cross-domain generalization is hindered by a fundamental mismatch between language-centric pretraining and the recommendation task. Existing methods, relying on language-level knowledge, fail to capture dynamic, item-level user interests across domains. To bridge this gap, we propose RecBase, a domain-agnostic foundational model pretrained with a recommendation-oriented objective. RecBase leverages a large-scale, heterogeneous, cross-domain corpus with unified textual representations and feature mappings to enhance cross-domain generalization. To further align item semantics across domains, we introduce a unified item tokenizer that encodes items into hierarchical concept identifiers, enabling structured representation and efficient vocabulary sharing. The model is trained using an autoregressive objective to capture complex item-level sequential patterns. On eight real-world datasets, our 1.5B-parameter model matches or surpasses the performance of LLM baselines up to 7B parameters in zero-shot and cross-domain recommendation tasks.
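The abstract does not detail how the unified item tokenizer produces hierarchical concept identifiers; a common way to obtain such coarse-to-fine discrete IDs is residual quantization over learned codebooks. The sketch below is a minimal illustration under that assumption; the codebooks and embeddings are hypothetical, not the paper's actual components.

```python
import numpy as np

def hierarchical_ids(item_emb, codebooks):
    """Map an item embedding to a coarse-to-fine tuple of concept IDs.

    item_emb:  (d,) dense item embedding
    codebooks: list of (K_l, d) arrays, one per hierarchy level
    Returns one codeword index per level.
    """
    residual = item_emb.astype(float).copy()
    ids = []
    for cb in codebooks:
        # pick the nearest codeword at this level
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))
        ids.append(idx)
        # pass the quantization residual to the next (finer) level
        residual = residual - cb[idx]
    return ids
```

The resulting ID tuples can be flattened into a token sequence, so that training with an autoregressive objective reduces next-item prediction to predicting the next item's concept IDs level by level.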
Related papers
- FeDecider: An LLM-Based Framework for Federated Cross-Domain Recommendation [75.50721642765994]
Large language model (LLM)-based recommendation models have demonstrated impressive performance. We propose FeDecider, an LLM-based framework for federated cross-domain recommendation. Extensive experiments across diverse datasets validate the effectiveness of the proposed FeDecider.
arXiv Detail & Related papers (2026-02-17T21:42:28Z) - EncodeRec: An Embedding Backbone for Recommendation Systems [4.7014546279849805]
We present EncodeRec, an approach designed to align textual representations with recommendation objectives while learning compact, informative embeddings. Experiments across core recommendation benchmarks demonstrate its effectiveness both as a backbone for sequential recommendation models and for semantic ID tokenization. These results underscore the pivotal role of embedding adaptation in bridging the gap between general-purpose language models and practical recommender systems.
arXiv Detail & Related papers (2026-01-15T20:15:01Z) - From IDs to Semantics: A Generative Framework for Cross-Domain Recommendation with Adaptive Semantic Tokenization [3.4546102059619526]
Cross-domain recommendation is crucial for improving recommendation accuracy and generalization. Many efforts have focused on learning disentangled representations through multi-domain joint training to bridge the domain gaps. While recent Large Language Model (LLM)-based approaches show promise, they still face critical challenges. We propose GenCDR, a novel Generative Cross-Domain Recommendation framework.
arXiv Detail & Related papers (2025-11-11T09:10:40Z) - FuDoBa: Fusing Document and Knowledge Graph-based Representations with Bayesian Optimisation [43.56253799373878]
We introduce FuDoBa, a Bayesian optimisation-based method that integrates LLM-based embeddings with domain-specific structured knowledge. This fusion produces low-dimensional, task-relevant representations while reducing training complexity and yielding interpretable early-fusion weights. We demonstrate the effectiveness of our approach on six datasets in two domains, showing that our proposed representation learning approach performs on par with, or surpasses, those produced solely by the proprietary LLM-based embedding baselines.
arXiv Detail & Related papers (2025-07-09T07:49:55Z) - LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation [49.78419076215196]
Sequential recommendation aims to predict users' future interactions by modeling collaborative filtering (CF) signals from historical behaviors of similar users or items. Traditional sequential recommenders rely on ID-based embeddings, which capture CF signals through high-order co-occurrence patterns. Recent advances in large language models (LLMs) have motivated text-based recommendation approaches that derive item representations from textual descriptions. We argue that an ideal embedding model should seamlessly integrate CF signals with rich semantic representations to improve both in-domain and out-of-domain recommendation performance.
arXiv Detail & Related papers (2025-06-16T13:27:06Z) - RecGPT: A Foundation Model for Sequential Recommendation [16.464972558861497]
We develop a foundation model for sequential recommendation that achieves genuine zero-shot generalization capabilities. Our approach departs from existing ID-based methods by deriving item representations exclusively from textual features. We introduce unified item tokenization with Finite Scalar Quantization that transforms heterogeneous textual descriptions into standardized discrete tokens.
arXiv Detail & Related papers (2025-06-06T17:53:02Z) - LLM-RecG: A Semantic Bias-Aware Framework for Zero-Shot Sequential Recommendation [5.512301280728178]
Zero-shot cross-domain sequential recommendation (ZCDSR) enables predictions in unseen domains without additional training or fine-tuning. Recent advancements in large language models (LLMs) have significantly enhanced ZCDSR by facilitating cross-domain knowledge transfer. We propose a novel semantic bias-aware framework that improves cross-domain alignment at both the item and sequential levels.
arXiv Detail & Related papers (2025-01-31T15:43:21Z) - A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap [50.079224604394]
We present a novel model-agnostic framework called Context-Enhanced Feature Alignment (CEFA).
CEFA consists of a feature alignment module and a context enhancement module.
Our method can serve as a plug-and-play module to improve the detection performance of HOI models on rare categories.
arXiv Detail & Related papers (2024-07-31T08:42:48Z) - Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains.
We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z) - X2Parser: Cross-Lingual and Cross-Domain Framework for Task-Oriented Compositional Semantic Parsing [51.81533991497547]
Task-oriented compositional semantic parsing (TCSP) handles complex nested user queries.
We present X2Parser, a transferable Cross-lingual and Cross-domain Parser for TCSP.
We propose to predict flattened intents and slots representations separately and cast both prediction tasks into sequence labeling problems.
arXiv Detail & Related papers (2021-06-07T16:40:05Z) - Meta-Learning for Domain Generalization in Semantic Parsing [124.32975734073949]
We use a meta-learning framework which targets zero-shot domain generalization for semantic parsing.
We apply a model-agnostic training algorithm that simulates zero-shot parsing using virtual train and test sets sampled from disjoint domains.
arXiv Detail & Related papers (2020-10-22T19:00:36Z)
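One entry above (RecGPT) mentions Finite Scalar Quantization (FSQ) for item tokenization. As a rough sketch: FSQ rounds each latent dimension to one of a small, fixed set of levels and packs the per-dimension codes into a single discrete token. The level counts and mixed-radix packing below are illustrative choices, not RecGPT's actual configuration.

```python
import numpy as np

def fsq_tokenize(z, levels):
    """Finite Scalar Quantization: round each dimension of z
    (assumed in [-1, 1]) to one of `levels[i]` evenly spaced values,
    then pack the per-dimension codes into one integer (mixed radix)."""
    levels = np.asarray(levels)
    half = (levels - 1) / 2.0
    # map [-1, 1] onto [0, levels-1] and round to the nearest level
    codes = np.clip(np.round((np.asarray(z) + 1.0) * half),
                    0, levels - 1).astype(int)
    token = 0
    for code, base in zip(codes, levels):
        token = token * int(base) + int(code)
    return codes.tolist(), token
```

Because every latent vector maps deterministically to a bounded integer, the token vocabulary is fixed in advance, which is what makes such tokenizers convenient for sharing one vocabulary across heterogeneous item catalogs.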
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.