Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation
- URL: http://arxiv.org/abs/2601.12301v1
- Date: Sun, 18 Jan 2026 08:00:00 GMT
- Title: Facet-Aware Multi-Head Mixture-of-Experts Model with Text-Enhanced Pre-training for Sequential Recommendation
- Authors: Mingrui Liu, Sixiao Zhang, Cheng Long
- Abstract summary: Sequential recommendation (SR) systems excel at capturing users' dynamic preferences by leveraging their interaction histories. We argue that this representation alone is insufficient to capture an item's multi-faceted nature. We propose a novel architecture titled Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation (FAME).
- Score: 32.13241752028528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential recommendation (SR) systems excel at capturing users' dynamic preferences by leveraging their interaction histories. Most existing SR systems assign a single embedding vector to each item to represent its features, adopting various models to combine these embeddings into a sequence representation that captures user intent. However, we argue that this representation alone is insufficient to capture an item's multi-faceted nature (e.g., movie genres, starring actors). Furthermore, users often exhibit complex and varied preferences within these facets (e.g., liking both action and musical films within the genre facet), which are challenging to fully represent with static identifiers. To address these issues, we propose a novel architecture titled Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation (FAME). We leverage sub-embeddings from each head in the final multi-head attention layer to predict the next item separately, effectively capturing distinct item facets. A gating mechanism then integrates these predictions by dynamically determining their importance. Additionally, we introduce a Mixture-of-Experts (MoE) network within each attention head to disentangle varied user preferences within each facet, utilizing a learnable router network to aggregate expert outputs based on context. Complementing this architecture, we design a Text-Enhanced Facet-Aware Pre-training module to overcome the limitations of randomly initialized embeddings. By utilizing a pre-trained text encoder and employing an alternating supervised contrastive learning objective, we explicitly disentangle facet-specific features from textual metadata (e.g., descriptions) before sequential training begins. This ensures that the item embeddings are semantically robust and aligned with the downstream multi-facet framework.
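The abstract describes per-head next-item prediction, a gating mechanism over heads, and a Mixture-of-Experts inside each head. The snippet below is a minimal NumPy sketch of that flow only, not the authors' implementation: the head count, dimensions, softmax gate, and linear experts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

n_items, n_heads, d_head, n_experts = 100, 4, 16, 3

# Item table: one sub-embedding per head, i.e. a "facet" view of each item.
item_emb = rng.normal(size=(n_items, n_heads, d_head))

# Per-head MoE: each expert is a small linear map; a router weighs them.
W_experts = rng.normal(size=(n_heads, n_experts, d_head, d_head)) * 0.1
W_router = rng.normal(size=(n_heads, d_head, n_experts)) * 0.1

def facet_scores(head_states):
    """head_states: (n_heads, d_head) sub-embeddings from the final
    attention layer. Returns (n_heads, n_items) per-facet item scores."""
    scores = np.empty((n_heads, n_items))
    for h in range(n_heads):
        s = head_states[h]
        # Router softly mixes expert outputs within this facet.
        gate = softmax(s @ W_router[h])                   # (n_experts,)
        expert_out = np.einsum('edk,d->ek', W_experts[h], s)
        s = (gate[:, None] * expert_out).sum(axis=0)      # (d_head,)
        scores[h] = item_emb[:, h, :] @ s                 # dot with facet table
    return scores

def recommend(head_states, gate_logits):
    """Integrate per-head predictions with a gate over head importances."""
    per_head = facet_scores(head_states)                  # (n_heads, n_items)
    gate = softmax(gate_logits)
    return gate @ per_head                                # (n_items,)

state = rng.normal(size=(n_heads, d_head))
final = recommend(state, rng.normal(size=n_heads))
top5 = np.argsort(final)[::-1][:5]
print(top5)
```

In the paper the gate logits and expert weights would be learned end to end; here they are random placeholders so the shapes and data flow can be checked in isolation.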
Related papers
- Multimodal Generative Recommendation for Fusing Semantic and Collaborative Signals [17.608491612845306]
Sequential recommender systems rank relevant items by modeling a user's interaction history and computing the inner product between the resulting user representation and stored item embeddings. To avoid the significant memory overhead of storing large item sets, the generative recommendation paradigm instead models each item as a series of discrete semantic codes. These methods have yet to surpass traditional sequential recommenders on large item sets, limiting their adoption in the very scenarios they were designed to address. We propose MSCGRec, a Multimodal Semantic and Collaborative Generative Recommender.
arXiv Detail & Related papers (2026-02-03T16:39:35Z)
- CoFiRec: Coarse-to-Fine Tokenization for Generative Recommendation [55.783414010717074]
CoFiRec is a novel generative recommendation framework that decomposes item information into semantic levels. We show that CoFiRec outperforms existing methods, offering a new perspective for generative recommendation.
arXiv Detail & Related papers (2025-11-27T18:59:35Z)
- Leveraging Scene Context with Dual Networks for Sequential User Behavior Modeling [58.72480539725212]
We propose a novel Dual Sequence Prediction network (DSPnet) to capture dynamic interests and the interplay between scenes and items for future behavior prediction. DSPnet consists of two parallel networks dedicated to learning users' dynamic interests over items and scenes, and a sequence feature enhancement module that captures the interplay for enhanced future behavior prediction.
arXiv Detail & Related papers (2025-09-30T12:26:57Z)
- MSCRS: Multi-modal Semantic Graph Prompt Learning Framework for Conversational Recommender Systems [15.792566559456422]
Conversational Recommender Systems (CRS) aim to provide personalized recommendations by interacting with users through conversations. We propose a multi-modal semantic graph prompt learning framework for CRS, named MSCRS. Our proposed method significantly improves accuracy in item recommendation and generates more natural and contextually relevant content in response generation.
arXiv Detail & Related papers (2025-04-15T07:05:22Z)
- Context-Aware Lifelong Sequential Modeling for Online Click-Through Rate Prediction [4.561273938467592]
We propose the Context-Aware Interest Network (CAIN) for lifelong sequential modeling. CAIN uses a Temporal Convolutional Network (TCN) to create context-aware representations for each item throughout the lifelong sequence. We show that CAIN outperforms existing methods in terms of prediction accuracy and online performance metrics.
arXiv Detail & Related papers (2025-02-18T08:24:53Z)
- Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation [25.516648802281626]
We propose a novel structure called Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation (FAME). We leverage sub-embeddings from each head in the last multi-head attention layer to predict the next item separately. A gating mechanism integrates recommendations from each head and dynamically determines their importance.
arXiv Detail & Related papers (2024-11-03T06:47:45Z)
- Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization. Unlike RQ-VAE, which uses a single embedding, LAMIA learns an "item palette": a collection of independent and semantically parallel embeddings. Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
- MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation.
We first devise a hierarchical quantization method Graph CF-RQVAE to assign Rec-ID for each item from its multimodal information.
We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z)
- MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z)
- Multi-Behavior Hypergraph-Enhanced Transformer for Sequential Recommendation [33.97708796846252]
We introduce a new Multi-Behavior Hypergraph-enhanced Transformer framework (MBHT) to capture both short-term and long-term cross-type behavior dependencies.
Specifically, a multi-scale Transformer is equipped with low-rank self-attention to jointly encode behavior-aware sequential patterns from fine-grained and coarse-grained levels.
arXiv Detail & Related papers (2022-07-12T15:07:21Z)
- Sparse-Interest Network for Sequential Recommendation [78.83064567614656]
We propose a novel Sparse-Interest NEtwork (SINE) for sequential recommendation.
Our sparse-interest module can adaptively infer a sparse set of concepts for each user from the large concept pool.
SINE can achieve substantial improvement over state-of-the-art methods.
arXiv Detail & Related papers (2021-02-18T11:03:48Z)
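The sparse-interest idea in the last entry, where each user activates only a few concepts from a large shared pool, can be illustrated roughly as follows. This is a hypothetical NumPy sketch: the pool size, the top-k selection, and the dot-product scoring rule are illustrative assumptions, not SINE's actual design.

```python
import numpy as np

rng = np.random.default_rng(1)

n_concepts, d = 50, 16                    # shared concept pool
concept_pool = rng.normal(size=(n_concepts, d))

def sparse_interests(user_seq_emb, k=4):
    """Pick the k concepts most aligned with a user's sequence embedding,
    renormalize their weights, and zero out the rest of the pool."""
    scores = concept_pool @ user_seq_emb          # (n_concepts,)
    topk = np.argsort(scores)[::-1][:k]
    weights = np.zeros(n_concepts)
    w = np.exp(scores[topk] - scores[topk].max())
    weights[topk] = w / w.sum()
    # User interest = sparse mixture over the concept pool.
    return weights, weights @ concept_pool

user = rng.normal(size=d)
weights, interest = sparse_interests(user)
print((weights > 0).sum())   # 4 active concepts
```

The sparsity is what lets the model keep a large concept pool while each user's representation stays cheap to compute and easy to interpret.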
This list is automatically generated from the titles and abstracts of the papers in this site.