LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
- URL: http://arxiv.org/abs/2506.17966v1
- Date: Sun, 22 Jun 2025 09:53:21 GMT
- Title: LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation
- Authors: Wangyu Wu, Zhenhong Chen, Xianglin Qiu, Siqi Song, Xiaowei Huang, Fei Ma, Jimin Xiao
- Abstract summary: Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains. We propose LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation (LLM-EMF), a novel approach that enhances textual information with Large Language Model (LLM) knowledge.
- Score: 19.654959889052638
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains, focusing on modeling cross-domain preferences and capturing both intra- and inter-sequence item relationships. We propose LLM-Enhanced Multimodal Fusion for Cross-Domain Sequential Recommendation (LLM-EMF), a novel and advanced approach that enhances textual information with Large Language Models (LLM) knowledge and significantly improves recommendation performance through the fusion of visual and textual data. Using the frozen CLIP model, we generate image and text embeddings, thereby enriching item representations with multimodal data. A multiple attention mechanism jointly learns both single-domain and cross-domain preferences, effectively capturing and understanding complex user interests across diverse domains. Evaluations conducted on four e-commerce datasets demonstrate that LLM-EMF consistently outperforms existing methods in modeling cross-domain user preferences, thereby highlighting the effectiveness of multimodal data integration and its advantages in enhancing sequential recommendation systems. Our source code will be released.
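As a rough illustration of the fusion idea described in the abstract (not the authors' implementation; all function names and the averaging scheme here are hypothetical), the following NumPy sketch fuses stand-in CLIP-style text and image embeddings for a user's item sequence and pools them with scaled dot-product self-attention:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_modalities(text_emb, image_emb):
    """Average the L2-normalized text and image embeddings of each item
    (a simple stand-in for whatever fusion the paper actually uses)."""
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    return (t + v) / 2.0

def self_attention(seq, d_k):
    """Single-head scaled dot-product self-attention over an item sequence."""
    scores = seq @ seq.T / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ seq

rng = np.random.default_rng(0)
n_items, dim = 5, 8
text_emb = rng.normal(size=(n_items, dim))   # stand-in for frozen-CLIP text embeddings
image_emb = rng.normal(size=(n_items, dim))  # stand-in for frozen-CLIP image embeddings

items = fuse_modalities(text_emb, image_emb)            # multimodal item representations
user_repr = self_attention(items, d_k=dim).mean(axis=0)  # pooled user preference vector
print(user_repr.shape)  # prints (8,)
```

In practice the embeddings would come from a frozen CLIP encoder, and the paper's multiple-attention mechanism operates over single-domain and cross-domain sequences separately; this sketch only shows the shape of the pipeline.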
Related papers
- Generative Multi-Target Cross-Domain Recommendation [48.54929268144516]
This paper introduces GMC, a generative paradigm-based approach for multi-target cross-domain recommendation. The core idea of GMC is to leverage semantically quantized discrete item identifiers as a medium for integrating multi-domain knowledge. Extensive experiments on five public datasets demonstrate the effectiveness of GMC.
arXiv Detail & Related papers (2025-07-17T07:44:05Z)
- LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation [49.78419076215196]
Sequential recommendation aims to predict users' future interactions by modeling collaborative filtering (CF) signals from historical behaviors of similar users or items. Traditional sequential recommenders rely on ID-based embeddings, which capture CF signals through high-order co-occurrence patterns. Recent advances in large language models (LLMs) have motivated text-based recommendation approaches that derive item representations from textual descriptions. We argue that an ideal embedding model should seamlessly integrate CF signals with rich semantic representations to improve both in-domain and out-of-domain recommendation performance.
arXiv Detail & Related papers (2025-06-16T13:27:06Z)
- Bridge the Domains: Large Language Models Enhanced Cross-domain Sequential Recommendation [30.116213884571803]
Cross-domain Sequential Recommendation (CDSR) aims to extract user preferences from historical interactions across various domains. Existing CDSR methods rely on users who have interactions in all domains to learn cross-domain item relationships. With powerful representation and reasoning abilities, Large Language Models (LLMs) are promising for addressing these two problems.
arXiv Detail & Related papers (2025-04-25T14:30:25Z)
- Hierarchical Attention Fusion of Visual and Textual Representations for Cross-Domain Sequential Recommendation [19.654959889052638]
Cross-Domain Sequential Recommendation (CDSR) predicts user behavior by leveraging historical interactions across multiple domains. We propose Hierarchical Attention Fusion of Visual and Textual Representations (HAF-VT), a novel approach integrating visual and textual data to enhance cognitive modeling. A hierarchical attention mechanism jointly learns single-domain and cross-domain preferences, mimicking human information integration.
arXiv Detail & Related papers (2025-04-21T13:18:54Z)
- AgentCF++: Memory-enhanced LLM-based Agents for Popularity-aware Cross-domain Recommendations [28.559223475725137]
LLM-based user agents are emerging as a promising approach to enhancing recommender systems. We propose a dual-layer memory architecture combined with a two-step fusion mechanism. We also introduce the concepts of interest groups and group-shared memory to better capture the influence of popularity factors on users with similar interests.
arXiv Detail & Related papers (2025-02-19T16:02:59Z)
- Image Fusion for Cross-Domain Sequential Recommendation [20.37668418178215]
Cross-Domain Sequential Recommendation aims to predict future user interactions based on historical interactions across multiple domains. A key challenge in CDSR is effectively capturing cross-domain user preferences by fully leveraging both intra-sequence and inter-sequence item interactions. We propose a novel method, Image Fusion for Cross-Domain Sequential Recommendation (IFCDSR), which incorporates item image information to better capture visual preferences.
arXiv Detail & Related papers (2024-12-31T02:44:38Z)
- Investigating the potential of Sparse Mixtures-of-Experts for multi-domain neural machine translation [59.41178047749177]
We focus on multi-domain Neural Machine Translation, with the goal of developing efficient models which can handle data from various domains seen during training and are robust to domains unseen during training.
We hypothesize that Sparse Mixture-of-Experts (SMoE) models are a good fit for this task, as they enable efficient model scaling.
We conduct a series of experiments aimed at validating the utility of SMoE for the multi-domain scenario, and find that straightforward width scaling of the Transformer is a simpler and, surprisingly, more efficient approach in practice, reaching the same performance level as SMoE.
arXiv Detail & Related papers (2024-07-01T09:45:22Z)
- Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains.
We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z)
- MM-GEF: Multi-modal representation meet collaborative filtering [43.88159639990081]
We propose a graph-based item structure enhancement method MM-GEF: Multi-Modal recommendation with Graph Early-Fusion.
MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals.
arXiv Detail & Related papers (2023-08-14T15:47:36Z)
- Exploiting Graph Structured Cross-Domain Representation for Multi-Domain Recommendation [71.45854187886088]
Multi-domain recommender systems benefit from cross-domain representation learning and positive knowledge transfer.
We use temporal intra- and inter-domain interactions as contextual information for our method called MAGRec.
We perform experiments on publicly available datasets in different scenarios where MAGRec consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-12T19:51:32Z)
- Dual Attentive Sequential Learning for Cross-Domain Click-Through Rate Prediction [76.98616102965023]
Cross-domain recommender systems constitute a powerful method to tackle the cold-start and sparsity problems.
We propose a novel approach to cross-domain sequential recommendations based on the dual learning mechanism.
arXiv Detail & Related papers (2021-06-05T01:21:21Z)
- Learning to Combine: Knowledge Aggregation for Multi-Source Domain Adaptation [56.694330303488435]
We propose a Learning to Combine for Multi-Source Domain Adaptation (LtC-MSDA) framework.
In a nutshell, a knowledge graph is constructed on the prototypes of various domains to realize information propagation among semantically adjacent representations.
Our approach outperforms existing methods by a remarkable margin.
arXiv Detail & Related papers (2020-07-17T07:52:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.