Multimodal Enhancement of Sequential Recommendation
- URL: http://arxiv.org/abs/2602.07207v1
- Date: Fri, 06 Feb 2026 21:32:56 GMT
- Title: Multimodal Enhancement of Sequential Recommendation
- Authors: Bucher Sahyouni, Matthew Vowels, Liqun Chen, Simon Hadfield
- Abstract summary: We propose a novel recommender framework, MuSTRec, that unifies multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted text and visual features. Across multiple Amazon datasets, MuSTRec demonstrates superior performance (up to 33.5% improvement) over multimodal and sequential state-of-the-art baselines.
- Score: 10.466765832314683
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We propose a novel recommender framework, MuSTRec (Multimodal and Sequential Transformer-based Recommendation), that unifies multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted text and visual features. A frequency-based self-attention module additionally captures short- and long-term user preferences. Across multiple Amazon datasets, MuSTRec demonstrates superior performance (up to 33.5% improvement) over multimodal and sequential state-of-the-art baselines. Finally, we detail some interesting facets of this new recommendation paradigm. These include the need for a new data partitioning regime and a demonstration of how integrating user embeddings into sequential recommendation leads to drastically increased short-term metrics (up to 200% improvement) on smaller datasets. Our code is available at https://anonymous.4open.science/r/MuSTRec-D32B/ and will be made publicly available.
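The abstract mentions building item-item graphs from extracted text and visual features but does not say how. As a rough illustration only, the sketch below builds one k-nearest-neighbour graph per modality using cosine similarity. The kNN rule, the cosine metric, the function names (`cosine`, `knn_item_graph`), and the toy feature vectors are all assumptions made for this example, not details taken from MuSTRec.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_item_graph(features, k):
    """Map each item index to its k most similar other items (hypothetical helper)."""
    graph = {}
    for i, fi in enumerate(features):
        sims = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        # Highest similarity first; break ties by the smaller item index.
        sims.sort(key=lambda s: (-s[0], s[1]))
        graph[i] = [j for _, j in sims[:k]]
    return graph


# Made-up 2-d features for four items; real features would come from
# pretrained text and image encoders.
text_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
image_feats = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]

text_graph = knn_item_graph(text_feats, k=1)
image_graph = knn_item_graph(image_feats, k=1)
print(text_graph)
print(image_graph)
```

In this toy data, items 0 and 1 are neighbours of each other, as are items 2 and 3, in both modalities. How the per-modality graphs would then be fused or fed to a sequential model is left open here, since the abstract does not specify it.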
Related papers
- Sequences as Nodes for Contrastive Multimodal Graph Recommendation [10.466765832314683]
MuSICRec is a graph-based recommender that combines collaborative, sequential, and multimodal signals.
On the Amazon Baby, Sports, and Electronics datasets, MuSICRec outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-06T21:35:12Z)
- Multimodal Generative Recommendation for Fusing Semantic and Collaborative Signals [17.608491612845306]
Sequential recommender systems rank relevant items by modeling a user's interaction history and computing the inner product between the resulting user representation and stored item embeddings.
To avoid the significant memory overhead of storing large item sets, the generative recommendation paradigm instead models each item as a series of discrete semantic codes.
These methods have yet to surpass traditional sequential recommenders on large item sets, limiting their adoption in the very scenarios they were designed to address.
We propose MSCGRec, a Multimodal Semantic and Collaborative Generative Recommender.
arXiv Detail & Related papers (2026-02-03T16:39:35Z)
- OneMall: One Architecture, More Scenarios -- End-to-End Generative Recommender Family at Kuaishou E-Commerce [68.7552227901176]
OneMall is an end-to-end generative recommendation framework tailored for e-commerce services at Kuaishou.
It unifies multiple e-commerce item distribution scenarios, such as product cards, short videos, and live streaming.
OneMall has been deployed, serving over 400 million daily active users at Kuaishou.
arXiv Detail & Related papers (2026-01-29T14:22:39Z)
- Structurally Refined Graph Transformer for Multimodal Recommendation [13.296555757708298]
We present SRGFormer, a structurally optimized multimodal recommendation model.
By modifying the transformer for better integration into our model, we capture the overall behavior patterns of users.
Then, we enhance structural information by embedding multimodal information into a hypergraph structure to aid in learning the local structures between users and items.
arXiv Detail & Related papers (2025-11-01T15:18:00Z)
- HyMiRec: A Hybrid Multi-interest Learning Framework for LLM-based Sequential Recommendation [24.720767926024433]
HyMiRec is a hybrid sequential recommendation framework for large language models.
It extracts coarse interest embeddings from long user sequences and uses an LLM-based recommender to capture refined interest embeddings.
To model the diverse preferences of users, we design a disentangled multi-interest learning module.
arXiv Detail & Related papers (2025-10-15T16:45:59Z)
- FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation).
A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams.
A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
- MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt [60.10555128510744]
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities.
Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal object ReID tasks.
We introduce a novel framework called MambaPro for multi-modal object ReID.
arXiv Detail & Related papers (2024-12-14T06:33:53Z)
- MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation.
We first devise a hierarchical quantization method Graph CF-RQVAE to assign Rec-ID for each item from its multimodal information.
We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z)
- MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation.
On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests.
On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z)
- Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem.
We propose a novel controllable multi-interest framework for sequential recommendation, called ComiRec.
Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
arXiv Detail & Related papers (2020-05-19T10:18:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.