Related papers: Multimodal Enhancement of Sequential Recommendation

URL: http://arxiv.org/abs/2602.07207v1
Date: Fri, 06 Feb 2026 21:32:56 GMT
Title: Multimodal Enhancement of Sequential Recommendation
Authors: Bucher Sahyouni, Matthew Vowels, Liqun Chen, Simon Hadfield,
Abstract summary: We propose a novel recommender framework, MuSTRec, that unifies multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted text and visual features. Across multiple Amazon datasets, MuSTRec demonstrates superior performance (up to 33.5% improvement) over multimodal and sequential state-of-the-art baselines.
Score: 10.466765832314683
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We propose a novel recommender framework, MuSTRec (Multimodal and Sequential Transformer-based Recommendation), that unifies multimodal and sequential recommendation paradigms. MuSTRec captures cross-item similarities and collaborative filtering signals by building item-item graphs from extracted text and visual features. A frequency-based self-attention module additionally captures short- and long-term user preferences. Across multiple Amazon datasets, MuSTRec demonstrates superior performance (up to 33.5% improvement) over multimodal and sequential state-of-the-art baselines. Finally, we detail some interesting facets of this new recommendation paradigm. These include the need for a new data partitioning regime, and a demonstration of how integrating user embeddings into sequential recommendation leads to drastically increased short-term metrics (up to 200% improvement) on smaller datasets. Our code is available at https://anonymous.4open.science/r/MuSTRec-D32B/ and will be made publicly available.
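The abstract does not say exactly how the item-item graphs are built. Below is a minimal sketch in Python with NumPy that assumes a common recipe for graph-based multimodal recommenders: cosine similarity over each modality's features, k-nearest-neighbor sparsification, symmetric normalization, and a weighted sum of the text and visual graphs. The function names knn_item_graph and fused_item_graph and the parameters k and text_weight are hypothetical. They do not come from the MuSTRec code.

```python
import numpy as np


def knn_item_graph(features: np.ndarray, k: int) -> np.ndarray:
    """Sparse, normalized k-NN item-item graph for one modality (sketch).

    features: (num_items, dim) array, e.g. text or image embeddings.
    k: neighbors kept per item (hypothetical hyperparameter, k < num_items).
    """
    # Cosine similarity: L2-normalize each row, then take pairwise dot products.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick an item as its own neighbor

    # Keep the k most similar items in each row as a binary adjacency matrix.
    n = sim.shape[0]
    topk = np.argpartition(-sim, kth=k - 1, axis=1)[:, :k]
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), topk.ravel()] = 1.0

    # Symmetric normalization D^{-1/2} A D^{-1/2}, with D taken from row sums.
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]


def fused_item_graph(text_feats, image_feats, k=10, text_weight=0.5):
    """Weighted sum of the text and visual item-item graphs (assumed fusion rule)."""
    a_text = knn_item_graph(text_feats, k)
    a_image = knn_item_graph(image_feats, k)
    return text_weight * a_text + (1.0 - text_weight) * a_image


# Toy usage with random stand-ins for extracted text and visual features.
rng = np.random.default_rng(0)
graph = fused_item_graph(rng.normal(size=(6, 8)), rng.normal(size=(6, 4)), k=2)
print(graph.shape)  # (6, 6)
```

Each item keeps exactly k neighbors, so every row of a single-modality graph holds k entries equal to 1/k. When the two modality weights sum to 1, the rows of the fused graph therefore also sum to 1. A graph convolution over this matrix then averages each item's embedding with those of its most similar items.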

Related papers

Sequences as Nodes for Contrastive Multimodal Graph Recommendation [10.466765832314683]
MuSICRec is a graph-based recommender that combines collaborative, sequential, and multimodal signals. On the Amazon Baby, Sports, and Electronics datasets, MuSICRec outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2026-02-06T21:35:12Z)
Multimodal Generative Recommendation for Fusing Semantic and Collaborative Signals [17.608491612845306]
Sequential recommender systems rank relevant items by modeling a user's interaction history and computing the inner product between the resulting user representation and stored item embeddings. To avoid the significant memory overhead of storing large item sets, the generative recommendation paradigm instead models each item as a series of discrete semantic codes. These methods have yet to surpass traditional sequential recommenders on large item sets, limiting their adoption in the very scenarios they were designed to address. We propose MSCGRec, a Multimodal Semantic and Collaborative Generative Recommender.
arXiv Detail & Related papers (2026-02-03T16:39:35Z)
OneMall: One Architecture, More Scenarios -- End-to-End Generative Recommender Family at Kuaishou E-Commerce [68.7552227901176]
OneMall is an end-to-end generative recommendation framework tailored for e-commerce services at Kuaishou. It unifies multiple e-commerce item distribution scenarios, such as product cards, short videos, and live streaming. OneMall has been deployed, serving over 400 million daily active users at Kuaishou.
arXiv Detail & Related papers (2026-01-29T14:22:39Z)
Structurally Refined Graph Transformer for Multimodal Recommendation [13.296555757708298]
We present SRGFormer, a structurally optimized multimodal recommendation model. By modifying the transformer for better integration into our model, we capture the overall behavior patterns of users. Then, we enhance structural information by embedding multimodal information into a hypergraph structure to aid in learning the local structures between users and items.
arXiv Detail & Related papers (2025-11-01T15:18:00Z)
HyMiRec: A Hybrid Multi-interest Learning Framework for LLM-based Sequential Recommendation [24.720767926024433]
HyMiRec is a hybrid multi-interest learning framework for LLM-based sequential recommendation. It extracts coarse interest embeddings from long user sequences and uses an LLM-based recommender to capture refined interest embeddings. To model the diverse preferences of users, we design a disentangled multi-interest learning module.
arXiv Detail & Related papers (2025-10-15T16:45:59Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [57.577843653775]
We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation). A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams. A cross-modal expert routing mechanism adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
MambaPro: Multi-Modal Object Re-Identification with Mamba Aggregation and Synergistic Prompt [60.10555128510744]
Multi-modal object Re-IDentification (ReID) aims to retrieve specific objects by utilizing complementary image information from different modalities. Recently, large-scale pre-trained models like CLIP have demonstrated impressive performance in traditional single-modal object ReID tasks. We introduce a novel framework called MambaPro for multi-modal object ReID.
arXiv Detail & Related papers (2024-12-14T06:33:53Z)
MMGRec: Multimodal Generative Recommendation with Transformer Model [81.61896141495144]
MMGRec aims to introduce a generative paradigm into multimodal recommendation. We first devise a hierarchical quantization method, Graph CF-RQVAE, to assign a Rec-ID to each item from its multimodal information. We then train a Transformer-based recommender to generate the Rec-IDs of user-preferred items based on historical interaction sequences.
arXiv Detail & Related papers (2024-04-25T12:11:27Z)
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation. On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representations.
arXiv Detail & Related papers (2023-08-22T04:06:56Z)
Controllable Multi-Interest Framework for Recommendation [64.30030600415654]
We formalize the recommender system as a sequential recommendation problem. We propose a novel controllable multi-interest framework for sequential recommendation, called ComiRec. Our framework has been successfully deployed on the offline Alibaba distributed cloud platform.
arXiv Detail & Related papers (2020-05-19T10:18:43Z)
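The MSCGRec abstract describes ranking by the inner product between a user representation and stored item embeddings. ComiRec and HyMiRec instead represent a user with several interest embeddings. The minimal sketch below covers both cases. It assumes that each item is scored by its best-matching interest vector, a simple max rule chosen for illustration that is not necessarily the aggregation either paper uses. The name top_k_items is hypothetical.

```python
import numpy as np


def top_k_items(user_vecs: np.ndarray, item_emb: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k items with the highest inner-product score, best first.

    user_vecs: (num_interests, dim); a single row reproduces single-vector scoring.
    item_emb: (num_items, dim) stored item embeddings.
    """
    scores = item_emb @ user_vecs.T   # (num_items, num_interests) inner products
    best = scores.max(axis=1)         # each item keeps its best-matching interest (assumed rule)
    top = np.argpartition(-best, kth=k - 1)[:k]  # unordered top k
    return top[np.argsort(-best[top])]           # sort those k by score


items = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]])
single = np.array([[1.0, 0.0]])
multi = np.array([[1.0, 0.0], [0.0, 0.9]])
print(top_k_items(single, items, 2))  # [0 2]
print(top_k_items(multi, items, 2))   # [0 1]
```

With only the first interest vector, item 1 scores 0 and misses the top two. A second interest vector pointing toward it lifts item 1 above item 2. This is the kind of diverse preference a single user vector can average away.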

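MSCGRec and MMGRec both represent each item as a sequence of discrete codes, and MMGRec names a residual-quantization method (Graph CF-RQVAE) for assigning them. Below is a minimal sketch of plain residual quantization that assumes fixed codebooks. At each level it picks the codeword nearest to the current residual and passes the leftover on to the next level. In an RQ-VAE the codebooks and encoder are learned, and Graph CF-RQVAE itself is not reproduced here. The function residual_quantize and the toy codebooks are hypothetical.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Map one item embedding to coarse-to-fine discrete codes (sketch).

    x: (dim,) item embedding, e.g. fused multimodal features.
    codebooks: list of (codebook_size, dim) arrays, one per level; assumed
    fixed here, whereas an RQ-VAE learns them jointly with an encoder.
    Returns (codes, reconstruction), where the reconstruction is the sum
    of the selected codewords.
    """
    residual = np.array(x, dtype=float)
    recon = np.zeros_like(residual)
    codes = []
    for book in codebooks:
        dists = np.sum((book - residual) ** 2, axis=1)  # squared L2 to every codeword
        idx = int(np.argmin(dists))
        codes.append(idx)
        recon += book[idx]
        residual -= book[idx]  # the next level quantizes what is left over
    return codes, recon


# Hypothetical two-level codebooks: a coarse level and a finer correction level.
level1 = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
level2 = np.array([[0.1, 0.0], [0.0, 0.2], [-0.2, -0.1]])
codes, recon = residual_quantize(np.array([1.1, 0.95]), [level1, level2])
print(codes)  # [1, 0]
```

The resulting code tuple (here [1, 0]) plays the role of an item's semantic ID. Items that share a first-level code are coarsely similar, which lets a generative recommender decode item IDs one code at a time.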
This list is automatically generated from the titles and abstracts of the papers on this site.