Related papers: An Aligning and Training Framework for Multimodal Recommendations

An Aligning and Training Framework for Multimodal Recommendations

URL: http://arxiv.org/abs/2403.12384v3
Date: Tue, 21 May 2024 11:51:03 GMT
Title: An Aligning and Training Framework for Multimodal Recommendations
Authors: Yifan Liu, Kangning Zhang, Xiangyuan Ren, Yanhua Huang, Jiarui Jin, Yingjie Qin, Ruilong Su, Ruiwen Xu, Weinan Zhang,
Abstract summary: multimodal recommendations can leverage rich contexts beyond user and item interactions. Existing methods mainly use them to help learn ID features. There exist semantic gaps among multimodal content features and ID features.
Score: 23.952221685501875
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: With the development of multimedia applications, multimodal recommendations play an essential role, as they can leverage rich contexts beyond user and item interactions. Existing methods mainly use them to help learn ID features; however, there exist semantic gaps among multimodal content features and ID features. Directly using multimodal information as an auxiliary would lead to misalignment in items' and users' representations. In this paper, we first systematically investigate the misalignment issue in multimodal recommendations, and propose a solution named AlignRec. In AlignRec, the recommendation objective is decomposed into three alignments, namely alignment within contents, alignment between content and categorical ID, and alignment between users and items. Each alignment is characterized by a distinct objective function. To effectively train AlignRec, we propose starting from pre-training the first alignment to obtain unified multimodal features and subsequently training the following two alignments together. As it is essential to analyze whether each multimodal feature helps in training, we design three new classes of metrics to evaluate intermediate performance. Our extensive experiments on three real-world datasets consistently verify the superiority of AlignRec compared to nine baselines. We also find that the multimodal features generated by our framework are better than currently used ones, which are to be open-sourced.

Related papers

FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose textbfFindRec (textbfFlexible unified textbfinformation textbfdisentanglement for multi-modal sequential textbfRecommendation)<n>A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams.<n>A cross-modal expert routing mechanism that adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
Learning Item Representations Directly from Multimodal Features for Effective Recommendation [51.49251689107541]
multimodal recommender systems predominantly leverage Bayesian Personalized Ranking (BPR) optimization to learn item representations.<n>We propose a novel model (i.e., LIRDRec) that learns item representations directly from multimodal features to augment recommendation performance.
arXiv Detail & Related papers (2025-05-08T05:42:22Z)
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification [60.38841251693781]
We propose a novel framework to generate robust multi-modal object ReIDs. Our framework uses Modal Prefixes and InverseNet to integrate multi-modal information with semantic guidance from inverted text. Experiments on three multi-modal object ReID benchmarks demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2025-03-13T13:00:31Z)
Multimodal Difference Learning for Sequential Recommendation [5.243083216855681]
We argue that user interests and item relationships vary across different modalities. We propose a novel Multimodal Learning framework for Sequential Recommendation, MDSRec. Results on five real-world datasets demonstrate the superiority of MDSRec over state-of-the-art baselines.
arXiv Detail & Related papers (2024-12-11T05:08:19Z)
CADMR: Cross-Attention and Disentangled Learning for Multimodal Recommender Systems [0.6037276428689637]
We propose CADMR, a novel autoencoder-based multimodal recommender system framework. We evaluate CADMR on three benchmark datasets, demonstrating significant performance improvements over state-of-the-art methods.
arXiv Detail & Related papers (2024-12-03T09:09:52Z)
Train Once, Deploy Anywhere: Matryoshka Representation Learning for Multimodal Recommendation [27.243116376164906]
We introduce a lightweight framework called full-scale Matryoshka representation learning for multimodal recommendation (fMRLRec) Our fMRLRec captures item features at different granularities, learning informative representations for efficient recommendation across multiple dimensions. We demonstrate the effectiveness and efficiency of fMRLRec on multiple benchmark datasets.
arXiv Detail & Related papers (2024-09-25T05:12:07Z)
Learning Multi-Aspect Item Palette: A Semantic Tokenization Framework for Generative Recommendation [55.99632509895994]
We introduce LAMIA, a novel approach for multi-aspect semantic tokenization.<n>Unlike RQ-VAE, which uses a single embedding, LAMIA learns an item palette''--a collection of independent and semantically parallel embeddings.<n>Our results demonstrate significant improvements in recommendation accuracy over existing methods.
arXiv Detail & Related papers (2024-09-11T13:49:48Z)
Personalized Multi-task Training for Recommender System [80.23030752707916]
PMTRec is the first personalized multi-task learning algorithm to obtain comprehensive user/item embeddings from various information sources. Our contributions open new avenues for advancing personalized multi-task training in recommender systems.
arXiv Detail & Related papers (2024-07-31T06:27:06Z)
BiVRec: Bidirectional View-based Multimodal Sequential Recommendation [55.87443627659778]
We propose an innovative framework, BivRec, that jointly trains the recommendation tasks in both ID and multimodal views. BivRec achieves state-of-the-art performance on five datasets and showcases various practical advantages.
arXiv Detail & Related papers (2024-02-27T09:10:41Z)
Ada-Retrieval: An Adaptive Multi-Round Retrieval Paradigm for Sequential Recommendations [50.03560306423678]
We propose Ada-Retrieval, an adaptive multi-round retrieval paradigm for recommender systems. Ada-Retrieval iteratively refines user representations to better capture potential candidates in the full item space.
arXiv Detail & Related papers (2024-01-12T15:26:40Z)
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation [13.338363107777438]
We propose a novel recommendation model by incorporating ID embeddings to enhance the salient features of both content and structure. Our method is superior to state-of-the-art multimodal recommendation methods and the effectiveness of fine-grained ID embeddings.
arXiv Detail & Related papers (2023-11-10T09:41:28Z)
Preserving Modality Structure Improves Multi-Modal Learning [64.10085674834252]
Self-supervised learning on large-scale multi-modal datasets allows learning semantically meaningful embeddings without relying on human annotations. These methods often struggle to generalize well on out-of-domain data as they ignore the semantic structure present in modality-specific embeddings. We propose a novel Semantic-Structure-Preserving Consistency approach to improve generalizability by preserving the modality-specific relationships in the joint embedding space.
arXiv Detail & Related papers (2023-08-24T20:46:48Z)
MISSRec: Pre-training and Transferring Multi-modal Interest-aware Sequence Representation for Recommendation [61.45986275328629]
We propose MISSRec, a multi-modal pre-training and transfer learning framework for sequential recommendation. On the user side, we design a Transformer-based encoder-decoder model, where the contextual encoder learns to capture the sequence-level multi-modal user interests. On the candidate item side, we adopt a dynamic fusion module to produce user-adaptive item representation.
arXiv Detail & Related papers (2023-08-22T04:06:56Z)
Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data. Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds. We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation [22.701371886522494]
We argue that the latent semantic item-item structures underlying multimodal contents could be beneficial for learning better item representations. We devise a novel modality-aware structure learning module, which learns item-item relationships for each modality.
arXiv Detail & Related papers (2021-11-01T03:37:02Z)
Mining Latent Structures for Multimedia Recommendation [46.70109406399858]
We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity. We learn item-item structures for each modality and aggregates multiple modalities to obtain latent item graphs. Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
arXiv Detail & Related papers (2021-04-19T03:50:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.