Mining Latent Structures for Multimedia Recommendation
- URL: http://arxiv.org/abs/2104.09036v1
- Date: Mon, 19 Apr 2021 03:50:24 GMT
- Title: Mining Latent Structures for Multimedia Recommendation
- Authors: Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang
- Abstract summary: We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity.
We learn item-item structures for each modality and aggregate multiple modalities to obtain latent item graphs.
Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
- Score: 46.70109406399858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimedia content predominates in the modern Web era. Investigating
how users interact with multimodal items is a continuing concern within the
rapid development of recommender systems. The majority of previous work focuses
on modeling user-item interactions with multimodal features included as side
information. However, this scheme is not well-designed for multimedia
recommendation. Specifically, only collaborative item-item relationships are
implicitly modeled through high-order item-user-item relations. Considering
that items are associated with rich contents in multiple modalities, we argue
that the latent item-item structures underlying these multimodal contents could
be beneficial for learning better item representations and further boosting
recommendation. To this end, we propose a LATent sTructure mining method for
multImodal reCommEndation, which we term LATTICE for brevity. To be specific,
in the proposed LATTICE model, we devise a novel modality-aware structure
learning layer, which learns item-item structures for each modality and
aggregates multiple modalities to obtain latent item graphs. Based on the
learned latent graphs, we perform graph convolutions to explicitly inject
high-order item affinities into item representations. These enriched item
representations can then be plugged into existing collaborative filtering
methods to make more accurate recommendations. Extensive experiments on three
real-world datasets demonstrate the superiority of our method over
state-of-the-art multimedia recommendation methods and validate the efficacy of
mining latent item-item relationships from multimodal features.
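The pipeline the abstract describes (per-modality item-item structure learning, modality aggregation into a latent item graph, then graph convolution to inject high-order affinities) can be sketched as follows. This is an illustrative numpy reconstruction under simplifying assumptions, not the paper's exact formulation: cosine-similarity kNN graphs stand in for the learned structures, the modality fusion weights are fixed rather than learned, and all function names are invented for this sketch.

```python
import numpy as np

def knn_graph(features, k=5):
    """Build a row-normalized kNN affinity graph from one modality's item features."""
    # Cosine similarity between all item pairs.
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)  # exclude self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        top = np.argsort(sim[i])[-k:]  # keep only the k most similar items
        adj[i, top] = sim[i, top]
    # Row-normalize so each item's affinities sum to 1.
    return adj / (adj.sum(axis=1, keepdims=True) + 1e-12)

def latent_item_graph(modality_feats, weights):
    """Aggregate per-modality graphs into a single latent item graph."""
    graphs = [knn_graph(f) for f in modality_feats]
    return sum(w * g for w, g in zip(weights, graphs))

def graph_convolve(adj, item_emb, layers=2):
    """Inject high-order item affinities by repeated neighborhood propagation."""
    h = item_emb
    for _ in range(layers):
        h = adj @ h
    return h

# Toy usage: 10 items with visual (64-d) and textual (32-d) features.
rng = np.random.default_rng(0)
visual = rng.standard_normal((10, 64))
textual = rng.standard_normal((10, 32))
adj = latent_item_graph([visual, textual], weights=[0.6, 0.4])
enriched = graph_convolve(adj, rng.standard_normal((10, 16)))
```

The enriched item representations produced this way would then be added to the embeddings of a downstream collaborative filtering model, which is how the paper proposes plugging them into existing recommenders.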
Related papers
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation [13.338363107777438]
We propose a novel recommendation model by incorporating ID embeddings to enhance the salient features of both content and structure.
Experiments show that our method is superior to state-of-the-art multimodal recommendation methods and demonstrate the effectiveness of fine-grained ID embeddings.
arXiv Detail & Related papers (2023-11-10T09:41:28Z) - MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization [93.5217515566437]
Multi-modal Product Summarization (MPS) aims to increase customers' desire to purchase by highlighting product characteristics.
Existing MPS methods can produce promising results, but they still lack end-to-end product summarization.
We propose an end-to-end multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce.
arXiv Detail & Related papers (2023-08-22T11:00:09Z) - MM-GEF: Multi-modal representation meet collaborative filtering [51.04679619309803]
We propose a graph-based item structure enhancement method MM-GEF: Multi-Modal recommendation with Graph Early-Fusion.
MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals.
arXiv Detail & Related papers (2023-08-14T15:47:36Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - Using Multiple Instance Learning to Build Multimodal Representations [3.354271620160378]
Image-text multimodal representation learning aligns data across modalities and enables important medical applications.
We propose a generic framework for constructing permutation-invariant score functions with many existing multimodal representation learning approaches as special cases.
arXiv Detail & Related papers (2022-12-11T18:01:11Z) - Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation [22.701371886522494]
We argue that the latent semantic item-item structures underlying multimodal contents could be beneficial for learning better item representations.
We devise a novel modality-aware structure learning module, which learns item-item relationships for each modality.
arXiv Detail & Related papers (2021-11-01T03:37:02Z) - Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.