Mining Latent Structures for Multimedia Recommendation
- URL: http://arxiv.org/abs/2104.09036v1
- Date: Mon, 19 Apr 2021 03:50:24 GMT
- Title: Mining Latent Structures for Multimedia Recommendation
- Authors: Jinghao Zhang, Yanqiao Zhu, Qiang Liu, Shu Wu, Shuhui Wang, Liang Wang
- Abstract summary: We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity.
We learn item-item structures for each modality and aggregate multiple modalities to obtain latent item graphs.
Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
- Score: 46.70109406399858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimedia content predominates in the modern Web era. Investigating
how users interact with multimodal items is a continuing concern within the
rapid development of recommender systems. The majority of previous work focuses
on modeling user-item interactions with multimodal features included as side
information. However, this scheme is not well-designed for multimedia
recommendation. Specifically, only collaborative item-item relationships are
implicitly modeled through high-order item-user-item relations. Considering
that items are associated with rich contents in multiple modalities, we argue
that the latent item-item structures underlying these multimodal contents could
be beneficial for learning better item representations and further boosting
recommendation. To this end, we propose a LATent sTructure mining method for
multImodal reCommEndation, which we term LATTICE for brevity. To be specific,
in the proposed LATTICE model, we devise a novel modality-aware structure
learning layer, which learns item-item structures for each modality and
aggregates multiple modalities to obtain latent item graphs. Based on the
learned latent graphs, we perform graph convolutions to explicitly inject
high-order item affinities into item representations. These enriched item
representations can then be plugged into existing collaborative filtering
methods to make more accurate recommendations. Extensive experiments on three
real-world datasets demonstrate the superiority of our method over
state-of-the-art multimedia recommendation methods and validate the efficacy of
mining latent item-item relationships from multimodal features.
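The pipeline the abstract describes (per-modality item-item structure learning, modality aggregation into a latent item graph, then graph convolution to inject high-order affinities) can be sketched as follows. This is an illustrative numpy reconstruction under simplifying assumptions, not the paper's exact formulation: cosine-similarity kNN graphs stand in for the learned structures, the modality fusion weights are fixed rather than learned, and all function names are invented for this sketch.

```python
import numpy as np

def knn_graph(features, k=5):
    """Build a row-normalized kNN affinity graph from one modality's item features."""
    # Cosine similarity between all item pairs.
    norm = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = norm @ norm.T
    np.fill_diagonal(sim, 0.0)  # exclude self-loops
    adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        top = np.argsort(sim[i])[-k:]  # keep only the k most similar items
        adj[i, top] = sim[i, top]
    # Row-normalize so each item's affinities sum to 1.
    return adj / (adj.sum(axis=1, keepdims=True) + 1e-12)

def latent_item_graph(modality_feats, weights):
    """Aggregate per-modality graphs into a single latent item graph."""
    graphs = [knn_graph(f) for f in modality_feats]
    return sum(w * g for w, g in zip(weights, graphs))

def graph_convolve(adj, item_emb, layers=2):
    """Inject high-order item affinities by repeated neighborhood propagation."""
    h = item_emb
    for _ in range(layers):
        h = adj @ h
    return h

# Toy usage: 10 items with visual (64-d) and textual (32-d) features.
rng = np.random.default_rng(0)
visual = rng.standard_normal((10, 64))
textual = rng.standard_normal((10, 32))
adj = latent_item_graph([visual, textual], weights=[0.6, 0.4])
enriched = graph_convolve(adj, rng.standard_normal((10, 16)))
```

The enriched item representations produced this way would then be added to the embeddings of a downstream collaborative filtering model, which is how the paper proposes plugging them into existing recommenders.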
Related papers
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z) - ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation [13.338363107777438]
We propose a novel recommendation model by incorporating ID embeddings to enhance the salient features of both content and structure.
Experiments show that our method is superior to state-of-the-art multimodal recommendation methods and demonstrate the effectiveness of fine-grained ID embeddings.
arXiv Detail & Related papers (2023-11-10T09:41:28Z) - MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization [93.5217515566437]
Multi-modal Product Summarization (MPS) aims to increase customers' desire to purchase by highlighting product characteristics.
Existing MPS methods can produce promising results, but they still lack end-to-end product summarization.
We propose an end-to-end multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce.
arXiv Detail & Related papers (2023-08-22T11:00:09Z) - MM-GEF: Multi-modal representation meet collaborative filtering [51.04679619309803]
We propose a graph-based item structure enhancement method MM-GEF: Multi-Modal recommendation with Graph Early-Fusion.
MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals.
arXiv Detail & Related papers (2023-08-14T15:47:36Z) - Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z) - Using Multiple Instance Learning to Build Multimodal Representations [3.354271620160378]
Image-text multimodal representation learning aligns data across modalities and enables important medical applications.
We propose a generic framework for constructing permutation-invariant score functions with many existing multimodal representation learning approaches as special cases.
arXiv Detail & Related papers (2022-12-11T18:01:11Z) - Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation [22.701371886522494]
We argue that the latent semantic item-item structures underlying multimodal contents could be beneficial for learning better item representations.
We devise a novel modality-aware structure learning module, which learns item-item relationships for each modality.
arXiv Detail & Related papers (2021-11-01T03:37:02Z) - Pre-training Graph Transformer with Multimodal Side Information for Recommendation [82.4194024706817]
We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
arXiv Detail & Related papers (2020-10-23T10:30:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.