MM-GEF: Multi-modal representation meet collaborative filtering
- URL: http://arxiv.org/abs/2308.07222v1
- Date: Mon, 14 Aug 2023 15:47:36 GMT
- Title: MM-GEF: Multi-modal representation meet collaborative filtering
- Authors: Hao Wu and Alejandro Ariza-Casabona and Bartłomiej Twardowski and
Tri Kurniawan Wijaya
- Abstract summary: We propose a graph-based item structure enhancement method MM-GEF: Multi-Modal recommendation with Graph Early-Fusion.
MM-GEF learns refined item representations by injecting structural information obtained from both multi-modal and collaborative signals.
- Score: 51.04679619309803
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In modern e-commerce, item content features in various modalities offer
accurate and comprehensive information to recommender systems. The majority of
previous work either focuses on learning effective item representations while
modelling user-item interactions, or on exploring item-item relationships by
analysing multi-modal features. Those methods, however, fail to incorporate the
collaborative item-user-item relationships into the multi-modal feature-based
item structure. In this work, we propose a graph-based item structure
enhancement method MM-GEF: Multi-Modal recommendation with Graph Early-Fusion,
which effectively combines the latent item structure underlying multi-modal
contents with the collaborative signals. Instead of processing the content
feature in different modalities separately, we show that the early-fusion of
multi-modal features provides significant improvement. MM-GEF learns refined
item representations by injecting structural information obtained from both
multi-modal and collaborative signals. Through extensive experiments on four
publicly available datasets, we demonstrate systematic improvements of our
method over state-of-the-art multi-modal recommendation methods.
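
The abstract describes the mechanism only at a high level. Below is a minimal sketch of the graph early-fusion idea, assuming PyTorch; the kNN construction, k, the mixing weight alpha, and the co-occurrence-based collaborative graph are illustrative assumptions, not the paper's published design.

```python
# Minimal sketch of graph early-fusion for item representations.
# Assumptions (not from the paper): feature shapes, k=10, alpha=0.5, and
# deriving the collaborative graph from item co-occurrence are illustrative.
import torch
import torch.nn.functional as F

def knn_graph(features: torch.Tensor, k: int = 10) -> torch.Tensor:
    """Row-normalised kNN item-item adjacency from cosine similarity."""
    normed = F.normalize(features, dim=1)
    sim = normed @ normed.T
    topk = sim.topk(k, dim=1)
    adj = torch.zeros_like(sim).scatter_(1, topk.indices, topk.values)
    return adj / adj.sum(dim=1, keepdim=True).clamp_min(1e-12)

def early_fusion_graph(visual, textual, interactions, k=10, alpha=0.5):
    # Early fusion: concatenate modality features BEFORE building the item
    # graph, rather than building one graph per modality and fusing late.
    fused = torch.cat([F.normalize(visual, dim=1),
                       F.normalize(textual, dim=1)], dim=1)
    content_adj = knn_graph(fused, k)
    # Collaborative item-user-item signal: items co-consumed by the same users.
    co_occurrence = interactions.T.float() @ interactions.float()
    collab_adj = knn_graph(co_occurrence, k)
    return alpha * content_adj + (1 - alpha) * collab_adj

def refine(item_emb, adj, layers=2):
    # Light graph convolution: propagate embeddings over the fused graph
    # to inject structural information into the item representations.
    out = item_emb
    for _ in range(layers):
        out = adj @ out
    return out
```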
Related papers
- Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts [54.529880848937104]
We develop a unified MLLM with the MoE architecture, named Uni-MoE, that can handle a wide array of modalities.
Specifically, it features modality-specific encoders with connectors for a unified multimodal representation.
We evaluate the instruction-tuned Uni-MoE on a comprehensive set of multimodal datasets.
arXiv Detail & Related papers (2024-05-18T12:16:01Z)
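
A toy sketch of the two ingredients this summary names, assuming PyTorch: connector projections into a shared width, and a top-k gated mixture-of-experts feed-forward block. The sizes, top-2 routing, and expert count are assumptions, not the paper's specification.

```python
# Hypothetical sketch of Uni-MoE-style ingredients: modality-specific
# encoders feed connector projections into a shared token space, which a
# top-k gated MoE feed-forward block then processes.
import torch
import torch.nn as nn

class MoEFFN(nn.Module):
    def __init__(self, dim=512, num_experts=4, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                          nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (num_tokens, dim)
        weights = self.gate(x).softmax(dim=-1)  # route each token
        topw, topi = weights.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # combine only top-k experts
            for e, expert in enumerate(self.experts):
                mask = topi[:, slot] == e
                if mask.any():
                    out[mask] += topw[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Connectors map each modality encoder's output into the shared width so a
# single MoE-augmented trunk can consume all modalities (dims assumed).
image_connector = nn.Linear(768, 512)   # e.g. from a vision encoder
audio_connector = nn.Linear(1024, 512)  # e.g. from an audio encoder
```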
- Multi-modal Semantic Understanding with Contrastive Cross-modal Feature Alignment [11.897888221717245]
This paper proposes a novel CLIP-guided contrastive-learning-based architecture to perform multi-modal feature alignment.
Our model is simple to implement without using task-specific external knowledge, and thus can easily migrate to other multi-modal tasks.
arXiv Detail & Related papers (2024-03-11T01:07:36Z)
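
The summary names CLIP-guided contrastive alignment; its canonical building block is a symmetric InfoNCE loss over matched text-image pairs. A generic sketch, assuming PyTorch; the temperature value and the absence of projection heads are simplifications, not this paper's exact setup.

```python
# Symmetric InfoNCE: the standard loss behind CLIP-style contrastive
# cross-modal feature alignment.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_feats, image_feats, temperature=0.07):
    """Pull matched text/image pairs together, push mismatched pairs apart."""
    t = F.normalize(text_feats, dim=1)
    v = F.normalize(image_feats, dim=1)
    logits = t @ v.T / temperature                   # (batch, batch) similarities
    targets = torch.arange(len(t), device=t.device)  # diagonal = positives
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```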
- Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z)
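
A sketch of what single-stream early fusion in the spirit of UmURL could look like, assuming PyTorch: per-modality inputs are projected, tagged with a learned modality embedding, concatenated along the sequence axis, and encoded by one shared transformer instead of one stream per modality. All dimensions are assumptions.

```python
# Single-stream early-fusion encoder sketch: one joint token sequence,
# one shared encoder, one unified multi-modal representation.
import torch
import torch.nn as nn

class SingleStreamEncoder(nn.Module):
    def __init__(self, in_dims=(75, 150), dim=256):  # e.g. joint, bone feats
        super().__init__()
        self.proj = nn.ModuleList(nn.Linear(d, dim) for d in in_dims)
        self.modality_emb = nn.Parameter(torch.zeros(len(in_dims), dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, streams):  # list of (batch, frames, in_dim) tensors
        tokens = [p(s) + e for p, s, e
                  in zip(self.proj, streams, self.modality_emb)]
        fused = torch.cat(tokens, dim=1)        # early fusion: one sequence
        return self.encoder(fused).mean(dim=1)  # pooled joint representation
```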
- MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization [93.5217515566437]
Multi-modal Product Summarization (MPS) aims to increase customers' desire to purchase by highlighting product characteristics.
Existing MPS methods can produce promising results, but they still lack end-to-end product summarization.
We propose an end-to-end multi-modal attribute-aware product summarization method (MMAPS) for generating high-quality product summaries in e-commerce.
arXiv Detail & Related papers (2023-08-22T11:00:09Z)
- Step fusion: Local and global mutual guidance [3.0903319879656084]
We propose a feature alignment method that fully fuses multi-modal information, stepwise shifting and expanding feature information from different modalities until they share a consistent representation in a common feature space.
The proposed method can robustly capture high-level interactions between features of different modalities, thus significantly improving the performance of multimodal learning.
arXiv Detail & Related papers (2023-06-29T13:49:06Z)
- Multi-modal Contrastive Representation Learning for Entity Alignment [57.92705405276161]
Multi-modal entity alignment aims to identify equivalent entities between two different multi-modal knowledge graphs.
We propose MCLEA, a Multi-modal Contrastive Learning based Entity Alignment model.
In particular, MCLEA first learns individual representations from each modality, and then performs contrastive learning to jointly model intra-modal and inter-modal interactions.
arXiv Detail & Related papers (2022-09-02T08:59:57Z)
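
A rough sketch of the MCLEA recipe as summarised above, assuming PyTorch: contrastive terms within each modality (aligned seed entities across the two KGs) and between each modality and a fused joint embedding. Equal loss weights and fusion by averaging are assumptions for illustration, not the paper's recipe.

```python
# Intra-modal + inter-modal contrastive objective sketch for entity alignment.
import torch
import torch.nn.functional as F

def info_nce(a, b, tau=0.1):
    a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
    logits = a @ b.T / tau
    labels = torch.arange(len(a), device=a.device)
    return F.cross_entropy(logits, labels)

def mclea_style_loss(kg1, kg2):
    """kg1, kg2: dicts mapping modality name -> (n_seed, dim) embeddings
    of seed-aligned entities, in the same row order."""
    joint1 = torch.stack(list(kg1.values())).mean(dim=0)  # naive fusion
    joint2 = torch.stack(list(kg2.values())).mean(dim=0)
    intra = sum(info_nce(kg1[m], kg2[m]) for m in kg1)    # within a modality
    inter = sum(info_nce(kg1[m], joint1) for m in kg1)    # modality vs joint
    return intra + inter + info_nce(joint1, joint2)
```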
- Multimodal E-Commerce Product Classification Using Hierarchical Fusion [0.0]
The proposed method significantly outperformed the unimodal models and the reported performance of similar models on our specific task.
We experimented with multiple fusion techniques and found that the best-performing way to combine the individual embeddings of the unimodal networks is to combine concatenation with averaging of the feature vectors.
arXiv Detail & Related papers (2022-07-07T14:04:42Z)
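
Reading the summary literally, the winning fusion combines concatenation with averaging of the unimodal embeddings. A minimal sketch, assuming PyTorch; the exact layout (concatenating both views) is my assumption about their design:

```python
# Fuse unimodal embeddings by concatenating them together with their average.
import torch

def fuse(text_emb: torch.Tensor, image_emb: torch.Tensor) -> torch.Tensor:
    """text_emb, image_emb: (batch, dim) outputs of the unimodal networks."""
    avg = (text_emb + image_emb) / 2
    return torch.cat([text_emb, image_emb, avg], dim=1)  # (batch, 3 * dim)
```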
- Latent Structures Mining with Contrastive Modality Fusion for Multimedia Recommendation [22.701371886522494]
We argue that the latent semantic item-item structures underlying multimodal contents could be beneficial for learning better item representations.
We devise a novel modality-aware structure learning module, which learns item-item relationships for each modality.
arXiv Detail & Related papers (2021-11-01T03:37:02Z)
- Mining Latent Structures for Multimedia Recommendation [46.70109406399858]
We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity.
We learn item-item structures for each modality and aggregate multiple modalities to obtain latent item graphs.
Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
arXiv Detail & Related papers (2021-04-19T03:50:24Z)
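
For contrast with the early fusion sketched under the MM-GEF abstract, here is a rough sketch of the per-modality pipeline these last two entries describe, assuming PyTorch: one kNN item-item graph per modality, aggregated into a latent item graph, then graph convolutions to inject high-order item affinities. The value of k, the softmax-normalised modality weights, and two propagation hops are assumptions.

```python
# Per-modality latent item graphs aggregated late, then propagated.
import torch
import torch.nn.functional as F

def modality_graph(feats, k=10):
    """Row-normalised kNN item-item adjacency for one modality's features."""
    normed = F.normalize(feats, dim=1)
    sim = normed @ normed.T
    topk = sim.topk(k, dim=1)
    adj = torch.zeros_like(sim).scatter_(1, topk.indices, topk.values)
    return adj / adj.sum(dim=1, keepdim=True).clamp_min(1e-12)

def latent_item_graph(modal_feats, weights):
    # Aggregate per-modality graphs with softmax-normalised importance weights.
    graphs = [modality_graph(f) for f in modal_feats]
    w = torch.softmax(weights, dim=0)
    return sum(wi * g for wi, g in zip(w, graphs))

def propagate(item_emb, adj, hops=2):
    out = item_emb
    for _ in range(hops):
        out = adj @ out  # each hop mixes in higher-order item affinities
    return out
```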
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.