Pre-training Graph Transformer with Multimodal Side Information for
Recommendation
- URL: http://arxiv.org/abs/2010.12284v2
- Date: Thu, 7 Jan 2021 10:39:57 GMT
- Title: Pre-training Graph Transformer with Multimodal Side Information for
Recommendation
- Authors: Yong Liu, Susen Yang, Chenyi Lei, Guoxin Wang, Haihong Tang, Juyong
Zhang, Aixin Sun, Chunyan Miao
- Abstract summary: We propose a pre-training strategy to learn item representations by considering both item side information and their relationships.
We develop a novel sampling algorithm named MCNSampling to select contextual neighbors for each item.
The proposed Pre-trained Multimodal Graph Transformer (PMGT) learns item representations with two objectives: 1) graph structure reconstruction, and 2) masked node feature reconstruction.
- Score: 82.4194024706817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Side information of items, e.g., images and text descriptions, has
been shown to be effective in contributing to accurate recommendations. Inspired by the recent
success of pre-training models on natural language and images, we propose a
pre-training strategy to learn item representations by considering both item
side information and their relationships. We relate items by common user
activities, e.g., co-purchase, and construct a homogeneous item graph. This
graph provides a unified view of item relations and their associated side
information in multimodality. We develop a novel sampling algorithm named
MCNSampling to select contextual neighbors for each item. The proposed
Pre-trained Multimodal Graph Transformer (PMGT) learns item representations
with two objectives: 1) graph structure reconstruction, and 2) masked node
feature reconstruction. Experimental results on real datasets demonstrate that
the proposed PMGT model effectively exploits the multimodal side information
to achieve better accuracy in downstream tasks, including item recommendation,
item classification, and click-through rate prediction. We also report a case
study of testing the proposed PMGT model in an online setting with 600 thousand
users.
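The two pre-training objectives can be illustrated with a minimal numpy sketch. Everything below (shapes, the single linear "encoder", the tied-weight decoder) is a toy assumption for illustration, not PMGT's actual transformer architecture: objective 1 scores candidate links from embedding similarity, and objective 2 masks some nodes' multimodal features and reconstructs them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all shapes and names are illustrative, not from the paper):
# 6 items, 8-dim multimodal features, adjacency from co-purchase links.
n, d = 6, 8
features = rng.normal(size=(n, d))
adj = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    adj[i, j] = adj[j, i] = 1.0

# Stand-in "transformer": one linear layer producing item embeddings.
W = rng.normal(size=(d, d)) * 0.1

def encode(x):
    return x @ W

def graph_reconstruction_loss(z, adj):
    # Objective 1: predict links from embedding similarity
    # (binary cross-entropy over the sigmoid of dot products).
    logits = z @ z.T
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-9
    return -np.mean(adj * np.log(probs + eps)
                    + (1 - adj) * np.log(1 - probs + eps))

def masked_feature_loss(x, mask_idx):
    # Objective 2: zero out ("mask") some node features and
    # reconstruct them from the encoded representation.
    x_masked = x.copy()
    x_masked[mask_idx] = 0.0
    recon = encode(x_masked) @ W.T   # tied-weight decoder (assumption)
    return np.mean((recon[mask_idx] - x[mask_idx]) ** 2)

mask_idx = [1, 4]
total = (graph_reconstruction_loss(encode(features), adj)
         + masked_feature_loss(features, mask_idx))
print(round(float(total), 4))
```

In the actual model both losses would be minimized jointly by gradient descent over the transformer parameters; here the sum is only evaluated once to show how the two signals combine.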
Related papers
- HeGraphAdapter: Tuning Multi-Modal Vision-Language Models with Heterogeneous Graph Adapter [19.557300178619382]
We propose a novel Heterogeneous Graph Adapter to tune VLMs for downstream tasks.
We employ a specific Heterogeneous Graph Neural Network to mine multi-modal structural knowledge for the downstream tasks.
Experimental results on 11 benchmark datasets demonstrate the effectiveness and benefits of the proposed HeGraphAdapter.
arXiv Detail & Related papers (2024-10-10T12:20:58Z)
- Information Screening whilst Exploiting! Multimodal Relation Extraction with Feature Denoising and Multimodal Topic Modeling [96.75821232222201]
Existing research on multimodal relation extraction (MRE) faces two co-existing challenges, internal-information over-utilization and external-information under-exploitation.
We propose a novel framework that simultaneously implements the idea of internal-information screening and external-information exploiting.
arXiv Detail & Related papers (2023-05-19T14:56:57Z)
- Cross-view Graph Contrastive Representation Learning on Partially Aligned Multi-view Data [52.491074276133325]
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields.
We propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations.
Experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
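The alignment idea behind such cross-view contrastive objectives can be sketched with a standard InfoNCE-style loss, in which the aligned pair from the two views forms the positive on the diagonal of a similarity matrix. This is a generic illustration under toy data, not the paper's exact loss:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    # Contrastive objective: aligned sample i in view 1 should be
    # closest to sample i in view 2 among all candidates.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                       # pairwise similarities
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))          # positives on the diagonal

rng = np.random.default_rng(1)
view1 = rng.normal(size=(16, 4))
view2 = view1 + 0.05 * rng.normal(size=(16, 4))   # aligned, noisy view
loss_aligned = info_nce(view1, view2)
loss_random = info_nce(view1, rng.normal(size=(16, 4)))
print(loss_aligned < loss_random)  # aligned views incur a lower loss
```

Minimizing this loss pulls the two views of the same sample together in the latent space, which is what makes downstream clustering and classification on the shared representation effective.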
arXiv Detail & Related papers (2022-11-08T09:19:32Z)
- MMGA: Multimodal Learning with Graph Alignment [8.349066399479938]
We propose MMGA, a novel multimodal pre-training framework to incorporate information from graph (social network), image and text modalities on social media.
In MMGA, a multi-step graph alignment mechanism is proposed to add the self-supervision from graph modality to optimize the image and text encoders.
We release our dataset, the first social media multimodal dataset with graph, of 60,000 users labeled with specific topics based on 2 million posts to facilitate future research.
arXiv Detail & Related papers (2022-10-18T15:50:31Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two real-world, practical instance-level retrieval tasks.
We train a more effective cross-modal model that adaptively incorporates key concept information from the multi-modal data.
arXiv Detail & Related papers (2022-06-17T15:40:45Z)
- Mining Latent Structures for Multimedia Recommendation [46.70109406399858]
We propose a LATent sTructure mining method for multImodal reCommEndation, which we term LATTICE for brevity.
We learn item-item structures for each modality and aggregate multiple modalities to obtain latent item graphs.
Based on the learned latent graphs, we perform graph convolutions to explicitly inject high-order item affinities into item representations.
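The pipeline described above (build a latent item graph per modality, then propagate embeddings over it) can be sketched as follows. The kNN sparsification and the two-hop averaging are common choices assumed for illustration, not the paper's exact learned-graph procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
n_items, d = 8, 5
item_emb = rng.normal(size=(n_items, d))     # collaborative item embeddings
modal_feat = rng.normal(size=(n_items, 6))   # e.g. image features (toy data)

def knn_graph(feat, k=2):
    # Latent item-item graph for one modality: keep the top-k most
    # similar neighbours per item (a simple kNN sparsification).
    f = feat / np.linalg.norm(feat, axis=1, keepdims=True)
    sim = f @ f.T
    np.fill_diagonal(sim, -np.inf)           # no self-loops
    adj = np.zeros_like(sim)
    for i in range(len(sim)):
        adj[i, np.argsort(sim[i])[-k:]] = 1.0
    # Row-normalise so propagation averages neighbour embeddings.
    return adj / adj.sum(axis=1, keepdims=True)

adj = knn_graph(modal_feat)

# Graph convolution: inject high-order item affinities by repeatedly
# propagating embeddings over the latent graph.
h = item_emb
for _ in range(2):                           # two hops of propagation
    h = adj @ h
print(h.shape)                               # refined item representations
```

Each propagation hop mixes in neighbours one step further away, so after two hops an item's representation reflects items up to two edges away in the latent graph.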
arXiv Detail & Related papers (2021-04-19T03:50:24Z)
- Auto-weighted Multi-view Feature Selection with Graph Optimization [90.26124046530319]
We propose a novel unsupervised multi-view feature selection model based on graph learning.
The contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned.
Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
arXiv Detail & Related papers (2021-04-11T03:25:25Z)
- Joint Item Recommendation and Attribute Inference: An Adaptive Graph Convolutional Network Approach [61.2786065744784]
In recommender systems, users and items are associated with attributes, and users show preferences to items.
As annotating user (item) attributes is a labor-intensive task, the attribute values are often incomplete, with many missing values.
We propose an Adaptive Graph Convolutional Network (AGCN) approach for joint item recommendation and attribute inference.
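The joint setup can be sketched as a single graph-convolution pass over the user-item bipartite graph feeding two output heads: one for preference scores and one for missing-attribute inference. The degree-normalised update and both linear heads below are illustrative assumptions, not AGCN's adaptive update rule:

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, d, n_attr = 4, 5, 6, 3

# Bipartite user-item interaction matrix (1 = observed preference).
R = (rng.random((n_users, n_items)) > 0.6).astype(float)

user_emb = rng.normal(size=(n_users, d))
item_emb = rng.normal(size=(n_items, d))

# One graph-convolution step over the bipartite graph: users gather
# from interacted items and items gather from interacting users.
deg_u = np.maximum(R.sum(axis=1, keepdims=True), 1)
deg_i = np.maximum(R.sum(axis=0, keepdims=True).T, 1)
user_h = user_emb + (R @ item_emb) / deg_u
item_h = item_emb + (R.T @ user_emb) / deg_i

# Two heads share the refined item embeddings:
W_attr = rng.normal(size=(d, n_attr)) * 0.1
attr_logits = item_h @ W_attr          # attribute-inference head
scores = user_h @ item_h.T             # recommendation head

print(scores.shape, attr_logits.shape)  # (4, 5) (5, 3)
```

Sharing the propagated item embeddings between the two heads is what lets the recommendation signal regularise attribute inference (and vice versa) when attribute labels are sparse.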
arXiv Detail & Related papers (2020-05-25T10:50:01Z)
- Rich-Item Recommendations for Rich-Users: Exploiting Dynamic and Static Side Information [20.176329366180934]
We study the recommendation problem where the users and items to be recommended are rich data structures with multiple entity types.
We provide a general formulation for the problem that captures the complexities of modern real-world recommendations.
We present two real-world case studies of our formulation and the MEDRES architecture.
arXiv Detail & Related papers (2020-01-28T17:53:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.