Incorporating Domain Knowledge Graph into Multimodal Movie Genre
Classification with Self-Supervised Attention and Contrastive Learning
- URL: http://arxiv.org/abs/2310.08032v1
- Date: Thu, 12 Oct 2023 04:49:11 GMT
- Title: Incorporating Domain Knowledge Graph into Multimodal Movie Genre
Classification with Self-Supervised Attention and Contrastive Learning
- Authors: Jiaqi Li, Guilin Qi, Chuanyi Zhang, Yongrui Chen, Yiming Tan, Chenlong
Xia, Ye Tian
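- Abstract summary: We present a novel framework that exploits the knowledge graph from various perspectives to address three problems in multimodal movie genre classification: unutilized group relations in metadata, unreliable attention allocation, and insufficiently discriminative fused features.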
We introduce an Attention Teacher module for reliable attention allocation based on self-supervised learning.
Finally, a Genre-Centroid Anchored Contrastive Learning module is proposed to strengthen the discriminative ability of fused features.
- Score: 14.729059909487072
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal movie genre classification has always been regarded as a demanding
multi-label classification task due to the diversity of multimodal data such as
posters, plot summaries, trailers and metadata. Although existing works have
made great progress in modeling and combining each modality, they still face
three issues: 1) unutilized group relations in metadata, 2) unreliable
attention allocation, and 3) insufficiently discriminative fused features. Given that the
knowledge graph has been proven to contain rich information, we present a novel
framework that exploits the knowledge graph from various perspectives to
address the above problems. As preparation, the metadata is processed into a
domain knowledge graph, and a translation-based model for knowledge graph
embedding is adopted to capture the relations between entities. First, we retrieve the
relevant embedding from the knowledge graph by utilizing group relations in
metadata and then integrate it with other modalities. Next, we introduce an
Attention Teacher module for reliable attention allocation based on
self-supervised learning. It learns the distribution of the knowledge graph and
produces rational attention weights. Finally, a Genre-Centroid Anchored
Contrastive Learning module is proposed to strengthen the discriminative
ability of fused features. The embedding space of anchors is initialized from
the genre entities in the knowledge graph. To verify the effectiveness of our
framework, we collect MM-IMDb 2.0, a dataset that is larger and more
challenging than the MM-IMDb dataset. The experimental results on both datasets
demonstrate that our model outperforms state-of-the-art methods. We will
release the code in the near future.
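The abstract does not name the knowledge graph embedding model it adopts. As a rough illustration only, the sketch below assumes a TransE-style translation objective, in which a true triple (head, relation, tail) should satisfy head + relation ≈ tail. The function names, the `has_genre` relation, and the toy 2-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math

def transe_score(head, relation, tail):
    """Negative L2 distance of (head + relation) from tail; higher is more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Hinge loss that pushes a true triple above a corrupted one by at least `margin`."""
    return max(0.0, margin - pos_score + neg_score)

# Hypothetical toy embeddings (assumption: not from the paper).
movie = [1.0, 0.0]
has_genre = [0.0, 1.0]
drama = [1.0, 1.0]    # true tail: movie + has_genre lands exactly here
comedy = [4.0, 5.0]   # corrupted tail: far from movie + has_genre

pos = transe_score(movie, has_genre, drama)
neg = transe_score(movie, has_genre, comedy)
loss = margin_loss(pos, neg)
```

In this toy case the true triple scores higher than the corrupted one by more than the margin, so the loss is zero. Swapping the two scores gives a positive loss, which is the signal that would move the embeddings during training.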
Related papers
- DisenSemi: Semi-supervised Graph Classification via Disentangled Representation Learning [36.85439684013268]
We propose a novel framework named DisenSemi, which learns disentangled representation for semi-supervised graph classification.
Specifically, a disentangled graph encoder is proposed to generate factor-wise graph representations for both supervised and unsupervised models.
We train two models via supervised objective and mutual information (MI)-based constraints respectively.
arXiv Detail & Related papers (2024-07-19T07:31:32Z)
- MyGO: Discrete Modality Information as Fine-Grained Tokens for Multi-modal Knowledge Graph Completion [51.80447197290866]
We introduce MyGO to process, fuse, and augment the fine-grained modality information from MMKGs.
MyGO tokenizes multi-modal raw data as fine-grained discrete tokens and learns entity representations with a cross-modal entity encoder.
Experiments on standard MMKGC benchmarks reveal that our method surpasses 20 of the latest models.
arXiv Detail & Related papers (2024-04-15T05:40:41Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework NativE to achieve MMKGC in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for any modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Recognizing Unseen Objects via Multimodal Intensive Knowledge Graph Propagation [68.13453771001522]
We propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings.
We conduct extensive experiments and evaluate our model on large-scale real-world data.
arXiv Detail & Related papers (2023-06-14T13:07:48Z)
- Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
- Group Contrastive Self-Supervised Learning on Graphs [101.45974132613293]
We study self-supervised learning on graphs using contrastive methods.
We argue that contrasting graphs in multiple subspaces enables graph encoders to capture more abundant characteristics.
arXiv Detail & Related papers (2021-07-20T22:09:21Z)
- GCNBoost: Artwork Classification by Label Propagation through a Knowledge Graph [32.129005474301735]
Contextual information is often the key to structure such real world data, and we propose to use it in form of a knowledge graph.
We propose a novel use of a knowledge graph, that is constructed on annotated data and pseudo-labeled data.
With label propagation, we boost artwork classification by training a model using a graph convolutional network.
arXiv Detail & Related papers (2021-05-25T11:50:05Z)
- An Adversarial Transfer Network for Knowledge Representation Learning [11.013390624382257]
We propose an adversarial embedding transfer network ATransN, which transfers knowledge from one or more teacher knowledge graphs to a target one.
Specifically, we add soft constraints on aligned entity pairs and neighbours to the existing knowledge representation learning methods.
arXiv Detail & Related papers (2021-04-30T05:07:25Z)
- Mutual Graph Learning for Camouflaged Object Detection [31.422775969808434]
A major challenge is that intrinsic similarities between foreground objects and background surroundings make the features extracted by deep model indistinguishable.
We design a novel Mutual Graph Learning model, which generalizes the idea of conventional mutual learning from regular grids to the graph domain.
In contrast to most mutual learning approaches that use a shared function to model all between-task interactions, MGL is equipped with typed functions for handling different complementary relations.
arXiv Detail & Related papers (2021-04-03T10:14:39Z)
- Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation [65.81546955181781]
We propose a novel semi-supervised domain adaptation approach, namely Dual-Teacher.
The student model learns the knowledge of unlabeled target data and labeled source data by two teacher models.
We demonstrate that our approach is able to concurrently utilize unlabeled data and cross-modality data with superior performance.
arXiv Detail & Related papers (2020-07-13T10:00:44Z)