Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval
- URL: http://arxiv.org/abs/2106.13552v1
- Date: Fri, 25 Jun 2021 10:53:07 GMT
- Authors: Xueying Chen, Rong Zhang, Yibing Zhan
- Score: 10.420129873840578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-modal retrieval aims to enable a flexible retrieval experience
by combining multimedia data such as images, videos, text, and audio. A core
task of unsupervised approaches is to mine the correlations among different
object representations in order to achieve satisfactory retrieval performance
without requiring expensive labels. In this paper, we propose a Graph Pattern
Loss based Diversified Attention Network (GPLDAN) for unsupervised cross-modal
retrieval that deeply analyzes the correlations among representations. First,
we propose a diversified attention feature projector that considers the
interaction between different representations to generate multiple
representations of an instance. Then, we design a novel graph pattern loss to
explore the correlations among different representations; in this graph, all
possible distances between different representations are considered. In
addition, a modality classifier is added to explicitly declare the modality of
each feature before fusion and to guide the network toward stronger
discrimination ability. We test GPLDAN on four public datasets. Compared with
state-of-the-art cross-modal retrieval methods, the experimental results
demonstrate the competitive performance of GPLDAN.
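The abstract states only that the graph pattern loss considers all possible distances between different representations. As a rough illustration of that idea, the sketch below treats every image and text representation of one instance as a node of a complete graph and averages the squared Euclidean distance over all edges. The function names (`squared_distance`, `all_pairs_distance_loss`), the equal weighting of edges, and the choice of squared Euclidean distance are assumptions made for illustration; this is not the paper's formulation.

```python
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(u, v))


def all_pairs_distance_loss(image_reps: List[Vector], text_reps: List[Vector]) -> float:
    """Hypothetical stand-in for a graph-pattern-style loss (not the paper's exact loss).

    Every image and text representation of a single instance is a node of a
    complete graph; the loss is the mean squared distance over all edges, so it
    covers image-image, text-text, and image-text pairs alike.
    """
    nodes = list(image_reps) + list(text_reps)
    if len(nodes) < 2:
        raise ValueError("need at least two representations to form an edge")
    # Every unordered pair of nodes is an edge of the complete graph.
    edges = list(combinations(range(len(nodes)), 2))
    total = sum(squared_distance(nodes[i], nodes[j]) for i, j in edges)
    return total / len(edges)


if __name__ == "__main__":
    # Two image representations and one text representation in 2-D.
    image_reps = [[0.0, 0.0], [1.0, 0.0]]
    text_reps = [[0.0, 1.0]]
    print(all_pairs_distance_loss(image_reps, text_reps))
```

In the usage example the three edges have squared lengths 1, 1 and 2, so the loss is 4/3. Minimizing this term on its own would pull every representation of an instance to the same point; how GPLDAN balances its graph pattern loss against the diversified attention projector is not described in the abstract.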
Related papers
- Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks [50.42343781348247]
We develop a graph Poisson factor analysis (GPFA) which provides analytic conditional posteriors to improve the inference accuracy.
We also extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels.
Our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
(2024-10-13T02:22:14Z)
- Visual Commonsense based Heterogeneous Graph Contrastive Learning [79.22206720896664]
We propose a heterogeneous graph contrastive learning method to better perform the visual reasoning task.
Our method is designed in a plug-and-play manner, so it can be quickly and easily combined with a wide range of representative methods.
(2023-11-11T12:01:18Z)
- Entropy Neural Estimation for Graph Contrastive Learning [9.032721248598088]
Contrastive learning on graphs aims at extracting distinguishable high-level representations of nodes.
We propose a simple yet effective subset sampling strategy to contrast pairwise representations between views of a dataset.
We conduct extensive experiments on seven graph benchmarks, and the proposed approach achieves competitive performance.
(2023-07-26T03:55:08Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
(2023-03-16T00:06:28Z)
- Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing [23.598273691455503]
We propose a novel Scale-Semantic Joint Decoupling Network (SSJDN) for remote sensing image-text retrieval.
Our proposed SSJDN outperforms state-of-the-art approaches in numerical experiments conducted on four benchmark remote sensing datasets.
(2022-12-12T08:02:35Z)
- Visually-aware Acoustic Event Detection using Heterogeneous Graphs [39.90352230010103]
Perception of auditory events is inherently multimodal, relying on both audio and visual cues.
We employ heterogeneous graphs to capture the spatial and temporal relationships between the modalities.
We show efficient modelling of intra- and inter-modality relationships at both spatial and temporal scales.
(2022-07-16T13:09:25Z)
- Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval [152.3504607706575]
This research aims to conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
We first contribute the Product1M dataset and define two practical instance-level retrieval tasks.
We train a more effective cross-modal model that can adaptively incorporate key concept information from the multi-modal data.
(2022-06-17T15:40:45Z)
- ACTIVE: Augmentation-Free Graph Contrastive Learning for Partial Multi-View Clustering [52.491074276133325]
We propose an augmentation-free graph contrastive learning framework to solve the problem of partial multi-view clustering.
The proposed approach elevates instance-level contrastive learning and missing data inference to the cluster-level, effectively mitigating the impact of individual missing data on clustering.
(2022-03-01T02:32:25Z)
- r-GAT: Relational Graph Attention Network for Multi-Relational Graphs [8.529080554172692]
Graph Attention Network (GAT) models only simple, undirected, single-relational graph data.
We propose r-GAT, a relational graph attention network to learn multi-channel entity representations.
Experiments on link prediction and entity classification tasks show that our r-GAT can model multi-relational graphs effectively.
(2021-09-13T12:43:00Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern for generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss with better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvements on various vision tasks.
(2020-02-24T11:35:28Z)