Graph Structured Network for Image-Text Matching
- URL: http://arxiv.org/abs/2004.00277v1
- Date: Wed, 1 Apr 2020 08:20:42 GMT
- Title: Graph Structured Network for Image-Text Matching
- Authors: Chunxiao Liu, Zhendong Mao, Tianzhu Zhang, Hongtao Xie, Bin Wang, Yongdong Zhang
- Abstract summary: We present a novel Graph Structured Matching Network to learn fine-grained correspondence.
The GSMN explicitly models object, relation and attribute as a structured phrase.
Experiments show that GSMN outperforms state-of-the-art methods on benchmarks.
- Score: 127.68148793548116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-text matching has received growing interest since it bridges vision and
language. The key challenge lies in how to learn correspondence between image
and text. Existing works learn coarse correspondence based on object
co-occurrence statistics, while failing to learn fine-grained phrase
correspondence. In this paper, we present a novel Graph Structured Matching
Network (GSMN) to learn fine-grained correspondence. The GSMN explicitly models
object, relation and attribute as a structured phrase, which not only allows the
model to learn the correspondence of object, relation and attribute separately,
but also helps it learn the fine-grained correspondence of structured phrases. This is
achieved by node-level matching and structure-level matching. The node-level
matching associates each node with its relevant nodes from another modality,
where the node can be object, relation or attribute. The associated nodes then
jointly infer fine-grained correspondence by fusing neighborhood associations
at structure-level matching. Comprehensive experiments show that GSMN
outperforms state-of-the-art methods on benchmarks, with relative Recall@1
improvements of nearly 7% and 2% on Flickr30K and MSCOCO, respectively. Code
will be released at: https://github.com/CrossmodalGroup/GSMN.
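The two matching stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the cosine-similarity attention, the `temperature` value, the scalar per-node score, and the neighbor averaging used for fusion are all simplifying assumptions chosen for illustration, and every function name here is hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def node_level_matching(src, tgt, temperature=10.0):
    """Associate each source node with relevant nodes of the other modality.

    src: (n_src, d) node features, tgt: (n_tgt, d) node features.
    Returns one score per source node (assumed form: cosine similarity
    between the node and its attention-weighted counterpart).
    """
    src_n, tgt_n = l2_normalize(src), l2_normalize(tgt)
    sim = src_n @ tgt_n.T                      # (n_src, n_tgt) cosine similarities
    attn = softmax(temperature * sim, axis=1)  # weights over target nodes
    attended = attn @ tgt                      # (n_src, d) associated features
    return np.sum(src_n * l2_normalize(attended), axis=1)


def structure_level_matching(node_scores, adjacency):
    """Fuse each node's score with those of its graph neighbors.

    Assumed form: row-normalized averaging with a self-loop.
    """
    adj = adjacency + np.eye(adjacency.shape[0])
    adj = adj / adj.sum(axis=1, keepdims=True)
    return adj @ node_scores


def graph_similarity(src, tgt, adjacency):
    """Overall similarity: mean of the structure-fused node scores."""
    scores = node_level_matching(src, tgt)
    return float(structure_level_matching(scores, adjacency).mean())


# Toy example: four text nodes connected in a chain.
text_nodes = np.eye(4)
chain = np.array([[0, 1, 0, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
print(graph_similarity(text_nodes, text_nodes, chain))   # matching visual nodes
print(graph_similarity(text_nodes, -text_nodes, chain))  # non-matching visual nodes
```

In this toy setup the first call scores a graph against identical features and the second against opposed features, so the first value should be clearly larger. A faithful implementation would follow the paper and the released code rather than this simplified scoring.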
Related papers
- Graph Neural Networks on Discriminative Graphs of Words [19.817473565906777]
In this work, we explore a new Discriminative Graph of Words Graph Neural Network (DGoW-GNN) approach to classify text.
We propose a new model for the graph-based classification of text, which combines a GNN and a sequence model.
We evaluate our approach on seven benchmark datasets and find that it is outperformed by several state-of-the-art baseline models.
(2024-10-27T15:14:06Z)
- Relation Rectification in Diffusion Model [64.84686527988809]
We introduce a novel task termed Relation Rectification, aiming to refine the model to accurately represent a given relationship it initially fails to generate.
We propose an innovative solution utilizing a Heterogeneous Graph Convolutional Network (HGCN).
The lightweight HGCN adjusts the text embeddings generated by the text encoder, ensuring the accurate reflection of the textual relation in the embedding space.
(2024-03-29T15:54:36Z)
- EntailE: Introducing Textual Entailment in Commonsense Knowledge Graph Completion [54.12709176438264]
Commonsense knowledge graphs (CSKGs) utilize free-form text to represent named entities, short phrases, and events as their nodes.
Current methods leverage semantic similarities to increase the graph density, but the semantic plausibility of the nodes and their relations is under-explored.
We propose to adopt textual entailment to find implicit entailment relations between CSKG nodes, to effectively densify the subgraph connecting nodes within the same conceptual class.
(2024-02-15T02:27:23Z)
- Pretraining Language Models with Text-Attributed Heterogeneous Graphs [28.579509154284448]
We present a new pretraining framework for Language Models (LMs) that explicitly considers the topological and heterogeneous information in Text-Attributed Heterogeneous Graphs (TAHGs).
We propose a topology-aware pretraining task to predict nodes involved in the context graph by jointly optimizing an LM and an auxiliary heterogeneous graph neural network.
We conduct link prediction and node classification tasks on three datasets from various domains.
(2023-10-19T08:41:21Z)
- ConGraT: Self-Supervised Contrastive Pretraining for Joint Graph and Text Embeddings [20.25180279903009]
We propose Contrastive Graph-Text pretraining (ConGraT) for jointly learning separate representations of texts and nodes in a text-attributed graph (TAG).
Our method trains a language model (LM) and a graph neural network (GNN) to align their representations in a common latent space using a batch-wise contrastive learning objective inspired by CLIP.
Experiments demonstrate that ConGraT outperforms baselines on various downstream tasks, including node and text category classification, link prediction, and language modeling.
(2023-05-23T17:53:30Z)
- PA-GM: Position-Aware Learning of Embedding Networks for Deep Graph Matching [14.713628231555223]
We introduce a novel end-to-end neural network that can map the linear assignment problem into a high-dimensional space.
Our model constructs the anchor set for the relative position of nodes.
It then aggregates the feature information of the target node and each anchor node based on a measure of relative position.
(2023-01-05T06:54:21Z)
- Joint Graph Learning and Matching for Semantic Feature Correspondence [69.71998282148762]
We propose a joint graph learning and matching network, named GLAM, to explore reliable graph structures for boosting graph matching.
The proposed method is evaluated on three popular visual matching benchmarks (Pascal VOC, Willow Object and SPair-71k).
It outperforms previous state-of-the-art graph matching methods by significant margins on all benchmarks.
(2021-09-01T08:24:02Z)
- Iterative Context-Aware Graph Inference for Visual Dialog [126.016187323249]
We propose a novel Context-Aware Graph (CAG) neural network.
Each node in the graph corresponds to a joint semantic feature, including both object-based (visual) and history-related (textual) context representations.
(2020-04-05T13:09:37Z)
- Expressing Objects just like Words: Recurrent Visual Embedding for Image-Text Matching [102.62343739435289]
Existing image-text matching approaches infer the similarity of an image-text pair by capturing and aggregating the affinities between the text and each independent object of the image.
We propose a Dual Path Recurrent Neural Network (DP-RNN) which processes images and sentences symmetrically by recurrent neural networks (RNN).
Our model achieves state-of-the-art performance on the Flickr30K dataset and competitive performance on the MS-COCO dataset.
(2020-02-20T00:51:01Z)