Patch-wise Graph Contrastive Learning for Image Translation
- URL: http://arxiv.org/abs/2312.08223v2
- Date: Mon, 19 Feb 2024 23:39:30 GMT
- Title: Patch-wise Graph Contrastive Learning for Image Translation
- Authors: Chanyong Jung, Gihyun Kwon, Jong Chul Ye
- Abstract summary: We exploit a graph neural network to capture topology-aware features.
We construct a graph based on patch-wise similarity from a pretrained encoder.
To capture the hierarchical semantic structure, we propose graph pooling.
- Score: 69.85040887753729
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, patch-wise contrastive learning has been drawing attention for
image translation by exploring the semantic correspondence between the input and
output images. To further explore the patch-wise topology for high-level
semantic understanding, we exploit a graph neural network to capture
topology-aware features. Specifically, we construct a graph based on the
patch-wise similarity from a pretrained encoder, whose adjacency matrix is
shared to enhance the consistency of the patch-wise relations between the input
and the output. We then obtain node features from the graph neural network and
enhance the correspondence between the nodes by increasing their mutual
information with a contrastive loss. To capture the hierarchical semantic
structure, we further propose graph pooling. Experimental results demonstrate
state-of-the-art performance in image translation, thanks to the semantic
encoding provided by the constructed graphs.
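The pipeline in the abstract can be pieced together as follows. Below is a minimal PyTorch sketch, not the authors' released code: the helper names (`build_patch_graph`, `gcn_layer`, `patch_nce_loss`) and the hyperparameters `k` and `tau` are assumptions. It builds a k-NN graph from the pretrained encoder's patch features, applies one graph-convolution step with the same adjacency to both the input and the output features, and maximizes node-wise correspondence with an InfoNCE-style contrastive loss.

```python
import torch
import torch.nn.functional as F

def build_patch_graph(feats, k=8):
    """Build a k-NN adjacency from patch features.

    feats: (N, C) patch embeddings from a pretrained encoder.
    Returns a row-normalized (N, N) adjacency shared by input/output.
    """
    f = F.normalize(feats, dim=1)
    sim = f @ f.t()                                   # cosine similarity
    topk = sim.topk(k + 1, dim=1).indices             # k neighbors (+ self)
    adj = torch.zeros_like(sim).scatter_(1, topk, 1.0)
    adj = (adj + adj.t()).clamp(max=1.0)              # symmetrize
    return adj / adj.sum(dim=1, keepdim=True)         # row-normalize

def gcn_layer(x, adj, weight):
    """One graph-convolution step: aggregate neighbors, then project."""
    return F.relu(adj @ x @ weight)

def patch_nce_loss(q, v, tau=0.07):
    """InfoNCE between corresponding nodes of the input/output graphs."""
    q, v = F.normalize(q, dim=1), F.normalize(v, dim=1)
    logits = q @ v.t() / tau                          # (N, N) similarity logits
    labels = torch.arange(q.size(0))                  # node i matches node i
    return F.cross_entropy(logits, labels)

# Toy usage: 64 patches with 256-dim features from input and translated output.
torch.manual_seed(0)
f_in, f_out = torch.randn(64, 256), torch.randn(64, 256)
adj = build_patch_graph(f_in)                         # adjacency from the input side...
w = torch.randn(256, 128) * 0.05
z_in = gcn_layer(f_in, adj, w)                        # ...shared by both graphs
z_out = gcn_layer(f_out, adj, w)
print(patch_nce_loss(z_in, z_out).item())
```

The hierarchical graph pooling proposed in the paper would coarsen this graph into fewer nodes (e.g., by clustering) and repeat the same loss at the coarser level; it is omitted here to keep the sketch short.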
Related papers
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- Graph Reasoning Transformer for Image Parsing [67.76633142645284]
We propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
Compared to the conventional transformer, GReaT has higher interaction efficiency and a more purposeful interaction pattern.
Results show that GReaT achieves consistent performance gains with slight computational overheads on the state-of-the-art transformer baselines.
arXiv Detail & Related papers (2022-09-20T08:21:37Z)
- Learning Hierarchical Graph Representation for Image Manipulation Detection [50.04902159383709]
The objective of image manipulation detection is to identify and locate the manipulated regions in the images.
Recent approaches mostly adopt sophisticated convolutional neural networks (CNNs) to capture the tampering artifacts left in the images.
We propose a hierarchical Graph Convolutional Network (HGCN-Net), which consists of two parallel branches.
arXiv Detail & Related papers (2022-01-15T01:54:25Z)
- Maximize the Exploration of Congeneric Semantics for Weakly Supervised Semantic Segmentation [27.155133686127474]
We construct a graph neural network (P-GNN) based on the self-detected patches from different images that contain the same class labels.
We conduct experiments on the popular PASCAL VOC 2012 benchmark, and our model yields state-of-the-art performance.
arXiv Detail & Related papers (2021-10-08T08:59:16Z)
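As a rough illustration of the graph construction described above for P-GNN, connecting self-detected patches across images that share a class label amounts to an equality test on the patch labels; the function below is a hypothetical sketch, not the paper's API.

```python
import torch

def cross_image_patch_adjacency(patch_labels):
    """Connect patches (possibly from different images) that share a class label.

    patch_labels: (N,) integer class label self-detected for each patch.
    Returns an (N, N) 0/1 adjacency with no self-loops.
    """
    same = patch_labels.unsqueeze(0) == patch_labels.unsqueeze(1)
    adj = same.float()
    adj.fill_diagonal_(0.0)  # drop self-loops
    return adj

# Toy usage: 6 patches gathered from two images, one label per patch.
labels = torch.tensor([0, 1, 0, 2, 1, 0])
print(cross_image_patch_adjacency(labels))
```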
- Graph Representation Learning for Spatial Image Steganalysis [11.358487655918678]
We introduce a graph representation learning architecture for spatial image steganalysis.
In the detailed architecture, we translate each image to a graph, where nodes represent the patches of the image and edges indicate the local associations between the patches.
By feeding the graph to an attention network, the discriminative features can be learned for efficient steganalysis.
arXiv Detail & Related papers (2021-10-03T09:09:08Z)
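A minimal sketch of the image-to-graph step described in the steganalysis paper above: patches become nodes and spatially adjacent patches are linked, which is one plausible reading of "local associations"; a single self-attention layer stands in for the attention network. All names and the 4-neighborhood rule are assumptions.

```python
import torch
import torch.nn as nn

def image_to_patch_graph(img, patch=8):
    """Split an image into non-overlapping patches (nodes) and connect
    spatially adjacent patches (edges).

    img: (C, H, W) tensor; returns node features (N, C*patch*patch)
    and a 4-neighborhood edge list.
    """
    c, h, w = img.shape
    gh, gw = h // patch, w // patch
    nodes = (img.unfold(1, patch, patch).unfold(2, patch, patch)
                .permute(1, 2, 0, 3, 4).reshape(gh * gw, -1))
    edges = [(r * gw + i, r * gw + i + 1)
             for r in range(gh) for i in range(gw - 1)]        # horizontal links
    edges += [(r * gw + i, (r + 1) * gw + i)
              for r in range(gh - 1) for i in range(gw)]       # vertical links
    return nodes, edges

# Toy usage: attend over the patch nodes of a 32x32 grayscale image.
# The edge list could be used to mask attention to local neighbors; for
# brevity, plain self-attention over all nodes is shown here.
img = torch.randn(1, 32, 32)
nodes, edges = image_to_patch_graph(img)                       # 16 nodes of dim 64
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
out, _ = attn(nodes[None], nodes[None], nodes[None])
print(out.shape, len(edges))
```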
- Self-Supervised Graph Representation Learning via Topology Transformations [61.870882736758624]
We present the Topology Transformation Equivariant Representation learning, a general paradigm of self-supervised learning for node representations of graph data.
In experiments, we apply the proposed model to the downstream node and graph classification tasks, and results show that the proposed method outperforms the state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2021-05-25T06:11:03Z)
- Exploring Explicit and Implicit Visual Relationships for Image Captioning [11.82805641934772]
In this paper, we explore explicit and implicit visual relationships to enrich region-level representations for image captioning.
Explicitly, we build semantic graph over object pairs and exploit gated graph convolutional networks (Gated GCN) to selectively aggregate local neighbors' information.
Implicitly, we draw global interactions among the detected objects through region-based bidirectional encoder representations from transformers.
arXiv Detail & Related papers (2021-05-06T01:47:51Z)
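The "explicit" branch above, a semantic graph over object pairs aggregated by a gated GCN, could be sketched as follows; the sigmoid-gate formulation is one common variant and not necessarily the paper's exact layer.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedGCNLayer(nn.Module):
    """Aggregate neighbor features through a learned sigmoid gate,
    so each edge's message can be selectively passed or suppressed."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, 1)   # gate from a (node, neighbor) pair
        self.proj = nn.Linear(dim, dim)

    def forward(self, x, edges):
        # x: (N, dim) object features; edges: list of (src, dst) pairs
        idx = torch.tensor([d for _, d in edges])
        msgs = torch.stack([
            torch.sigmoid(self.gate(torch.cat([x[d], x[s]]))) * self.proj(x[s])
            for s, d in edges])                     # gated message s -> d
        agg = torch.zeros_like(x).index_add(0, idx, msgs)
        return F.relu(x + agg)

# Toy usage: 4 detected objects, semantic edges between related pairs.
x = torch.randn(4, 32)
layer = GatedGCNLayer(32)
print(layer(x, edges=[(0, 1), (2, 1), (3, 0)]).shape)
```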
- Scene Graph Embeddings Using Relative Similarity Supervision [4.137464623395376]
We employ a graph convolutional network to exploit structure in scene graphs and produce image embeddings useful for semantic image retrieval.
We propose a novel loss function that operates on pairs of similar and dissimilar images and imposes relative ordering between them in embedding space.
We demonstrate that this Ranking loss, coupled with an intuitive triple sampling strategy, leads to robust representations that outperform well-known contrastive losses on the retrieval task.
arXiv Detail & Related papers (2021-04-06T09:13:05Z)
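The ranking loss above, which imposes a relative order between a similar and a dissimilar image in embedding space, is essentially a margin formulation over triples; the sketch below uses cosine similarity and an assumed margin of 0.2.

```python
import torch
import torch.nn.functional as F

def relative_similarity_ranking_loss(anchor, similar, dissimilar, margin=0.2):
    """Push the anchor closer to the similar image than to the dissimilar
    one by at least `margin`, measured in cosine-similarity space."""
    s_pos = F.cosine_similarity(anchor, similar)      # (B,)
    s_neg = F.cosine_similarity(anchor, dissimilar)   # (B,)
    return F.relu(margin - (s_pos - s_neg)).mean()

# Toy usage on a batch of 8 triples of 128-dim graph embeddings.
a, p, n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(relative_similarity_ranking_loss(a, p, n).item())
```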
- Image-Graph-Image Translation via Auto-Encoding [4.847617604851614]
This work presents the first convolutional neural network that learns an image-to-graph translation task without needing external supervision.
We are the first to present a self-supervised approach based on a fully-differentiable auto-encoder in which the bottleneck encodes the graph's nodes and edges.
arXiv Detail & Related papers (2020-12-10T21:01:32Z)
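One plausible reading of the auto-encoding idea above, where the bottleneck encodes the graph's nodes and edges, is sketched below: an encoder emits node features, a soft adjacency is derived from them, and the decoder reconstructs the image from the graph, so the whole path stays differentiable. Every module and size here is hypothetical.

```python
import torch
import torch.nn as nn

class ImageGraphImageAE(nn.Module):
    """Auto-encoder whose bottleneck is a graph: node features plus a
    soft (differentiable) adjacency derived from them."""
    def __init__(self, n_nodes=16, dim=32):
        super().__init__()
        self.n, self.d = n_nodes, dim
        self.enc = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, n_nodes * dim))
        self.dec = nn.Sequential(nn.Linear(n_nodes * dim, 28 * 28), nn.Sigmoid())

    def forward(self, img):                                     # img: (B, 1, 28, 28)
        nodes = self.enc(img).view(-1, self.n, self.d)          # graph nodes
        adj = torch.softmax(nodes @ nodes.transpose(1, 2), -1)  # soft edges
        mixed = adj @ nodes                                     # decode from nodes + edges
        recon = self.dec(mixed.flatten(1)).view_as(img)
        return recon, nodes, adj

# Toy usage: reconstruction loss trains the graph bottleneck end to end.
model = ImageGraphImageAE()
img = torch.rand(4, 1, 28, 28)
recon, nodes, adj = model(img)
loss = nn.functional.mse_loss(recon, img)
loss.backward()
print(nodes.shape, adj.shape, loss.item())
```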
- Improving Image Captioning with Better Use of Captions [65.39641077768488]
We present a novel image captioning architecture to better explore semantics available in captions and leverage that to enhance both image representation and caption generation.
Our models first construct caption-guided visual relationship graphs that introduce beneficial inductive bias using weakly supervised multi-instance learning.
During generation, the model further incorporates visual relationships using multi-task learning for jointly predicting word and object/predicate tag sequences.
arXiv Detail & Related papers (2020-06-21T14:10:47Z)