Image-Graph-Image Translation via Auto-Encoding
- URL: http://arxiv.org/abs/2012.05975v1
- Date: Thu, 10 Dec 2020 21:01:32 GMT
- Title: Image-Graph-Image Translation via Auto-Encoding
- Authors: Chenyang Lu and Gijs Dubbelman
- Abstract summary: This work presents the first convolutional neural network that learns an image-to-graph translation task without needing external supervision.
We are the first to present a self-supervised approach based on a fully-differentiable auto-encoder in which the bottleneck encodes the graph's nodes and edges.
- Score: 4.847617604851614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work presents the first convolutional neural network that learns an
image-to-graph translation task without needing external supervision. Obtaining
graph representations of image content, where objects are represented as nodes
and their relationships as edges, is an important task in scene understanding.
Current approaches are fully supervised and therefore require meticulous
annotations. To overcome this, we are the first to present a
self-supervised approach based on a fully-differentiable auto-encoder in which
the bottleneck encodes the graph's nodes and edges. This self-supervised
approach can currently encode simple line drawings into graphs and achieves
results comparable to a fully-supervised baseline in terms of F1 score on
triplet matching. Besides these promising results, we provide several
directions for future research on how our approach can be extended to cover
more complex imagery.
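The abstract describes the architecture only at a high level, so the following PyTorch sketch is purely illustrative: the class name, layer sizes, and the soft line rasterizer are assumptions, not the authors' implementation. It shows what a fully-differentiable auto-encoder with a graph bottleneck for line drawings could look like: a CNN encoder predicts node coordinates and a soft adjacency matrix, and a differentiable rasterizer redraws the encoded graph so that reconstruction loss alone supervises the graph.

```python
import torch
import torch.nn as nn

class ImageToGraphAE(nn.Module):
    """Toy auto-encoder whose bottleneck is an explicit graph:
    N node coordinates in [0, 1]^2 plus a soft N x N adjacency."""
    def __init__(self, num_nodes=8, img_size=64):
        super().__init__()
        self.num_nodes, self.img_size = num_nodes, img_size
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (img_size // 4) ** 2, 256), nn.ReLU(),
        )
        self.node_head = nn.Linear(256, num_nodes * 2)        # x, y per node
        self.edge_head = nn.Linear(256, num_nodes * num_nodes)

    def encode(self, img):
        h = self.encoder(img)
        nodes = torch.sigmoid(self.node_head(h)).view(-1, self.num_nodes, 2)
        adj = torch.sigmoid(self.edge_head(h)).view(-1, self.num_nodes, self.num_nodes)
        adj = 0.5 * (adj + adj.transpose(1, 2))               # undirected edges
        adj = adj * (1 - torch.eye(self.num_nodes, device=adj.device))  # no self-loops
        return nodes, adj

    def decode(self, nodes, adj, sharpness=200.0):
        """Differentiable rasterizer: draw a soft line for every node pair,
        weighted by its edge probability, so gradients reach the bottleneck."""
        B, N, _ = nodes.shape
        s = self.img_size
        ys, xs = torch.meshgrid(torch.linspace(0, 1, s), torch.linspace(0, 1, s), indexing="ij")
        pix = torch.stack([xs, ys], -1).view(1, 1, 1, s * s, 2)  # (1,1,1,P,2)
        a = nodes.view(B, N, 1, 1, 2)
        b = nodes.view(B, 1, N, 1, 2)
        ab = b - a                                            # segment vectors
        t = ((pix - a) * ab).sum(-1) / (ab.pow(2).sum(-1) + 1e-8)
        t = t.clamp(0, 1).unsqueeze(-1)
        d2 = (a + t * ab - pix).pow(2).sum(-1)                # sq. dist to segment
        strokes = torch.exp(-sharpness * d2) * adj.view(B, N, N, 1)
        return strokes.amax(dim=(1, 2)).view(B, 1, s, s)      # composite strokes

    def forward(self, img):
        nodes, adj = self.encode(img)
        return self.decode(nodes, adj)
```

Training would minimize a plain reconstruction loss such as `torch.nn.functional.mse_loss(model(img), img)`; after convergence, thresholding the bottleneck's adjacency yields the predicted graph.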
Related papers
- InstructG2I: Synthesizing Images from Multimodal Attributed Graphs [50.852150521561676]
We propose a graph context-conditioned diffusion model called InstructG2I.
InstructG2I first exploits the graph structure and multimodal information to conduct informative neighbor sampling.
A Graph-QFormer encoder adaptively encodes the graph nodes into an auxiliary set of graph prompts to guide the denoising process (a schematic version is sketched below).
arXiv Detail & Related papers (2024-10-09T17:56:15Z)
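The following is a schematic, not the released InstructG2I code; `GraphPromptEncoder`, its sizes, and the usage lines are illustrative assumptions. The idea it sketches: a fixed set of learnable query tokens cross-attends over the sampled neighbors' multimodal features, yielding "graph prompt" tokens that are appended to the diffusion model's text conditioning.

```python
import torch
import torch.nn as nn

class GraphPromptEncoder(nn.Module):
    """Schematic Graph-QFormer-style module: learnable queries cross-attend
    over neighbor-node embeddings and return graph prompts for conditioning."""
    def __init__(self, node_dim=768, num_prompts=8, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_prompts, node_dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(node_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(node_dim, 4 * node_dim), nn.GELU(),
                                 nn.Linear(4 * node_dim, node_dim))
        self.norm1, self.norm2 = nn.LayerNorm(node_dim), nn.LayerNorm(node_dim)

    def forward(self, neighbor_feats):
        # neighbor_feats: (B, num_neighbors, node_dim), e.g. fused image+text
        # features of neighbors sampled from the attributed graph.
        q = self.queries.unsqueeze(0).expand(neighbor_feats.size(0), -1, -1)
        attn_out, _ = self.cross_attn(q, neighbor_feats, neighbor_feats)
        q = self.norm1(q + attn_out)
        q = self.norm2(q + self.ffn(q))
        return q  # (B, num_prompts, node_dim)

# Hypothetical usage: concatenate the prompts with the text conditioning
# prompts = GraphPromptEncoder()(neighbor_feats)
# cond = torch.cat([text_token_embeddings, prompts], dim=1)  # guides denoising
```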
- Composing Object Relations and Attributes for Image-Text Matching [70.47747937665987]
This work introduces a dual-encoder image-text matching model, leveraging a scene graph to represent captions with nodes for objects and attributes interconnected by relational edges.
Our model efficiently encodes object-attribute and object-object semantic relations, resulting in a robust and fast-performing system (a toy graph-side encoder is sketched below).
arXiv Detail & Related papers (2024-06-17T17:56:01Z)
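A minimal sketch of the caption-side graph encoder in such a dual-encoder setup; all names and the single message-passing step are assumptions, not the paper's code. Object and attribute nodes exchange messages along relation edges, the result is pooled into one caption embedding, and matching is scored by cosine similarity against an image embedding.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SceneGraphTextEncoder(nn.Module):
    """Toy graph-side encoder for dual-encoder image-text matching."""
    def __init__(self, vocab_size=1000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.msg = nn.Linear(2 * dim, dim)   # message from neighbor node + edge
        self.out = nn.Linear(dim, dim)

    def forward(self, node_ids, edge_index, edge_ids):
        # node_ids: (N,) token ids; edge_index: (2, E) src/dst; edge_ids: (E,)
        x, e = self.embed(node_ids), self.embed(edge_ids)
        src, dst = edge_index
        msgs = F.relu(self.msg(torch.cat([x[src], e], dim=-1)))
        agg = torch.zeros_like(x).index_add_(0, dst, msgs)  # sum incoming messages
        x = F.relu(self.out(x + agg))
        return F.normalize(x.mean(dim=0), dim=-1)           # caption embedding

def match_score(caption_emb, image_emb):
    return F.cosine_similarity(caption_emb, image_emb, dim=-1)

# A triplet like (man, riding, horse) becomes nodes [man, horse] plus one
# edge 0 -> 1 whose edge id encodes "riding".
```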
- Graph Context Transformation Learning for Progressive Correspondence Pruning [26.400567961735234]
We propose the Graph Context Transformation Network (GCT-Net), which enhances context information to provide consensus guidance for progressive correspondence pruning.
Specifically, we design the Graph Context Enhance Transformer which first generates the graph network and then transforms it into multi-branch graph contexts.
To further apply the recalibrated graph contexts to the global domain, we propose the Graph Context Guidance Transformer.
arXiv Detail & Related papers (2023-12-26T09:43:30Z)
- Patch-wise Graph Contrastive Learning for Image Translation [69.85040887753729]
We exploit a graph neural network to capture topology-aware features.
We construct the graph based on patch-wise similarity from a pretrained encoder (a minimal version of this construction is sketched after this entry).
To capture the hierarchical semantic structure, we propose a graph pooling scheme.
arXiv Detail & Related papers (2023-12-13T15:45:19Z)
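A minimal sketch of the patch-graph construction described above, under the simplifying assumption that the graph is a k-nearest-neighbor graph over cosine similarities of a frozen encoder's patch features (the function name and `k` are illustrative):

```python
import torch
import torch.nn.functional as F

def build_patch_graph(feats, k=4):
    """Nodes are patches of a pretrained encoder's feature map; edges connect
    each patch to its k most similar patches by cosine similarity.

    feats: (C, H, W) feature map -> returns (H*W, H*W) 0/1 adjacency."""
    C, H, W = feats.shape
    x = F.normalize(feats.view(C, H * W).t(), dim=1)   # (P, C) unit patch vectors
    sim = x @ x.t()                                    # pairwise cosine similarity
    sim.fill_diagonal_(float("-inf"))                  # exclude self-loops
    idx = sim.topk(k, dim=1).indices                   # k nearest patches each
    adj = torch.zeros(H * W, H * W)
    adj.scatter_(1, idx, 1.0)
    return torch.maximum(adj, adj.t())                 # symmetrize

# Works with any encoder's feature map, e.g. feats = torch.randn(64, 8, 8);
# the paper's graph pooling would then coarsen this patch graph hierarchically.
```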
- SelfGraphVQA: A Self-Supervised Graph Neural Network for Scene-based Question Answering [0.0]
Scene graphs have emerged as a useful tool for multimodal image analysis.
Current methods that utilize idealized annotated scene graphs struggle to generalize when using predicted scene graphs extracted from images.
Our approach extracts a scene graph from an input image using a pre-trained scene graph generator.
arXiv Detail & Related papers (2023-10-03T07:14:53Z)
- A Graph-Matching Approach for Cross-view Registration of Over-view 2 and Street-view based Point Clouds [4.742825811314168]
We propose a fully automated geo-registration method for cross-view data, which utilizes semantically segmented object boundaries as view-invariant features.
The proposed method models building segments, detected in both the satellite-based and street-view based point clouds, as graph nodes.
The matched nodes are then further optimized to allow precise registration, followed by a constrained bundle adjustment on the street-view image to keep 2D-3D consistencies (a toy node-matching step is sketched below).
arXiv Detail & Related papers (2022-02-14T16:43:28Z)
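A toy stand-in for the cross-view node-matching step, assuming each building node is summarized by a fixed-length descriptor (e.g. boundary shape features); the function name, descriptor form, and cost threshold are assumptions, with the Hungarian algorithm used for the one-to-one assignment:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_building_nodes(desc_a, desc_b, max_cost=0.5):
    """Match graph nodes across views by minimum total descriptor distance.

    desc_a: (Na, D), desc_b: (Nb, D) node descriptors from the two views."""
    cost = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)           # optimal 1-to-1 matching
    keep = cost[rows, cols] < max_cost                 # drop implausible pairs
    return list(zip(rows[keep], cols[keep]))           # (node_a, node_b) matches

# The matched node pairs would then seed the precise registration and the
# constrained bundle adjustment described in the abstract.
```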
- Augmented Abstractive Summarization With Document-Level Semantic Graph [3.0272794341021667]
Previous abstractive methods apply sequence-to-sequence structures to generate summaries without a module that captures document-level semantic structure.
We utilize a semantic graph to boost generation performance.
A novel neural decoder is presented to leverage the information of such entity graphs.
arXiv Detail & Related papers (2021-09-13T15:12:34Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs (a toy version of this label-matching step is sketched below).
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
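A toy version of the pseudo-labeling step, with plain string matching standing in for the paper's real concept alignment; the function and its inputs are illustrative assumptions:

```python
def make_pseudo_labels(detections, caption_concepts):
    """Keep a detected region as a scene-graph node only if its detector
    label matches a concept parsed from the caption.

    detections: list of (label, box); caption_concepts: set of noun phrases."""
    concepts = {c.lower() for c in caption_concepts}
    return [(label, box) for label, box in detections
            if label.lower() in concepts]

# Example: a caption parser might yield {"man", "horse"} for "a man riding a
# horse"; matched detections become pseudo node labels, and parsed relations
# ("riding") would become edge labels between them.
detections = [("man", (10, 20, 50, 90)), ("horse", (40, 30, 120, 100)),
              ("tree", (0, 0, 30, 60))]
print(make_pseudo_labels(detections, {"man", "horse"}))  # tree is discarded
```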
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [112.68171734288237]
We propose a novel framework for discriminative pixel-level tasks using a generative model of both images and labels.
We learn a generative adversarial network that captures the joint image-label distribution and is trained efficiently using a large set of unlabeled images (a schematic joint generator is sketched below).
We demonstrate strong in-domain performance compared to several baselines, and are the first to showcase extreme out-of-domain generalization.
arXiv Detail & Related papers (2021-04-12T21:41:25Z)
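A schematic of the joint image-label generator idea; the class, layer sizes, and two-branch head are assumptions, and the discriminator and the encoder used to invert the generator on test images are omitted. One latent code is decoded into both an RGB image and a per-pixel label map, so the GAN models the joint distribution rather than images alone.

```python
import torch
import torch.nn as nn

class JointGenerator(nn.Module):
    """Toy generator for a joint image-label GAN: one latent is decoded into
    an image branch and a per-pixel label-logit branch."""
    def __init__(self, z_dim=128, num_classes=10, size=32):
        super().__init__()
        self.size = size
        self.fc = nn.Linear(z_dim, 256 * (size // 4) ** 2)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.to_img = nn.Conv2d(64, 3, 3, padding=1)            # image branch
        self.to_lbl = nn.Conv2d(64, num_classes, 3, padding=1)  # label branch

    def forward(self, z):
        h = self.fc(z).view(-1, 256, self.size // 4, self.size // 4)
        h = self.up(h)
        return torch.tanh(self.to_img(h)), self.to_lbl(h)  # image, label logits

# img, label_logits = JointGenerator()(torch.randn(2, 128))
# A discriminator (omitted) would score (image, label) pairs jointly; labels
# for a new image follow from inverting the generator to its latent code.
```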
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions (a toy bag-of-visual-words target is sketched below).
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
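A toy construction of the bag-of-visual-words target as this summary reads: features of a frozen network are densely quantized over a visual-word codebook (e.g. from k-means) and counted into a histogram. The function name and sizes are illustrative assumptions:

```python
import torch

def bow_target(feats, codebook):
    """Quantize a feature map over K visual words and count occurrences.

    feats: (C, H, W) feature map; codebook: (K, C) visual words."""
    x = feats.view(feats.size(0), -1).t()              # (P, C) patch features
    words = torch.cdist(x, codebook).argmin(dim=1)     # nearest visual word
    hist = torch.bincount(words, minlength=codebook.size(0)).float()
    return hist / hist.sum()                           # normalized BoW histogram

# Self-supervised task: a second network sees a perturbed image and is trained
# (e.g. with a cross-entropy loss) to predict this histogram of the original.
feats, codebook = torch.randn(64, 8, 8), torch.randn(512, 64)
print(bow_target(feats, codebook).shape)               # torch.Size([512])
```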
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.