Scene Graph Generation with Geometric Context
- URL: http://arxiv.org/abs/2111.13131v1
- Date: Thu, 25 Nov 2021 15:42:21 GMT
- Title: Scene Graph Generation with Geometric Context
- Authors: Vishal Kumar, Albert Mundu, Satish Kumar Singh
- Abstract summary: A scene graph, a visually grounded graphical structure of an image, greatly simplifies image understanding tasks.
We introduce a post-processing algorithm called Geometric Context to understand the visual scenes better geometrically.
We exploit this context by calculating the direction and distance between object pairs.
- Score: 12.074766935042586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene Graph Generation has gained much attention in computer vision research
with the growing demand in image understanding projects like visual question
answering, image captioning, self-driving cars, crowd behavior analysis,
activity recognition, and more. A scene graph, a visually grounded graphical
structure of an image, greatly simplifies image understanding tasks. In this
work, we introduce a post-processing algorithm called Geometric Context to
better understand visual scenes geometrically. We apply this post-processing
algorithm to a prior model's output to add and refine geometric relationships
between object pairs. We exploit this context by calculating the direction and
distance between object pairs. We use Knowledge Embedded Routing Network (KERN)
as our baseline model, extend it with our algorithm, and show results
comparable to recent state-of-the-art algorithms.
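The abstract describes deriving geometric relations from the direction and distance between object pairs. The sketch below is a minimal illustration of that idea, assuming axis-aligned bounding boxes; the four-way direction binning, the normalization by the subject box diagonal, and the near/far threshold are illustrative assumptions, not the paper's exact formulation.

```python
import math

def box_center(box):
    """Center (cx, cy) of an axis-aligned box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2.0, (y1 + y2) / 2.0

def geometric_context(subj_box, obj_box, near_thresh=0.5):
    """Illustrative direction + distance relation between two boxes.

    Direction is the angle from the subject's center to the object's
    center, binned into four coarse labels; distance is the center
    distance normalized by the subject box diagonal. Both the binning
    and `near_thresh` are assumptions made for this sketch.
    """
    sx, sy = box_center(subj_box)
    ox, oy = box_center(obj_box)
    dx, dy = ox - sx, oy - sy

    # Angle in image coordinates (y grows downward), mapped to a label.
    angle = math.degrees(math.atan2(-dy, dx)) % 360.0
    if 45 <= angle < 135:
        direction = "above"
    elif 135 <= angle < 225:
        direction = "left of"
    elif 225 <= angle < 315:
        direction = "below"
    else:
        direction = "right of"

    # Distance between centers, normalized by the subject box diagonal.
    diag = math.hypot(subj_box[2] - subj_box[0], subj_box[3] - subj_box[1])
    dist = math.hypot(dx, dy) / max(diag, 1e-6)
    proximity = "near" if dist < near_thresh else "far from"
    return direction, proximity

# Example: relation between a person's box and a bicycle's box.
print(geometric_context((100, 100, 200, 300), (220, 250, 400, 380)))
```

In a post-processing setup like the one described, labels of this kind could be attached to, or used to refine, the relationship predictions of a baseline model such as KERN.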
Related papers
- From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models [81.92098140232638]
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
Existing methods struggle to generate scene graphs with novel visual relation concepts.
We introduce a new open-vocabulary SGG framework based on sequence generation.
arXiv Detail & Related papers (2024-04-01T04:21:01Z) - Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z) - Graph Neural Networks in Vision-Language Image Understanding: A Survey [6.813036707969848]
2D image understanding is a complex problem within computer vision.
It holds the key to providing human-level scene comprehension.
In recent years graph neural networks (GNNs) have become a standard component of many 2D image understanding pipelines.
arXiv Detail & Related papers (2023-03-07T09:56:23Z) - Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z) - SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress on the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z) - Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering [13.886692497676659]
Graphhopper is a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques.
We derive a scene graph that describes the objects in the image, as well as their attributes and their mutual relationships.
A reinforcement learning agent is trained to autonomously navigate in a multi-hop manner over the extracted scene graph to generate reasoning paths.
arXiv Detail & Related papers (2021-07-13T18:33:04Z) - Scene Graph Reasoning for Visual Question Answering [23.57543808056452]
We propose a novel method that approaches the task by performing context-driven, sequential reasoning based on the objects and their semantic and spatial relationships present in the scene.
A reinforcement agent then learns to autonomously navigate over the extracted scene graph to generate paths, which are then the basis for deriving answers.
arXiv Detail & Related papers (2020-07-02T13:02:54Z) - Neural Topological SLAM for Visual Navigation [112.73876869904]
We design topological representations for space that leverage semantics and afford approximate geometric reasoning.
We describe supervised learning-based algorithms that can build, maintain and use such representations under noisy actuation.
arXiv Detail & Related papers (2020-05-25T17:56:29Z) - Learning 3D Semantic Scene Graphs from 3D Indoor Reconstructions [94.17683799712397]
We focus on scene graphs, a data structure that organizes the entities of a scene in a graph.
We propose a learned method that regresses a scene graph from the point cloud of a scene.
We show the application of our method in a domain-agnostic retrieval task, where graphs serve as an intermediate representation for 3D-3D and 2D-3D matching.
arXiv Detail & Related papers (2020-04-08T12:25:25Z) - Image Segmentation Using Deep Learning: A Survey [58.37211170954998]
Image segmentation is a key topic in image processing and computer vision.
There has been a substantial body of work aimed at developing image segmentation approaches using deep learning models.
arXiv Detail & Related papers (2020-01-15T21:37:47Z)