SGRAM: Improving Scene Graph Parsing via Abstract Meaning Representation
- URL: http://arxiv.org/abs/2210.08675v1
- Date: Mon, 17 Oct 2022 00:37:00 GMT
- Authors: Woo Suk Choi, Yu-Jung Heo and Byoung-Tak Zhang
- Abstract summary: A scene graph is a structured semantic representation that can be modeled as a graph derived from images and texts.
In this paper, we focus on the problem of scene graph parsing from a textual description of a visual scene.
We design a simple yet effective two-stage scene graph parsing framework utilizing abstract meaning representation.
- Score: 24.93559076181481
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A scene graph is a structured semantic representation that can be
modeled as a graph derived from images and texts. Image-based scene graph
generation has been actively studied in recent years, whereas text-based scene
graph generation has received far less attention. In this paper, we focus on
the problem of scene graph parsing from a textual description of a visual
scene. The core idea is to use abstract meaning representation (AMR) instead of
the dependency parsing mainly used in previous studies. AMR is a graph-based
semantic formalism of natural language that abstracts the concepts of words in
a sentence, in contrast to dependency parsing, which considers dependency
relationships among all words in a sentence. To this end, we design a simple
yet effective two-stage scene graph parsing framework utilizing abstract
meaning representation, SGRAM (Scene GRaph parsing via Abstract Meaning
representation): 1) transforming a textual description of an image into an AMR
graph (Text-to-AMR) and 2) feeding the AMR graph into a Transformer-based
language model to generate a scene graph (AMR-to-SG). Experimental results show
that the scene graphs generated by our framework outperform the dependency
parsing-based model by 11.61% and the previous state-of-the-art model using a
pre-trained Transformer language model by 3.78%. Furthermore, we apply SGRAM to
the image retrieval task, one of the downstream tasks for scene graphs, and
confirm the effectiveness of the scene graphs generated by our framework.
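To make the two-stage data flow concrete, here is a heavily simplified sketch of the second stage (AMR-to-SG). The paper uses a Transformer-based language model for this step; the hand-written role-mapping rules, the toy AMR graph, and the function name below are illustrative assumptions, not the paper's implementation.

```python
# Toy sketch of an AMR-to-scene-graph step (NOT SGRAM's actual model,
# which is a Transformer-based language model). Hand-written rules over
# a tiny AMR-like graph illustrate the Text-to-AMR -> AMR-to-SG data flow.

def amr_to_scene_graph(concepts, edges):
    """Map an AMR-like graph to scene-graph triples and attributes.

    concepts: node id -> concept label (e.g. "ride-01")
    edges:    list of (source id, role, target id)
    """
    triples, attributes = [], []
    by_role = {}
    for src, role, tgt in edges:
        by_role.setdefault(src, {})[role] = tgt
    for node, roles in by_role.items():
        # A predicate with agent (:ARG0) and patient (:ARG1) becomes a relation.
        if ":ARG0" in roles and ":ARG1" in roles:
            subj = concepts[roles[":ARG0"]]
            obj = concepts[roles[":ARG1"]]
            pred = concepts[node].split("-")[0]  # strip the PropBank sense id
            triples.append((subj, pred, obj))
        # A :mod edge becomes an attribute of its head entity.
        if ":mod" in roles:
            attributes.append((concepts[node], concepts[roles[":mod"]]))
    return triples, attributes

# AMR-like graph for "a young boy is riding a bike"
concepts = {"r": "ride-01", "b": "boy", "y": "young", "k": "bike"}
edges = [("r", ":ARG0", "b"), ("r", ":ARG1", "k"), ("b", ":mod", "y")]
print(amr_to_scene_graph(concepts, edges))
# -> ([('boy', 'ride', 'bike')], [('boy', 'young')])
```

The sketch shows why AMR is a convenient intermediate form: relations and modifiers already arrive as labeled graph edges, so the scene-graph structure is close to a re-labeling of the AMR rather than a full re-parse of the sentence.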
Related papers
- From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models [81.92098140232638]
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
Existing methods struggle to generate scene graphs with novel visual relation concepts.
We introduce a new open-vocabulary SGG framework based on sequence generation.
arXiv Detail & Related papers (2024-04-01T04:21:01Z)
- FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing [66.70054075041487]
Existing scene graph parsers that convert image captions into scene graphs often suffer from two types of errors.
First, the generated scene graphs fail to capture the true semantics of the captions or the corresponding images, resulting in a lack of faithfulness.
Second, the generated scene graphs have high inconsistency, with the same semantics represented by different annotations.
arXiv Detail & Related papers (2023-05-27T15:38:31Z)
- Coarse-to-Fine Contrastive Learning in Image-Text-Graph Space for Improved Vision-Language Compositionality [50.48859793121308]
Contrastively trained vision-language models have achieved remarkable progress in vision and language representation learning.
Recent research has highlighted severe limitations in their ability to perform compositional reasoning over objects, attributes, and relations.
arXiv Detail & Related papers (2023-05-23T08:28:38Z)
- SPAN: Learning Similarity between Scene Graphs and Images with Transformers [29.582313604112336]
We propose a Scene graPh-imAge coNtrastive learning framework, SPAN, that can measure the similarity between scene graphs and images.
We introduce a novel graph serialization technique that transforms a scene graph into a sequence with structural encodings.
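As a rough illustration of what serializing a scene graph into a sequence can look like, here is a minimal sketch. The bracketed role markers, the function name, and the flat linearization scheme are assumptions for illustration; SPAN's actual structural encodings are not described here.

```python
# Illustrative sketch only (not SPAN's technique): one simple way to
# serialize a scene graph into a token sequence, using bracketed role
# markers as a stand-in for learned structural encodings.

def serialize_scene_graph(triples, attributes):
    """Linearize relation triples and attributes into one token string."""
    tokens = []
    for subj, pred, obj in triples:
        tokens += ["[SUB]", subj, "[PRED]", pred, "[OBJ]", obj]
    for entity, attr in attributes:
        tokens += ["[ENT]", entity, "[ATTR]", attr]
    return " ".join(tokens)

print(serialize_scene_graph([("boy", "ride", "bike")], [("boy", "young")]))
# -> [SUB] boy [PRED] ride [OBJ] bike [ENT] boy [ATTR] young
```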
arXiv Detail & Related papers (2023-04-02T18:13:36Z)
- Visual Semantic Parsing: From Images to Abstract Meaning Representation [20.60579156219413]
We propose to leverage a widely-used meaning representation in the field of natural language processing, the Abstract Meaning Representation (AMR)
Our visual AMR graphs are more linguistically informed, with a focus on higher-level semantic concepts extrapolated from visual input.
Our findings point to important future research directions for improved scene understanding.
arXiv Detail & Related papers (2022-10-26T17:06:42Z)
- Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z)
- Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learns from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z)
- A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval [4.159666152160874]
Scene graph representation is a suitable method for the image-text matching challenge.
We introduce the Local and Global Scene Graph Matching (LGSGM) model that enhances the state-of-the-art method.
Our enhancement with the combination of levels can improve the performance of the baseline method by increasing the recall by more than 10% on the Flickr30k dataset.
arXiv Detail & Related papers (2021-06-04T10:33:14Z)
- Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation [98.34909905511061]
We argue that a desirable scene graph should be hierarchically constructed, and introduce a new scheme for modeling scene graphs.
To generate a scene graph based on HET, we parse HET with a Hybrid Long Short-Term Memory (Hybrid-LSTM) which specifically encodes hierarchy and siblings context.
To further prioritize key relations in the scene graph, we devise a Relation Ranking Module (RRM) to dynamically adjust their rankings.
arXiv Detail & Related papers (2020-07-17T05:12:13Z)
- Graph-Structured Referring Expression Reasoning in The Wild [105.95488002374158]
Grounding referring expressions aims to locate in an image an object referred to by a natural language expression.
We propose a scene graph guided modular network (SGMN) to perform reasoning over a semantic graph and a scene graph.
We also propose Ref-Reasoning, a large-scale real-world dataset for structured referring expression reasoning.
arXiv Detail & Related papers (2020-04-19T11:00:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.