Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
- URL: http://arxiv.org/abs/2007.08760v1
- Date: Fri, 17 Jul 2020 05:12:13 GMT
- Title: Sketching Image Gist: Human-Mimetic Hierarchical Scene Graph Generation
- Authors: Wenbin Wang, Ruiping Wang, Shiguang Shan, Xilin Chen
- Abstract summary: We argue that a desirable scene graph should be hierarchically constructed, and introduce a new scheme for modeling scene graphs.
To generate a scene graph based on HET, we parse HET with a Hybrid Long Short-Term Memory (Hybrid-LSTM) which specifically encodes hierarchy and sibling context.
To further prioritize key relations in the scene graph, we devise a Relation Ranking Module (RRM) to dynamically adjust their rankings.
- Score: 98.34909905511061
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A scene graph aims to faithfully reveal humans' perception of image
content. When humans analyze a scene, they usually prefer to describe the image
gist first, namely the major objects and key relations in the scene. This
inherent perceptual habit implies a hierarchy of preference in the human scene
parsing procedure. Therefore, we argue that a desirable scene graph should also
be hierarchically constructed, and we introduce a new scheme for modeling scene
graphs. Concretely, a scene is
represented by a human-mimetic Hierarchical Entity Tree (HET) consisting of a
series of image regions. To generate a scene graph based on HET, we parse HET
with a Hybrid Long Short-Term Memory (Hybrid-LSTM) which specifically encodes
hierarchy and sibling context to capture the structured information embedded
in HET. To further prioritize key relations in the scene graph, we devise a
Relation Ranking Module (RRM) that dynamically adjusts their rankings by
learning to capture humans' subjective perceptual habits from objective entity
saliency and size. Experiments indicate that our method not only achieves
state-of-the-art performance for scene graph generation, but also excels at
mining image-specific relations that are valuable to downstream tasks.
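As a concrete illustration of the pipeline the abstract describes, the following is a minimal sketch (not the authors' released code) of two of its ingredients: grouping detected regions into a Hierarchical Entity Tree by size and containment, and ranking candidate relations with a hand-crafted saliency-and-size proxy that stands in for the learned Relation Ranking Module. All names (`Region`, `build_het`, `rank_relations`) and the containment threshold are assumptions made for illustration; the Hybrid-LSTM encoding itself is omitted.

```python
# Illustrative sketch only: a size/containment-based Hierarchical Entity Tree
# and a saliency-and-size proxy for relation ranking. Names and thresholds are
# hypothetical, not taken from the paper's implementation.

from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Region:
    name: str
    box: Tuple[float, float, float, float]        # (x1, y1, x2, y2)
    saliency: float                                # e.g. mean saliency-map score inside the box
    children: List["Region"] = field(default_factory=list)
    parent: Optional["Region"] = None

    @property
    def area(self) -> float:
        x1, y1, x2, y2 = self.box
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def contains(outer: Region, inner: Region, thresh: float = 0.9) -> bool:
    """True if most of `inner` lies inside `outer` (simple containment test)."""
    ox1, oy1, ox2, oy2 = outer.box
    ix1, iy1, ix2, iy2 = inner.box
    iw = max(0.0, min(ox2, ix2) - max(ox1, ix1))
    ih = max(0.0, min(oy2, iy2) - max(oy1, iy1))
    return inner.area > 0 and (iw * ih) / inner.area >= thresh


def build_het(regions: List[Region]) -> List[Region]:
    """Attach each region to its smallest containing region; return the roots.

    Larger regions sit higher in the tree, mimicking the coarse-to-fine way
    humans describe an image: gist first, details later.
    """
    ordered = sorted(regions, key=lambda r: r.area, reverse=True)
    roots: List[Region] = []
    for i, r in enumerate(ordered):
        # candidate parents are regions at least as large that are already placed
        parents = [p for p in ordered[:i] if contains(p, r)]
        if parents:
            parent = min(parents, key=lambda p: p.area)   # smallest enclosing region
            parent.children.append(r)
            r.parent = parent
        else:
            roots.append(r)
    return roots


def rank_relations(triplets: List[Tuple[Region, str, Region]],
                   image_area: float) -> List[Tuple[Region, str, Region]]:
    """Sort (subject, predicate, object) triplets by subject/object size and saliency.

    A hand-crafted stand-in for the learned Relation Ranking Module: key
    relations tend to involve large, salient entities.
    """
    def score(triplet: Tuple[Region, str, Region]) -> float:
        s, _, o = triplet
        size = (s.area + o.area) / (2 * image_area)
        sal = (s.saliency + o.saliency) / 2
        return 0.5 * size + 0.5 * sal

    return sorted(triplets, key=score, reverse=True)
```

In this sketch, `build_het` yields the tree that a Hybrid-LSTM would traverse to encode hierarchy context (parent to child) and sibling context (across children), while `rank_relations` only mimics the intuition that key relations involve large, salient entities rather than learning it from data.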
Related papers
- SPAN: Learning Similarity between Scene Graphs and Images with Transformers [29.582313604112336]
We propose a Scene graPh-imAge coNtrastive learning framework, SPAN, that can measure the similarity between scene graphs and images.
We introduce a novel graph serialization technique that transforms a scene graph into a sequence with structural encodings.
arXiv Detail & Related papers (2023-04-02T18:13:36Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z)
- Scene Graph Generation for Better Image Captioning? [48.411957217304]
We propose a model that leverages detected objects and auto-generated visual relationships to describe images in natural language.
We generate a scene graph from raw image pixels by identifying individual objects and visual relationships between them.
This scene graph then serves as input to our graph-to-text model, which generates the final caption.
arXiv Detail & Related papers (2021-09-23T14:35:11Z)
- Unconditional Scene Graph Generation [72.53624470737712]
We develop a deep auto-regressive model called SceneGraphGen which can learn the probability distribution over labelled and directed graphs.
We show that the scene graphs generated by SceneGraphGen are diverse and follow the semantic patterns of real-world scenes.
arXiv Detail & Related papers (2021-08-12T17:57:16Z)
- Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation [56.25878966006678]
We propose an approach for PRactical Inference in Social rElation (PRISE).
It concisely learns interactive features of persons and discriminative features of holistic scenes.
PRISE achieves a 6.8% improvement for domain classification on the PIPA dataset.
arXiv Detail & Related papers (2021-07-30T04:20:13Z)
- A Comprehensive Survey of Scene Graphs: Generation and Application [42.07469181785126]
Scene graph is a structured representation of a scene that can clearly express the objects, attributes, and relationships between objects in the scene.
However, no systematic survey of scene graphs exists at present.
arXiv Detail & Related papers (2021-03-17T04:24:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.