Symbolic image detection using scene and knowledge graphs
- URL: http://arxiv.org/abs/2206.04863v1
- Date: Fri, 10 Jun 2022 04:06:28 GMT
- Title: Symbolic image detection using scene and knowledge graphs
- Authors: Nasrin Kalanat and Adriana Kovashka
- Abstract summary: We use a scene graph, a graph representation of an image, to capture visual components.
We generate a knowledge graph using facts extracted from ConceptNet to reason about objects and attributes.
We extend the network further to use an attention mechanism which learns the importance of the graph representations.
- Score: 39.49756199669471
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sometimes the meaning conveyed by images goes beyond the list of objects they
contain; instead, images may express a powerful message to affect the viewers'
minds. Inferring this message requires reasoning about the relationships
between the objects, and general common-sense knowledge about the components.
In this paper, we use a scene graph, a graph representation of an image, to
capture visual components. In addition, we generate a knowledge graph using
facts extracted from ConceptNet to reason about objects and attributes. To
detect the symbols, we propose a neural network framework named SKG-Sym. The
framework first generates the representations of the scene graph of the image
and its knowledge graph using a Graph Convolutional Network. The framework then
fuses the representations and uses an MLP to classify them. We extend the
network further to use an attention mechanism which learns the importance of the
graph representations. We evaluate our methods on a dataset of advertisements,
and compare them with baseline symbolism classification methods (ResNet and VGG).
Results show that our methods outperform ResNet in terms of F-score and the
attention-based mechanism is competitive with VGG while it has much lower model
complexity.
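The pipeline described in the abstract (graph convolution over the scene graph and the knowledge graph, attention-weighted fusion, then an MLP classifier) can be sketched roughly as follows. This is a minimal illustration and not the authors' SKG-Sym implementation: the single GCN layer, the mean pooling over nodes, the dot-product attention scoring, the sigmoid multi-label outputs, all dimensions, and the random toy graphs and weights are assumptions made for the example.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_embed(adj, feats, weight):
    """One GCN layer with ReLU, then mean pooling over nodes (pooling is an assumption)."""
    hidden = np.maximum(normalize_adjacency(adj) @ feats @ weight, 0.0)
    return hidden.mean(axis=0)


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_fuse(graph_vecs, score_vec):
    """Weight each graph vector by a softmax attention score, then sum them."""
    stacked = np.stack(graph_vecs)          # shape: (num_graphs, hidden_dim)
    weights = softmax(stacked @ score_vec)  # shape: (num_graphs,)
    return weights @ stacked, weights


def mlp_classify(x, w1, w2):
    """Two-layer MLP with sigmoid outputs (multi-label scoring is an assumption)."""
    hidden = np.maximum(x @ w1, 0.0)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))


def random_graph(rng, num_nodes):
    """Hypothetical helper: random undirected adjacency matrix for the toy example."""
    upper = np.triu(rng.integers(0, 2, size=(num_nodes, num_nodes)), k=1)
    return (upper + upper.T).astype(float)


rng = np.random.default_rng(0)
feat_dim, hidden_dim, num_symbols = 8, 16, 5

# Toy stand-ins for a scene graph (4 nodes) and a knowledge graph (6 nodes).
scene_adj, scene_feats = random_graph(rng, 4), rng.normal(size=(4, feat_dim))
know_adj, know_feats = random_graph(rng, 6), rng.normal(size=(6, feat_dim))

# Randomly initialized parameters; a real model would learn these.
w_scene = rng.normal(size=(feat_dim, hidden_dim))
w_know = rng.normal(size=(feat_dim, hidden_dim))
score_vec = rng.normal(size=hidden_dim)
w1 = rng.normal(size=(hidden_dim, hidden_dim))
w2 = rng.normal(size=(hidden_dim, num_symbols))

scene_vec = gcn_embed(scene_adj, scene_feats, w_scene)
know_vec = gcn_embed(know_adj, know_feats, w_know)
fused, attn_weights = attention_fuse([scene_vec, know_vec], score_vec)
symbol_scores = mlp_classify(fused, w1, w2)

print("attention weights:", attn_weights)
print("symbol scores:", symbol_scores)
```

The attention weights indicate how much each graph contributes to the fused representation. Replacing `attention_fuse` with a plain concatenation or sum of the two vectors would correspond to the simpler fusion variant the abstract mentions before the attention extension.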
Related papers
- Two Stream Scene Understanding on Graph Embedding [4.78180589767256]
The paper presents a novel two-stream network architecture for enhancing scene understanding in computer vision.
The graph feature stream network comprises a segmentation structure, scene graph generation, and a graph representation module.
Experiments conducted on the ADE20K dataset demonstrate the effectiveness of the proposed two-stream network in improving image classification accuracy.
arXiv Detail & Related papers (2023-11-12T05:57:56Z) - Unbiased Heterogeneous Scene Graph Generation with Relation-aware
Message Passing Neural Network [9.779600950401315]
We propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context.
We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image.
arXiv Detail & Related papers (2022-12-01T11:25:36Z) - SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense
Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
arXiv Detail & Related papers (2021-12-16T03:16:30Z) - Learning to Generate Scene Graph from Natural Language Supervision [52.18175340725455]
We propose one of the first methods that learn from image-sentence pairs to extract a graphical representation of localized objects and their relationships within an image, known as a scene graph.
We leverage an off-the-shelf object detector to identify and localize object instances, match labels of detected regions to concepts parsed from captions, and thus create "pseudo" labels for learning scene graphs.
arXiv Detail & Related papers (2021-09-06T03:38:52Z) - A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval [4.159666152160874]
Scene graph presentation is a suitable method for the image-text matching challenge.
We introduce the Local and Global Scene Graph Matching (LGSGM) model that enhances the state-of-the-art method.
Our enhancement with the combination of levels can improve the performance of the baseline method by increasing the recall by more than 10% on the Flickr30k dataset.
arXiv Detail & Related papers (2021-06-04T10:33:14Z) - Learning to Represent Image and Text with Denotation Graph [32.417311523031195]
We propose learning representations from a set of implied, visually grounded expressions between image and text.
We show that state-of-the-art multimodal learning models can be further improved by leveraging automatically harvested structural relations.
arXiv Detail & Related papers (2020-10-06T18:00:58Z) - GINet: Graph Interaction Network for Scene Parsing [58.394591509215005]
We propose a Graph Interaction unit (GI unit) and a Semantic Context Loss (SC-loss) to promote context reasoning over image regions.
The proposed GINet outperforms the state-of-the-art approaches on the popular benchmarks, including Pascal-Context and COCO Stuff.
arXiv Detail & Related papers (2020-09-14T02:52:45Z) - Learning Physical Graph Representations from Visual Scenes [56.7938395379406]
Physical Scene Graphs (PSGs) represent scenes as hierarchical graphs with nodes corresponding intuitively to object parts at different scales, and edges to physical connections between parts.
PSGNet augments standard CNNs by including: recurrent feedback connections to combine low and high-level image information; graph pooling and vectorization operations that convert spatially-uniform feature maps into object-centric graph structures.
We show that PSGNet outperforms alternative self-supervised scene representation algorithms at scene segmentation tasks.
arXiv Detail & Related papers (2020-06-22T16:10:26Z) - Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
arXiv Detail & Related papers (2020-01-07T23:35:52Z)