Two Stream Scene Understanding on Graph Embedding
- URL: http://arxiv.org/abs/2311.06746v1
- Date: Sun, 12 Nov 2023 05:57:56 GMT
- Title: Two Stream Scene Understanding on Graph Embedding
- Authors: Wenkai Yang, Wenyuan Sun, Runxaing Huang
- Abstract summary: The paper presents a novel two-stream network architecture for enhancing scene understanding in computer vision.
The graph feature stream network comprises a segmentation structure, scene graph generation, and a graph representation module.
Experiments conducted on the ADE20K dataset demonstrate the effectiveness of the proposed two-stream network in improving image classification accuracy.
- Score: 4.78180589767256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The paper presents a novel two-stream network architecture for enhancing
scene understanding in computer vision. This architecture utilizes a graph
feature stream and an image feature stream, aiming to merge the strengths of
both modalities for improved performance in image classification and scene
graph generation tasks. The graph feature stream network comprises a
segmentation structure, scene graph generation, and a graph representation
module. The segmentation structure employs the UPSNet architecture with a
backbone that can be a residual network, ViT, or Swin Transformer. The scene
graph generation component focuses on extracting object labels and neighborhood
relationships from the semantic map to create a scene graph. Graph
Convolutional Networks (GCN), GraphSAGE, and Graph Attention Networks (GAT) are
employed for graph representation, with an emphasis on capturing node features
and their interconnections. The image feature stream network, on the other
hand, focuses on image classification through the use of Vision Transformer and
Swin Transformer models. The two streams are fused using various data fusion
methods. This fusion is designed to leverage the complementary strengths of
graph-based and image-based features. Experiments conducted on the ADE20K
dataset demonstrate the effectiveness of the proposed two-stream network in
improving image classification accuracy compared to conventional methods. This
research provides a significant contribution to the field of computer vision,
particularly in the areas of scene understanding and image classification, by
effectively combining graph-based and image-based approaches.
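The abstract's graph stream (semantic map, then scene graph, then graph representation, then fusion with image features) can be illustrated with a small sketch. This is not the authors' implementation. The 4-neighbourhood adjacency rule, the one-hot node features, the random weights, the mean pooling, the concatenation fusion, and every function name below are assumptions made for illustration, and a random vector stands in for a ViT or Swin Transformer embedding.

```python
import numpy as np


def scene_graph_from_semantic_map(sem_map):
    """Return (labels, adjacency) for a 2-D integer label map.

    Nodes are the distinct labels. Two labels are linked when they touch
    horizontally or vertically (an assumed 4-neighbourhood rule).
    """
    labels = np.unique(sem_map)  # sorted distinct labels
    index = {int(label): i for i, label in enumerate(labels)}
    adj = np.zeros((len(labels), len(labels)), dtype=np.float64)
    # Compare each pixel with its right neighbour, then its lower neighbour.
    pairs = [
        (sem_map[:, :-1], sem_map[:, 1:]),
        (sem_map[:-1, :], sem_map[1:, :]),
    ]
    for a, b in pairs:
        boundary = a != b
        for u, v in zip(a[boundary], b[boundary]):
            i, j = index[int(u)], index[int(v)]
            adj[i, j] = adj[j, i] = 1.0
    return labels, adj


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)


def fuse_by_concatenation(graph_vec, image_vec):
    """Assumed fusion: concatenate the two feature vectors."""
    return np.concatenate([graph_vec, image_vec])


# Toy semantic map with four regions (labels 0..3).
sem_map = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
])
labels, adj = scene_graph_from_semantic_map(sem_map)

rng = np.random.default_rng(0)
node_feats = np.eye(len(labels))             # one-hot label features (assumed)
weight = rng.normal(size=(len(labels), 8))   # untrained weights, illustration only
node_emb = gcn_layer(adj, node_feats, weight)
graph_vec = node_emb.mean(axis=0)            # mean-pool nodes into one graph vector

image_vec = rng.normal(size=16)              # placeholder for a ViT/Swin embedding
fused = fuse_by_concatenation(graph_vec, image_vec)
print(adj)
print(fused.shape)
```

In this toy map, regions 0 and 3 only meet diagonally, so the assumed 4-neighbourhood rule leaves them unconnected. The abstract mentions "various data fusion methods", and concatenation is only one plausible choice.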
Related papers
- Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation [153.92387500677023]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The proposed graph Transformer encoder combines graph convolutions and self-attentions in a Transformer to model both local and global interactions.
We also propose a novel self-guided pre-training method for graph representation learning.
(2024-01-15T14:36:38Z)
- Masked Contrastive Graph Representation Learning for Age Estimation [44.96502862249276]
This paper utilizes the property of graph representation learning in dealing with image redundancy information.
We propose a novel Masked Contrastive Graph Representation Learning (MCGRL) method for age estimation.
Experimental results on real-world face image datasets demonstrate the superiority of our proposed method over other state-of-the-art age estimation approaches.
(2023-06-16T15:53:21Z)
- Graph Transformer GANs for Graph-Constrained House Generation [223.739067413952]
We present a novel graph Transformer generative adversarial network (GTGAN) to learn effective graph node relations.
The GTGAN learns effective graph node relations in an end-to-end fashion for the challenging graph-constrained house generation task.
(2023-03-14T20:35:45Z)
- Symbolic image detection using scene and knowledge graphs [39.49756199669471]
We use a scene graph, a graph representation of an image, to capture visual components.
We generate a knowledge graph using facts extracted from ConceptNet to reason about objects and attributes.
We extend the network further to use an attention mechanism which learns the importance of the graph on representations.
(2022-06-10T04:06:28Z)
- Spectral Graph Convolutional Networks With Lifting-based Adaptive Graph Wavelets [81.63035727821145]
Spectral graph convolutional networks (SGCNs) have been attracting increasing attention in graph representation learning.
We propose a novel class of spectral graph convolutional networks that implement graph convolutions with adaptive graph wavelets.
(2021-08-03T17:57:53Z)
- Group Contrastive Self-Supervised Learning on Graphs [101.45974132613293]
We study self-supervised learning on graphs using contrastive methods.
We argue that contrasting graphs in multiple subspaces enables graph encoders to capture more abundant characteristics.
(2021-07-20T22:09:21Z)
- A Deep Local and Global Scene-Graph Matching for Image-Text Retrieval [4.159666152160874]
Scene graph presentation is a suitable method for the image-text matching challenge.
We introduce the Local and Global Scene Graph Matching (LGSGM) model that enhances the state-of-the-art method.
Our enhancement with the combination of levels can improve the performance of the baseline method by increasing the recall by more than 10% on the Flickr30k dataset.
(2021-06-04T10:33:14Z)
- Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification.
By employing an attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions.
Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
(2020-09-19T09:26:20Z)
- Bridging Knowledge Graphs to Generate Scene Graphs [49.69377653925448]
We propose a novel graph-based neural network that iteratively propagates information between the two graphs, as well as within each of them.
Our Graph Bridging Network, GB-Net, successively infers edges and nodes, allowing it to simultaneously exploit and refine the rich, heterogeneous structure of the interconnected scene and commonsense graphs.
(2020-01-07T23:35:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.