Related papers: HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation

HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation

URL: http://arxiv.org/abs/2008.05156v1
Date: Wed, 12 Aug 2020 07:58:13 GMT
Title: HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation
Authors: Meng Wei, Chun Yuan, Xiaoyu Yue, Kuo Zhong
Abstract summary: This paper presents a novel structure-aware embedding-to-classifier(SEC) module to incorporate both local and global structural information of relationships into the output space. We also propose a hierarchical semantic aggregation(HSA) module to reduce the number of subspaces by introducing higher order structural information. The proposed HOSE-Net achieves the state-of-the-art performance on two popular benchmarks of Visual Genome and VRD.
Score: 20.148175528691905
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Scene graph generation aims to produce structured representations for images, which requires to understand the relations between objects. Due to the continuous nature of deep neural networks, the prediction of scene graphs is divided into object detection and relation classification. However, the independent relation classes cannot separate the visual features well. Although some methods organize the visual features into graph structures and use message passing to learn contextual information, they still suffer from drastic intra-class variations and unbalanced data distributions. One important factor is that they learn an unstructured output space that ignores the inherent structures of scene graphs. Accordingly, in this paper, we propose a Higher Order Structure Embedded Network (HOSE-Net) to mitigate this issue. First, we propose a novel structure-aware embedding-to-classifier(SEC) module to incorporate both local and global structural information of relationships into the output space. Specifically, a set of context embeddings are learned via local graph based message passing and then mapped to a global structure based classification space. Second, since learning too many context-specific classification subspaces can suffer from data sparsity issues, we propose a hierarchical semantic aggregation(HSA) module to reduces the number of subspaces by introducing higher order structural information. HSA is also a fast and flexible tool to automatically search a semantic object hierarchy based on relational knowledge graphs. Extensive experiments show that the proposed HOSE-Net achieves the state-of-the-art performance on two popular benchmarks of Visual Genome and VRD.

Related papers

Learning to Model Graph Structural Information on MLPs via Graph Structure Self-Contrasting [50.181824673039436]
We propose a Graph Structure Self-Contrasting (GSSC) framework that learns graph structural information without message passing. The proposed framework is based purely on Multi-Layer Perceptrons (MLPs), where the structural information is only implicitly incorporated as prior knowledge. It first applies structural sparsification to remove potentially uninformative or noisy edges in the neighborhood, and then performs structural self-contrasting in the sparsified neighborhood to learn robust node representations.
arXiv Detail & Related papers (2024-09-09T12:56:02Z)
Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization [21.227825123510293]
We propose Individual Graph Information Bottleneck (I-GIB) and Structural Graph Information Bottleneck (S-GIB) I-GIB discards irrelevant information by minimizing the mutual information between the input graph and its embeddings. S-GIB simultaneously discards spurious features and learn invariant features from a high-order perspective.
arXiv Detail & Related papers (2023-06-28T03:52:41Z)
Task-specific Scene Structure Representations [13.775485887433815]
We propose a single general neural network architecture for extracting task-specific structure guidance for scenes. Our main contribution is to show that such a simple network can achieve state-of-the-art results for several low-level vision applications.
arXiv Detail & Related papers (2023-01-02T08:25:47Z)
Structure-Preserving Graph Representation Learning [43.43429108503634]
We propose a novel Structure-Preserving Graph Representation Learning (SPGRL) method to fully capture the structure information of graphs. Specifically, to reduce the uncertainty and misinformation of the original graph, we construct a feature graph as a complementary view via k-Nearest Neighbor method. Our method has quite superior performance on semi-supervised node classification task and excellent robustness under noise perturbation on graph structure or node features.
arXiv Detail & Related papers (2022-09-02T02:49:19Z)
BSAL: A Framework of Bi-component Structure and Attribute Learning for Link Prediction [33.488229191263564]
We propose a bicomponent structural and attribute learning framework (BSAL) that is designed to adaptively leverage information from topology and feature spaces. BSAL constructs a semantic topology via the node attributes and then gets the embeddings regarding the semantic view. It provides a flexible and easy-to-implement solution to adaptively incorporate the information carried by the node attributes.
arXiv Detail & Related papers (2022-04-18T03:12:13Z)
Deep Hierarchical Semantic Segmentation [76.40565872257709]
hierarchical semantic segmentation (HSS) aims at structured, pixel-wise description of visual observation in terms of a class hierarchy. HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models. With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
Self-supervised Graph-level Representation Learning with Local and Global Structure [71.45196938842608]
We propose a unified framework called Local-instance and Global-semantic Learning (GraphLoG) for self-supervised whole-graph representation learning. Besides preserving the local similarities, GraphLoG introduces the hierarchical prototypes to capture the global semantic clusters. An efficient online expectation-maximization (EM) algorithm is further developed for learning the model.
arXiv Detail & Related papers (2021-06-08T05:25:38Z)
Graph Information Bottleneck [77.21967740646784]
Graph Neural Networks (GNNs) provide an expressive way to fuse information from network structure and node features. Inheriting from the general Information Bottleneck (IB), GIB aims to learn the minimal sufficient representation for a given task. We show that our proposed models are more robust than state-of-the-art graph defense models.
arXiv Detail & Related papers (2020-10-24T07:13:00Z)
Multi-Level Graph Convolutional Network with Automatic Graph Learning for Hyperspectral Image Classification [63.56018768401328]
We propose a Multi-level Graph Convolutional Network (GCN) with Automatic Graph Learning method (MGCN-AGL) for HSI classification. By employing attention mechanism to characterize the importance among spatially neighboring regions, the most relevant information can be adaptively incorporated to make decisions. Our MGCN-AGL encodes the long range dependencies among image regions based on the expressive representations that have been produced at local level.
arXiv Detail & Related papers (2020-09-19T09:26:20Z)
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment. Our framework significantly outperforms state-of-the-art by6.5%mAP scores on Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.