RepSGG: Novel Representations of Entities and Relationships for Scene
Graph Generation
- URL: http://arxiv.org/abs/2309.03240v1
- Date: Wed, 6 Sep 2023 05:37:19 GMT
- Title: RepSGG: Novel Representations of Entities and Relationships for Scene
Graph Generation
- Authors: Hengyue Liu, Bir Bhanu
- Abstract summary: RepSGG is proposed to formulate a subject as queries, an object as keys, and their relationship as the maximum attention weight between pairwise queries and keys.
With more fine-grained and flexible representation power for entities and relationships, RepSGG learns to sample semantically discriminative and representative points for relationship inference.
RepSGG achieves the state-of-the-art or comparable performance on the Visual Genome and Open Images V6 datasets with fast inference speed.
- Score: 27.711809069547808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene Graph Generation (SGG) has achieved significant progress recently.
However, most previous works rely heavily on fixed-size entity representations
based on bounding box proposals, anchors, or learnable queries. As each
representation's cardinality has different trade-offs between performance and
computation overhead, extracting highly representative features efficiently and
dynamically is both challenging and crucial for SGG. In this work, a novel
architecture called RepSGG is proposed to address the aforementioned
challenges, formulating a subject as queries, an object as keys, and their
relationship as the maximum attention weight between pairwise queries and keys.
With more fine-grained and flexible representation power for entities and
relationships, RepSGG learns to sample semantically discriminative and
representative points for relationship inference. Moreover, the long-tailed
distribution also poses a significant challenge for generalization of SGG. A
run-time performance-guided logit adjustment (PGLA) strategy is proposed such
that the relationship logits are modified via affine transformations based on
run-time performance during training. This strategy encourages a more balanced
performance between dominant and rare classes. Experimental results show that
RepSGG achieves the state-of-the-art or comparable performance on the Visual
Genome and Open Images V6 datasets with fast inference speed, demonstrating the
efficacy and efficiency of the proposed methods.
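The abstract's central formulation (subject as queries, object as keys, relationship as the maximum attention weight between pairwise queries and keys) can be sketched in a few lines. This is an illustrative reading only: the shapes, the scaled dot-product form, and the softmax normalization are assumptions, not RepSGG's exact computation.

```python
import numpy as np

def relationship_score(subject_queries, object_keys):
    """Score a (subject, object) pair as the maximum attention weight
    between pairwise subject queries and object keys, following the
    abstract's formulation. The scaled dot-product and softmax choices
    here are illustrative assumptions, not the paper's exact math."""
    d = subject_queries.shape[-1]
    # Attention logits between every sampled query point and key point.
    logits = subject_queries @ object_keys.T / np.sqrt(d)
    # Normalize each query's weights over the object's keys.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # The relationship strength is the single strongest query-key link.
    return weights.max()
```

With a handful of sampled points per entity, the pairwise max lets the model pick whichever query-key pair is most semantically discriminative for the relationship, rather than pooling over a fixed-size representation.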
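The abstract also describes a performance-guided logit adjustment (PGLA): relationship logits are modified via affine transformations based on run-time performance during training. A minimal sketch, assuming one simple affine form (shift each class logit by how far its running recall falls below the mean; the `alpha` parameter and the recall-based shift are assumptions, not RepSGG's specified transform):

```python
import numpy as np

def pgla_adjust(logits, per_class_recall, alpha=1.0):
    """Illustrative performance-guided logit adjustment: an affine shift
    driven by per-class run-time recall. Classes performing below average
    get boosted; dominant classes get damped, encouraging a more balanced
    performance between dominant and rare classes."""
    recall = np.asarray(per_class_recall, dtype=float)
    # Affine transform: positive shift for under-performing classes.
    shift = alpha * (recall.mean() - recall)
    return np.asarray(logits, dtype=float) + shift
```

Because the adjustment is recomputed from run-time performance, the rebalancing pressure adapts as training progresses instead of being fixed by static class frequencies.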
Related papers
- Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency [3.351553095054309]
Scene graph generation (SGG) represents the relationships between objects in an image as a graph structure.
Previous studies have failed to reflect the co-occurrence of objects during scene graph generation.
We propose CooK, which reflects the Co-occurrence Knowledge between objects, together with a learnable term frequency-inverse document frequency (TF-IDF) weighting.
arXiv Detail & Related papers (2024-05-21T09:56:48Z) - IDRNet: Intervention-Driven Relation Network for Semantic Segmentation [34.09179171102469]
Co-occurrent visual patterns suggest that pixel relation modeling facilitates dense prediction tasks.
Despite the impressive results, existing paradigms often suffer from inadequate or ineffective contextual information aggregation.
We propose a novel Intervention-Driven Relation Network (IDRNet).
arXiv Detail & Related papers (2023-10-16T18:37:33Z) - Network Alignment with Transferable Graph Autoencoders [79.89704126746204]
We propose a novel graph autoencoder architecture designed to extract powerful and robust node embeddings.
We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs.
Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - Mitigating Semantic Confusion from Hostile Neighborhood for Graph Active
Learning [38.5372139056485]
Graph Active Learning (GAL) aims to find the most informative nodes in graphs for annotation to maximize the Graph Neural Networks (GNNs) performance.
GAL strategies may introduce semantic confusion to the selected training set, particularly when graphs are noisy.
We present Semantic-aware Active learning framework for Graphs (SAG) to mitigate the semantic confusion problem.
arXiv Detail & Related papers (2023-08-17T07:06:54Z) - Prototype-based Embedding Network for Scene Graph Generation [105.97836135784794]
Current Scene Graph Generation (SGG) methods explore contextual information to predict relationships among entity pairs.
Due to the diverse visual appearance of numerous possible subject-object combinations, there is a large intra-class variation within each predicate category.
Prototype-based Embedding Network (PE-Net) models entities/predicates with prototype-aligned compact and distinctive representations.
Prototype-guided Learning (PL) is introduced to help PE-Net efficiently learn such entity-predicate matching, and Prototype Regularization (PR) is devised to relieve ambiguous entity-predicate matching.
arXiv Detail & Related papers (2023-03-13T13:30:59Z) - Good Visual Guidance Makes A Better Extractor: Hierarchical Visual
Prefix for Multimodal Entity and Relation Extraction [88.6585431949086]
We propose a novel Hierarchical Visual Prefix fusion NeTwork (HVPNeT) for visual-enhanced entity and relation extraction.
We regard visual representation as pluggable visual prefix to guide the textual representation for error insensitive forecasting decision.
Experiments on three benchmark datasets demonstrate the effectiveness of our method, which achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-05-07T02:10:55Z) - Relation Regularized Scene Graph Generation [206.76762860019065]
Scene graph generation (SGG) is built on top of detected objects to predict object pairwise visual relations.
We propose a relation regularized network (R2-Net) which can predict whether there is a relationship between two objects.
Our R2-Net can effectively refine object labels and generate scene graphs.
arXiv Detail & Related papers (2022-02-22T11:36:49Z) - SA-VQA: Structured Alignment of Visual and Semantic Representations for
Visual Question Answering [29.96818189046649]
We propose to apply structured alignments, which work with graph representation of visual and textual content.
As demonstrated in our experimental results, such a structured alignment improves reasoning performance.
The proposed model, without any pretraining, outperforms the state-of-the-art methods on GQA dataset, and beats the non-pretrained state-of-the-art methods on VQA-v2 dataset.
arXiv Detail & Related papers (2022-01-25T22:26:09Z) - Fully Convolutional Scene Graph Generation [30.194961716870186]
This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously.
FCSGG encodes objects as bounding box center points, and relationships as 2D vector fields named Relation Affinity Fields (RAFs).
FCSGG achieves highly competitive results on recall and zero-shot recall with significantly reduced inference time.
arXiv Detail & Related papers (2021-03-30T05:25:38Z) - Jointly Cross- and Self-Modal Graph Attention Network for Query-Based
Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between two modalities, thus enabling a further precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z) - Self-Guided Adaptation: Progressive Representation Alignment for Domain
Adaptive Object Detection [86.69077525494106]
Unsupervised domain adaptation (UDA) has achieved unprecedented success in improving the cross-domain robustness of object detection models.
Existing UDA methods largely ignore the instantaneous data distribution during model learning, which could deteriorate the feature representation given large domain shift.
We propose a Self-Guided Adaptation (SGA) model, targeted at aligning feature representations and transferring object detection models across domains.
arXiv Detail & Related papers (2020-03-19T13:30:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.