Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge
Integration
- URL: http://arxiv.org/abs/2107.05080v1
- Date: Sun, 11 Jul 2021 16:22:45 GMT
- Title: Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge
Integration
- Authors: Xuan Kan, Hejie Cui, Carl Yang
- Abstract summary: We propose CommOnsense-integrAted sCenegrapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for scene graph generation (SGG).
Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph.
- Score: 9.203403318435486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Relation prediction among entities in images is an important step in scene
graph generation (SGG), which further impacts various visual understanding and
reasoning tasks. Existing SGG frameworks, however, require heavy training yet
are incapable of modeling unseen (i.e., zero-shot) triplets. In this work, we
stress that such incapability is due to the lack of commonsense reasoning, i.e.,
the ability to associate similar entities and infer similar relations based on
general understanding of the world. To fill this gap, we propose
CommOnsense-integrAted sCenegrapH rElation pRediction (COACHER), a framework to
integrate commonsense knowledge for SGG, especially for zero-shot relation
prediction. Specifically, we develop novel graph mining pipelines to model the
neighborhoods and paths around entities in an external commonsense knowledge
graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive
quantitative evaluations and qualitative case studies on both original and
manipulated datasets from Visual Genome demonstrate the effectiveness of our
proposed approach.
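To make the abstract's graph mining concrete, here is a minimal Python sketch of mining neighborhoods and short paths around an entity pair in a ConceptNet-style knowledge graph; the toy edge list, entity names, and helper functions are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict

# Toy ConceptNet-style edge list; a real system would query a full
# commonsense KG. All names here are illustrative, not from the paper.
commonsense_edges = [
    ("dog", "RelatedTo", "leash"),
    ("leash", "UsedFor", "walking"),
    ("dog", "CapableOf", "walking"),
    ("cat", "RelatedTo", "leash"),
]

def build_adjacency(edges):
    adj = defaultdict(set)
    for head, _rel, tail in edges:
        adj[head].add(tail)
        adj[tail].add(head)  # treat the KG as undirected for mining
    return adj

def neighborhood(adj, entity, hops=1):
    """Entities reachable from `entity` within `hops` steps."""
    frontier, seen = {entity}, {entity}
    for _ in range(hops):
        frontier = {n for v in frontier for n in adj[v]} - seen
        seen |= frontier
    return seen - {entity}

def paths_up_to_two_hops(adj, src, dst):
    """Direct edges and length-2 paths between a subject/object pair."""
    paths = [[src, dst]] if dst in adj[src] else []
    paths += [[src, mid, dst] for mid in adj[src] if dst in adj[mid]]
    return paths

adj = build_adjacency(commonsense_edges)
# Features for the (dog, leash) pair: shared neighbors and connecting paths.
shared = neighborhood(adj, "dog") & neighborhood(adj, "leash")
print(shared, paths_up_to_two_hops(adj, "dog", "leash"))
```

Neighborhood overlaps and path counts like these can then be fed as extra relation features into an SGG backbone, which is the integration the abstract describes.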
Related papers
- Towards Lifelong Scene Graph Generation with Knowledge-ware In-context Prompt Learning [24.98058940030532]
Scene graph generation (SGG) endeavors to predict visual relationships between pairs of objects within an image.
This work seeks to address the pitfalls inherent in a suite of prior relationship prediction methods.
Motivated by the achievements of in-context learning in pretrained language models, our approach imbues the model with the capability to predict relationships.
arXiv Detail & Related papers (2024-01-26T03:43:22Z)
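As a rough illustration of in-context prompting for relation prediction, the sketch below assembles exemplar triplets into a prompt; the exemplar format and wording are assumptions for illustration, not the paper's actual prompts.

```python
# A minimal sketch of in-context prompting for relation prediction; the
# prompt template below is an assumption, not the paper's actual format.
def build_relation_prompt(exemplars, query_pair):
    lines = ["Predict the relationship between the two objects."]
    for subj, rel, obj in exemplars:
        lines.append(f"Objects: {subj}, {obj} -> Relation: {rel}")
    subj, obj = query_pair
    lines.append(f"Objects: {subj}, {obj} -> Relation:")
    return "\n".join(lines)

exemplars = [("man", "riding", "horse"), ("dog", "on", "leash")]
print(build_relation_prompt(exemplars, ("woman", "bicycle")))
```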
- Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention [69.36723767339001]
Scene Graph Generation (SGG) offers a structured representation critical in many computer vision applications.
We propose a unified framework named OvSGTR towards fully open vocabulary SGG from a holistic view.
For the more challenging settings of relation-involved open vocabulary SGG, the proposed approach integrates relation-aware pretraining.
arXiv Detail & Related papers (2023-11-18T06:49:17Z)
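A common way to realize visual-concept alignment for open-vocabulary recognition is to score region features against text embeddings of an open label set; the sketch below shows that idea with random stand-in embeddings, where a real system would use a pretrained vision-language encoder.

```python
import numpy as np

# Sketch of visual-concept alignment for open-vocabulary SGG: score each
# region feature against text embeddings of an *open* label set. The
# embeddings are random stand-ins, not OvSGTR's actual encoders.
rng = np.random.default_rng(0)
region_feats = rng.normal(size=(4, 512))          # 4 detected regions
concept_names = ["person", "bicycle", "surfboard", "umbrella"]
concept_embeds = rng.normal(size=(len(concept_names), 512))

def cosine(a, b):
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

scores = cosine(region_feats, concept_embeds)      # (regions, concepts)
labels = [concept_names[i] for i in scores.argmax(axis=1)]
print(labels)  # nearest open-vocabulary concept per region
```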
- Adaptive Visual Scene Understanding: Incremental Scene Graph Generation [18.541428517746034]
Scene graph generation (SGG) analyzes images to extract meaningful information about objects and their relationships.
We present a benchmark comprising three learning regimes: relationship incremental, scene incremental, and relationship generalization.
We also introduce a "Replays via Analysis by Synthesis" method named RAS.
arXiv Detail & Related papers (2023-10-02T21:02:23Z)
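Replay is the standard tool for incremental learning setups like this; the sketch below keeps a bounded buffer of past relation triplets and mixes them into new batches. Note that RAS synthesizes its replays, while this sketch simply stores them, an assumption made for brevity.

```python
import random

# Minimal replay sketch for incremental SGG: keep a bounded buffer of
# past relation triplets and mix them into each new training batch.
class ReplayBuffer:
    def __init__(self, capacity=1000):
        self.capacity, self.items = capacity, []

    def add(self, triplet):
        if len(self.items) >= self.capacity:
            self.items.pop(random.randrange(len(self.items)))  # random eviction
        self.items.append(triplet)

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

buffer = ReplayBuffer()
for t in [("man", "riding", "horse"), ("cup", "on", "table")]:
    buffer.add(t)
new_batch = [("dog", "chasing", "ball")] + buffer.sample(2)
print(new_batch)  # train on new data plus replayed old relations
```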
- CADGE: Context-Aware Dialogue Generation Enhanced with Graph-Structured Knowledge Aggregation [25.56539617837482]
A novel context-aware graph-attention model (Context-aware GAT) is proposed.
It assimilates global features from relevant knowledge graphs through a context-enhanced knowledge aggregation mechanism.
Empirical results demonstrate that our framework outperforms conventional GNN-based language models.
arXiv Detail & Related papers (2023-05-10T16:31:35Z)
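The core of graph-attention aggregation is weighting knowledge-graph neighbors by their relevance to the current context; the sketch below uses a plain dot-product scoring function, an assumption standing in for CADGE's context-enhanced mechanism.

```python
import numpy as np

# Sketch of graph-attention aggregation over knowledge-graph neighbors,
# in the spirit of a context-aware GAT; the dot-product scorer is an
# assumption made for brevity.
def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(context, neighbor_feats):
    """Weight each neighbor by its relevance to the dialogue context."""
    scores = neighbor_feats @ context          # (num_neighbors,)
    weights = softmax(scores)
    return weights @ neighbor_feats            # aggregated knowledge vector

rng = np.random.default_rng(1)
context = rng.normal(size=8)                   # context-enhanced query
neighbors = rng.normal(size=(5, 8))            # neighbor node features
print(attend(context, neighbors).shape)        # (8,)
```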
- Hyper-relationship Learning Network for Scene Graph Generation [95.6796681398668]
We propose a hyper-relationship learning network, termed HLN, for scene graph generation.
We evaluate HLN on the most popular SGG dataset, i.e., the Visual Genome dataset.
For example, the proposed HLN improves the recall per relationship from 11.3% to 13.1%, and the recall per image from 19.8% to 34.9%.
arXiv Detail & Related papers (2022-02-15T09:26:16Z)
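The summary distinguishes recall per relationship from recall per image; the sketch below computes both on toy data: per-image recall averages a hit rate over images, while per-relationship recall averages over predicate classes so that rare relations count equally.

```python
from collections import defaultdict

# Toy data illustrating the two recall flavors; not HLN's actual
# evaluation code.
ground_truth = [  # (image_id, subject, predicate, object)
    (0, "man", "riding", "horse"),
    (0, "man", "wearing", "hat"),
    (1, "dog", "on", "leash"),
]
predicted = {(0, "man", "riding", "horse"), (1, "dog", "on", "leash")}

per_image, per_relation = defaultdict(list), defaultdict(list)
for img, s, p, o in ground_truth:
    hit = (img, s, p, o) in predicted
    per_image[img].append(hit)
    per_relation[p].append(hit)

recall_per_image = sum(sum(v) / len(v) for v in per_image.values()) / len(per_image)
recall_per_relation = sum(sum(v) / len(v) for v in per_relation.values()) / len(per_relation)
print(recall_per_image, recall_per_relation)  # 0.75, ~0.667
```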
- Knowledge Graph Augmented Network Towards Multiview Representation Learning for Aspect-based Sentiment Analysis [96.53859361560505]
We propose a knowledge graph augmented network (KGAN) to incorporate external knowledge with explicitly syntactic and contextual information.
KGAN captures the sentiment feature representations from multiple perspectives, i.e., context-, syntax- and knowledge-based.
Experiments on three popular ABSA benchmarks demonstrate the effectiveness and robustness of our KGAN.
arXiv Detail & Related papers (2022-01-13T08:25:53Z)
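The multi-view idea reduces to encoding the same sentence three ways and combining the results; the sketch below uses plain concatenation, an assumption, whereas KGAN fuses its views with learned attention.

```python
import numpy as np

# Sketch of multi-view fusion in the spirit of KGAN: separate context,
# syntax, and knowledge encodings of one sentence are combined into a
# single representation. Concatenation is a simplifying assumption.
rng = np.random.default_rng(2)
context_view = rng.normal(size=64)    # e.g., from a contextual encoder
syntax_view = rng.normal(size=64)     # e.g., from a dependency-graph GNN
knowledge_view = rng.normal(size=64)  # e.g., from KG embeddings

fused = np.concatenate([context_view, syntax_view, knowledge_view])
print(fused.shape)  # (192,) -- fed to the sentiment classifier head
```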
- Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation [48.21846438269506]
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
arXiv Detail & Related papers (2021-11-26T14:34:12Z)
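One simple proxy for label informativeness is frequency: generic predicates like "on" dominate SGG data, so down-weighting them emphasizes rarer, more informative relations. The inverse log-frequency weighting below is an assumption, not the paper's exact measure.

```python
from collections import Counter
import math

# Sketch: weight relation labels by informativeness via inverse
# log-frequency. Frequent generic predicates get low weight.
labels = ["on"] * 50 + ["riding"] * 5 + ["eating"] * 2
counts = Counter(labels)
total = sum(counts.values())

weights = {rel: 1.0 / math.log1p(c) for rel, c in counts.items()}
for rel in sorted(weights, key=weights.get, reverse=True):
    print(f"{rel}: freq={counts[rel]/total:.2f}, weight={weights[rel]:.2f}")
```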
- Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions [4.726777092009554]
We seek new insights into the underlying challenges of the Scene Graph Generation (SGG) task.
Motivated by the analysis, we design a novel SGG framework, Local-to-Global Interaction Networks (LOGIN).
Our framework enables predicting the scene graph in a local-to-global manner by design, leveraging their possible complementarity.
arXiv Detail & Related papers (2021-06-16T03:58:21Z)
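A local-to-global scheme can be read as: score each object pair from its own features, then adjust with a global scene summary. The additive refinement below is an assumption standing in for LOGIN's interaction design.

```python
import numpy as np

# Sketch of local-to-global relation prediction; weights are random
# stand-ins for learned parameters.
rng = np.random.default_rng(3)
pair_feats = rng.normal(size=(6, 16))        # one row per object pair
w_local = rng.normal(size=(16, 4))           # 4 candidate predicates
w_global = rng.normal(size=(16, 4))

local_logits = pair_feats @ w_local          # per-pair (local) evidence
scene_context = pair_feats.mean(axis=0)      # global summary of the image
global_bias = scene_context @ w_global       # same shift for every pair
logits = local_logits + global_bias
print(logits.argmax(axis=1))                 # refined predicate per pair
```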
- Tensor Graph Convolutional Networks for Multi-relational and Robust Learning [74.05478502080658]
This paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs that are represented by a tensor.
The proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
arXiv Detail & Related papers (2020-03-15T02:33:21Z)
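In a multi-relational setting, the graph becomes a stack of adjacency matrices, one slice per relation, and a layer mixes propagation over all slices. The sketch below computes H = ReLU(sum_r alpha_r * A_r X W); the per-slice mixing weights and shared transform are assumptions, not TGCN's exact parameterization.

```python
import numpy as np

# Sketch of a tensor-graph convolution over a stack of relation-specific
# adjacency matrices. Random data stands in for a real multi-relational graph.
rng = np.random.default_rng(4)
num_nodes, num_relations, dim = 5, 3, 8
A = rng.integers(0, 2, size=(num_relations, num_nodes, num_nodes)).astype(float)
X = rng.normal(size=(num_nodes, dim))           # node features
W = rng.normal(size=(dim, dim))                 # shared feature transform
alpha = np.ones(num_relations) / num_relations  # per-relation mixing weights

# H = ReLU( sum_r alpha_r * A_r X W )
H = np.maximum(0, sum(a * (A_r @ X @ W) for a, A_r in zip(alpha, A)))
print(H.shape)  # (5, 8)
```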
- Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
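Mutual-information maximization between inputs and encoder outputs is often approximated contrastively: matched input/output pairs should score higher than mismatched ones. The InfoNCE-style loss below is a common stand-in; GMI's exact estimator differs.

```python
import numpy as np

# Contrastive sketch of input/output mutual-information maximization;
# the linear encoder and loss form are illustrative assumptions.
rng = np.random.default_rng(5)
inputs = rng.normal(size=(6, 8))             # raw node features
encoded = inputs @ rng.normal(size=(8, 8))   # encoder outputs (toy linear)

scores = inputs @ encoded.T                  # pairwise similarity
log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_probs))          # maximize matched-pair scores
print(loss)
```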