Explanation-based Weakly-supervised Learning of Visual Relations with
Graph Networks
- URL: http://arxiv.org/abs/2006.09562v2
- Date: Fri, 17 Jul 2020 21:01:44 GMT
- Title: Explanation-based Weakly-supervised Learning of Visual Relations with
Graph Networks
- Authors: Federico Baldassarre, Kevin Smith, Josephine Sullivan, Hossein
Azizpour
- Abstract summary: This paper introduces a novel weakly-supervised method for visual relationship detection that relies on minimal image-level predicate labels.
A graph neural network is trained to classify predicates in images from a graph representation of detected objects, implicitly encoding an inductive bias for pairwise relations.
We present results comparable to recent fully- and weakly-supervised methods on three diverse and challenging datasets.
- Score: 7.199745314783952
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual relationship detection is fundamental for holistic image
understanding. However, the localization and classification of (subject,
predicate, object) triplets remain challenging tasks, due to the combinatorial
explosion of possible relationships, their long-tailed distribution in natural
images, and an expensive annotation process. This paper introduces a novel
weakly-supervised method for visual relationship detection that relies on
minimal image-level predicate labels. A graph neural network is trained to
classify predicates in images from a graph representation of detected objects,
implicitly encoding an inductive bias for pairwise relations. We then frame
relationship detection as the explanation of such a predicate classifier, i.e.
we obtain a complete relation by recovering the subject and object of a
predicted predicate. We present results comparable to recent fully- and
weakly-supervised methods on three diverse and challenging datasets: HICO-DET
for human-object interaction, Visual Relationship Detection for generic
object-to-object relations, and UnRel for unusual triplets; demonstrating
robustness to non-comprehensive annotations and good few-shot generalization.
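The idea of reading a relation off a predicate classifier can be sketched with a toy example. This is only an illustration under strong assumptions, not the paper's method: the paper trains a graph neural network, whereas here the image-level predicate score is a plain sum of hand-written scores over ordered object pairs, and the "explanation" is the pair that contributes most to that sum. All names, features, and weights (`objects`, `weights`, `edge_score`, `image_score`, `explain`) are hypothetical.

```python
from itertools import permutations

# Hypothetical detected objects with made-up 2-D features: (animacy, rideable).
# None of these values come from the paper.
objects = [
    {"label": "person", "feat": (1.0, 0.0)},
    {"label": "horse", "feat": (0.6, 1.0)},
    {"label": "tree", "feat": (0.0, 0.1)},
]

# Hypothetical per-predicate weights: one vector for the subject role,
# one for the object role. A trained classifier would learn these.
weights = {"ride": {"subj": (1.0, 0.0), "obj": (0.0, 1.0)}}


def dot(a, b):
    """Dot product of two equal-length tuples."""
    return sum(x * y for x, y in zip(a, b))


def edge_score(predicate, subj, obj):
    """Score one ordered (subject, object) pair for a predicate."""
    w = weights[predicate]
    return dot(w["subj"], subj["feat"]) * dot(w["obj"], obj["feat"])


def image_score(predicate, objs):
    """Image-level predicate score: sum of edge scores over all ordered pairs."""
    return sum(edge_score(predicate, s, o) for s, o in permutations(objs, 2))


def explain(predicate, objs):
    """Recover the ordered pair that contributes most to the image-level score."""
    subj, obj = max(
        permutations(objs, 2), key=lambda pair: edge_score(predicate, *pair)
    )
    return (subj["label"], predicate, obj["label"])


if __name__ == "__main__":
    print(image_score("ride", objects))
    print(explain("ride", objects))
```

Only the image-level score would need a label during training; the triplet is obtained afterwards by asking which pair the score depends on. The additive decomposition used here is a simplification chosen to keep the attribution step trivial.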
Related papers
- Leveraging Predicate and Triplet Learning for Scene Graph Generation [31.09787444957997]
Scene Graph Generation (SGG) aims to identify entities and predict the relationship triplets.
We propose a Dual-granularity Relation Modeling (DRM) network to leverage fine-grained triplet cues besides the coarse-grained predicate ones.
Our method establishes new state-of-the-art performance on Visual Genome, Open Image, and GQA datasets.
arXiv Detail & Related papers (2024-06-04T07:23:41Z)
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection [14.22646492640906]
We propose a simple and highly efficient decoder-free architecture for open-vocabulary visual relationship detection.
Our model consists of a Transformer-based image encoder that represents objects as tokens and models their relationships implicitly.
Our approach achieves state-of-the-art relationship detection performance on Visual Genome and on the large-vocabulary GQA benchmark at real-time inference speeds.
arXiv Detail & Related papers (2024-03-21T10:15:57Z)
- Transitivity Recovering Decompositions: Interpretable and Robust Fine-Grained Relationships [69.04014445666142]
Transitivity Recovering Decompositions (TRD) is a graph-space search algorithm that identifies interpretable equivalents of abstract emergent relationships.
We show that TRD is provably robust to noisy views, with empirical evidence also supporting this finding.
arXiv Detail & Related papers (2023-10-24T16:48:56Z)
- Unified Graph Structured Models for Video Understanding [93.72081456202672]
We propose a message passing graph neural network that explicitly models relational-temporal relations.
We show how our method is able to more effectively model relationships between relevant entities in the scene.
arXiv Detail & Related papers (2021-03-29T14:37:35Z)
- Relationship-based Neural Baby Talk [10.342180619706724]
We study three main relationships: spatial relationships to explore geometric interactions, semantic relationships to extract semantic interactions, and implicit relationships to capture hidden information.
Our proposed R-NBT model outperforms state-of-the-art models trained on COCO dataset in three image caption generation tasks.
arXiv Detail & Related papers (2021-03-08T15:51:24Z)
- Tensor Composition Net for Visual Relationship Prediction [115.14829858763399]
We present a novel Tensor Composition Network (TCN) to predict visual relationships in images.
The key idea of our TCN is to exploit the low rank property of the visual relationship tensor.
We show our TCN's image-level visual relationship prediction provides a simple and efficient mechanism for relation-based image retrieval.
arXiv Detail & Related papers (2020-12-10T06:27:20Z)
- Learning Relation Prototype from Unlabeled Texts for Long-tail Relation Extraction [84.64435075778988]
We propose a general approach to learn relation prototypes from unlabeled texts.
We learn relation prototypes as an implicit factor between entities.
We conduct experiments on two publicly available datasets: New York Times and Google Distant Supervision.
arXiv Detail & Related papers (2020-11-27T06:21:12Z)
- Dual ResGCN for Balanced Scene Graph Generation [106.7828712878278]
We propose a novel model, dubbed dual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network.
The two networks are complementary to each other. The former captures object-level context information, i.e., the connections among objects.
The latter is carefully designed to explicitly capture relation-level context information, i.e., the connections among relations.
arXiv Detail & Related papers (2020-11-09T07:44:17Z)
- Addressing Class Imbalance in Scene Graph Parsing by Learning to Contrast and Score [65.18522219013786]
Scene graph parsing aims to detect objects in an image scene and recognize their relations.
Recent approaches have achieved high average scores on some popular benchmarks, but fail in detecting rare relations.
This paper introduces a novel integrated framework of classification and ranking to resolve the class imbalance problem.
arXiv Detail & Related papers (2020-09-28T13:57:59Z)