Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation
- URL: http://arxiv.org/abs/2111.13517v1
- Date: Fri, 26 Nov 2021 14:34:12 GMT
- Title: Not All Relations are Equal: Mining Informative Labels for Scene Graph Generation
- Authors: Arushi Goel, Basura Fernando, Frank Keller and Hakan Bilen
- Abstract summary: Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects.
Existing SGG methods fail to acquire complex reasoning about visual and textual correlations due to various biases in training data.
We propose a novel framework for SGG training that exploits relation labels based on their informativeness.
- Score: 48.21846438269506
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graph generation (SGG) aims to capture a wide variety of interactions
between pairs of objects, which is essential for full scene understanding.
Existing SGG methods trained on the entire set of relations fail to acquire
complex reasoning about visual and textual correlations due to various biases
in training data. Learning on trivial relations that indicate generic spatial
configuration like 'on' instead of informative relations such as 'parked on'
does not enforce this complex reasoning, harming generalization. To address
this problem, we propose a novel framework for SGG training that exploits
relation labels based on their informativeness. Our model-agnostic training
procedure imputes missing informative relations for less informative samples in
the training data and trains an SGG model on the imputed labels along with
existing annotations. We show that this approach can successfully be used in
conjunction with state-of-the-art SGG methods and improves their performance
significantly in multiple metrics on the standard Visual Genome benchmark.
Furthermore, we obtain considerable improvements for unseen triplets in a more
challenging zero-shot setting.
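Below is a minimal sketch of the informativeness-based imputation idea the abstract describes. The `GENERIC`/`INFORMATIVE` predicate split, the `Sample` container, the `predict_probs` callback, and the confidence threshold are all illustrative assumptions; the paper's actual imputation criterion and model are not specified in the abstract.

```python
# Hedged sketch: impute a more informative predicate for samples annotated
# with a generic one, keeping informative annotations untouched. All names
# and thresholds here are assumptions, not the paper's actual procedure.
from dataclasses import dataclass

GENERIC = {"on", "has", "near"}                  # trivial spatial labels (illustrative)
INFORMATIVE = {"parked on", "riding", "eating"}  # richer labels (illustrative)

@dataclass
class Sample:
    subject: str
    object: str
    predicate: str  # annotated relation label

def impute_labels(samples, predict_probs, threshold=0.7):
    """Replace generic predicates with a confidently predicted informative one.

    `predict_probs(sample)` is assumed to return a dict mapping candidate
    predicates to probabilities, e.g. from a pretrained SGG model.
    """
    out = []
    for s in samples:
        if s.predicate in GENERIC:
            probs = predict_probs(s)
            best = max(INFORMATIVE, key=lambda p: probs.get(p, 0.0))
            if probs.get(best, 0.0) >= threshold:
                s = Sample(s.subject, s.object, best)  # impute the richer label
        out.append(s)  # informative annotations are kept as-is
    return out

# Example: a dummy predictor that is confident about "parked on".
imputed = impute_labels([Sample("car", "street", "on")],
                        lambda s: {"parked on": 0.9})
print(imputed[0].predicate)  # parked on
```

An SGG model would then be trained on these imputed labels together with the existing annotations, as the abstract describes.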
Related papers
- Adaptive Visual Scene Understanding: Incremental Scene Graph Generation [18.541428517746034]
Scene graph generation (SGG) analyzes images to extract meaningful information about objects and their relationships.
We present a benchmark comprising three learning regimes: relationship incremental, scene incremental, and relationship generalization.
We also introduce a "Replays via Analysis by Synthesis" method named RAS.
arXiv Detail & Related papers (2023-10-02T21:02:23Z)
- Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation [30.79358827005448]
Scene Graph Generation (SGG) aims to structurally and comprehensively represent objects and their connections in images.
Existing SGG models often struggle to solve the long-tailed problem caused by biased datasets.
We propose a Text-Image-joint Scene Graph Generation (TISGG) model to resolve the unseen triples and improve the generalisation capability of the SGG models.
arXiv Detail & Related papers (2023-06-23T10:17:56Z)
- Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z)
- NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation [65.78472854070316]
We propose a novel NoIsy label CorrEction and Sample Training strategy for SGG: NICEST.
NICE first detects noisy samples and then reassigns higher-quality soft predicate labels to them.
NICEST can be seamlessly incorporated into any SGG architecture to boost its performance on different predicate categories.
arXiv Detail & Related papers (2022-07-27T06:25:47Z)
- Fine-Grained Scene Graph Generation with Data Transfer [127.17675443137064]
Scene graph generation (SGG) aims to extract (subject, predicate, object) triplets in images.
Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding.
We propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and expanded to large SGG with 1,807 predicate classes.
arXiv Detail & Related papers (2022-03-22T12:26:56Z)
- Hyper-relationship Learning Network for Scene Graph Generation [95.6796681398668]
We propose a hyper-relationship learning network, termed HLN, for scene graph generation.
We evaluate HLN on the most popular SGG dataset, i.e., the Visual Genome dataset.
For example, the proposed HLN improves the recall per relationship from 11.3% to 13.1% and the recall per image from 19.8% to 34.9%.
arXiv Detail & Related papers (2022-02-15T09:26:16Z)
- Semantic Compositional Learning for Low-shot Scene Graph Generation [122.51930904132685]
Many scene graph generation (SGG) models solely use the limited annotated relation triples for training.
We propose a novel semantic compositional learning strategy that makes it possible to construct additional, realistic relation triples.
For three recent SGG models, adding our strategy improves their performance by close to 50%, and all of them substantially exceed the current state-of-the-art.
arXiv Detail & Related papers (2021-08-19T10:13:55Z)
- Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration [9.203403318435486]
We propose CommOnsense-integrAted sCene grapH rElation pRediction (COACHER), a framework to integrate commonsense knowledge for scene graph generation (SGG).
Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph (see the sketch below).
arXiv Detail & Related papers (2021-07-11T16:22:45Z)
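A hedged sketch of the kind of neighborhood and path mining this summary describes, using networkx over a toy ConceptNet-style graph; the graph contents and helper names are illustrative assumptions, not COACHER's actual pipeline.

```python
# Illustrative sketch: mine neighborhoods and paths around entity pairs in
# an external commonsense knowledge graph. The toy edges and function names
# are assumptions for illustration only.
import networkx as nx

kg = nx.Graph()  # toy commonsense knowledge graph
kg.add_edges_from([
    ("dog", "leash"), ("leash", "walk"), ("dog", "walk"),
    ("person", "walk"), ("person", "dog"),
])

def entity_neighborhood(graph, entity, hops=1):
    """Concepts within `hops` edges of an entity."""
    return set(nx.single_source_shortest_path_length(graph, entity, cutoff=hops))

def entity_paths(graph, subj, obj, cutoff=3):
    """Short paths linking a subject-object pair, usable as relation evidence."""
    return list(nx.all_simple_paths(graph, subj, obj, cutoff=cutoff))

print(entity_neighborhood(kg, "dog"))     # e.g. {'dog', 'leash', 'walk', 'person'}
print(entity_paths(kg, "dog", "person"))  # e.g. [['dog', 'person'], ['dog', 'walk', 'person'], ...]
```

Features derived from such neighborhoods and paths could then be fed to an SGG relation predictor, which is the integration the summary points at.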