HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic
Scene Graph Generation
- URL: http://arxiv.org/abs/2303.15994v2
- Date: Thu, 17 Aug 2023 12:31:40 GMT
- Title: HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic
Scene Graph Generation
- Authors: Zijian Zhou, Miaojing Shi, Holger Caesar
- Abstract summary: Panoptic Scene Graph generation (PSG) aims to segment the image and extract triplets of subjects, objects and their relations to build a scene graph.
This task suffers from a long-tail problem in its relation categories, making naive biased methods more inclined to high-frequency relations.
Existing unbiased methods tackle the long-tail problem by data/loss rebalancing to favor low-frequency relations.
- When a subject-object pair has multiple semantically overlapping relations, existing methods favor one relation over the other; our proposed HiLo framework instead lets different network branches specialize on low- and high-frequency relations.
- Score: 13.221163846643607
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Panoptic Scene Graph generation (PSG) is a recently proposed task in image
scene understanding that aims to segment the image and extract triplets of
subjects, objects and their relations to build a scene graph. This task is
particularly challenging for two reasons. First, it suffers from a long-tail
problem in its relation categories, making naive biased methods more inclined
to high-frequency relations. Existing unbiased methods tackle the long-tail
problem by data/loss rebalancing to favor low-frequency relations. Second, a
subject-object pair can have two or more semantically overlapping relations.
While existing methods favor one over the other, our proposed HiLo framework
lets different network branches specialize on low- and high-frequency relations,
enforces their consistency, and fuses the results. To the best of our knowledge,
we are the first to propose an explicitly unbiased PSG method. In extensive
experiments we show that our HiLo framework achieves state-of-the-art results
on the PSG task. We also apply our method to the Scene Graph Generation task
that predicts boxes instead of masks and see improvements over all baseline
methods. Code is available at https://github.com/franciszzj/HiLo.
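To make the abstract's dual-branch idea concrete, below is a minimal, hypothetical PyTorch sketch: two relation-classification branches over the same subject-object pair features, one trained with plain cross-entropy (biased toward high-frequency relations) and one with inverse-frequency re-weighting (favoring low-frequency relations), plus a consistency term and logit fusion. All names (HiLoSketch, consistency_weight, the toy frequencies) and the specific re-weighting/consistency choices are illustrative assumptions, not the authors' implementation; the official code is at https://github.com/franciszzj/HiLo.

```python
# Hypothetical sketch of a high/low-frequency dual-branch relation classifier.
# Not the authors' implementation; see https://github.com/franciszzj/HiLo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HiLoSketch(nn.Module):
    def __init__(self, feat_dim: int, num_relations: int, class_freq: torch.Tensor):
        super().__init__()
        # Two branches over the same subject-object pair features:
        # one expected to specialize on high-frequency relations,
        # the other on low-frequency relations.
        self.high_branch = nn.Linear(feat_dim, num_relations)
        self.low_branch = nn.Linear(feat_dim, num_relations)
        # Inverse-frequency weights push the low branch toward rare relations.
        weights = 1.0 / class_freq.clamp(min=1.0)
        self.register_buffer("low_weights", weights / weights.mean())

    def forward(self, pair_feats: torch.Tensor):
        logits_hi = self.high_branch(pair_feats)
        logits_lo = self.low_branch(pair_feats)
        # Fuse the two branches by averaging their logits at inference time.
        fused = 0.5 * (logits_hi + logits_lo)
        return logits_hi, logits_lo, fused

    def loss(self, pair_feats, labels, consistency_weight: float = 0.1):
        logits_hi, logits_lo, _ = self(pair_feats)
        # High branch: plain cross-entropy (biased toward frequent relations).
        loss_hi = F.cross_entropy(logits_hi, labels)
        # Low branch: re-weighted cross-entropy (favors rare relations).
        loss_lo = F.cross_entropy(logits_lo, labels, weight=self.low_weights)
        # Consistency term: keep the two branches' predictions compatible.
        log_p_hi = F.log_softmax(logits_hi, dim=-1)
        p_lo = F.softmax(logits_lo, dim=-1)
        loss_cons = F.kl_div(log_p_hi, p_lo, reduction="batchmean")
        return loss_hi + loss_lo + consistency_weight * loss_cons


if __name__ == "__main__":
    torch.manual_seed(0)
    freq = torch.tensor([500.0, 50.0, 5.0])   # toy relation frequencies
    model = HiLoSketch(feat_dim=16, num_relations=3, class_freq=freq)
    feats = torch.randn(8, 16)                # 8 subject-object pair features
    labels = torch.randint(0, 3, (8,))
    print(model.loss(feats, labels).item())
```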
Related papers
- Pair then Relation: Pair-Net for Panoptic Scene Graph Generation [54.92476119356985]
Panoptic Scene Graph (PSG) aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
Current PSG methods have limited performance, which hinders downstream tasks or applications.
We present a novel framework: Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects.
arXiv Detail & Related papers (2023-07-17T17:58:37Z)
- Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation [55.429541407920304]
Recognizing the predicate between subject-object pairs is an inherently imbalanced, multi-label problem.
Recent state-of-the-art methods predominantly focus on the most frequently occurring predicate classes.
We introduce a multi-label meta-learning framework to deal with the biased predicate distribution.
arXiv Detail & Related papers (2023-06-16T18:14:23Z)
- Fine-Grained is Too Coarse: A Novel Data-Centric Approach for Efficient Scene Graph Generation [0.7851536646859476]
We introduce the task of Efficient Scene Graph Generation (SGG) that prioritizes the generation of relevant relations.
We present a new dataset, VG150-curated, based on the annotations of the popular Visual Genome dataset.
We show through a set of experiments that this dataset contains more high-quality and diverse annotations than the one usually used in SGG.
arXiv Detail & Related papers (2023-05-30T00:55:49Z)
- Learnable Graph Matching: A Practical Paradigm for Data Association [74.28753343714858]
We propose a general learnable graph matching method to address issues in data association.
Our method achieves state-of-the-art performance on several MOT datasets.
For image matching, our method outperforms state-of-the-art methods on a popular indoor dataset, ScanNet.
arXiv Detail & Related papers (2023-03-27T17:39:00Z)
- Location-Free Scene Graph Generation [45.366540803729386]
Scene Graph Generation (SGG) is a visual understanding task, aiming to describe a scene as a graph of entities and their relationships with each other.
Existing works rely on location labels in form of bounding boxes or segmentation masks, increasing annotation costs and limiting dataset expansion.
We break this dependency and introduce location-free scene graph generation (LF-SGG).
This new task aims at predicting instances of entities, as well as their relationships, without the explicit calculation of their spatial localization.
arXiv Detail & Related papers (2023-03-20T08:57:45Z)
- Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z)
- Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
arXiv Detail & Related papers (2022-10-19T16:15:19Z)
- Panoptic Scene Graph Generation [41.534209967051645]
Panoptic scene graph generation (PSG) is a new task that requires the model to generate a more comprehensive scene graph representation.
A high-quality PSG dataset contains 49k well-annotated overlapping images from COCO and Visual Genome.
arXiv Detail & Related papers (2022-07-22T17:59:53Z)
- Dual ResGCN for Balanced Scene Graph Generation [106.7828712878278]
We propose a novel model, dubbed dual ResGCN, which consists of an object residual graph convolutional network and a relation residual graph convolutional network.
The two networks are complementary to each other. The former captures object-level context information, i.e., the connections among objects.
The latter is carefully designed to explicitly capture relation-level context information, i.e., the connections among relations.
arXiv Detail & Related papers (2020-11-09T07:44:17Z)
- Bipartite Graph Reasoning GANs for Person Image Generation [159.00654368677513]
We present a novel Bipartite Graph Reasoning GAN (BiGraphGAN) for the challenging person image generation task.
The proposed graph generator mainly consists of two novel blocks that aim to model the pose-to-pose and pose-to-image relations.
arXiv Detail & Related papers (2020-08-10T19:37:10Z)