Panoptic Scene Graph Generation
- URL: http://arxiv.org/abs/2207.11247v1
- Date: Fri, 22 Jul 2022 17:59:53 GMT
- Title: Panoptic Scene Graph Generation
- Authors: Jingkang Yang, Yi Zhe Ang, Zujin Guo, Kaiyang Zhou, Wayne Zhang, and
Ziwei Liu
- Abstract summary: Panoptic scene graph generation (PSG) is a new task that requires the model to generate a more comprehensive scene graph representation.
A high-quality PSG dataset contains 49k well-annotated images that overlap between COCO and Visual Genome.
- Score: 41.534209967051645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing research addresses scene graph generation (SGG) -- a critical
technology for scene understanding in images -- from a detection perspective,
i.e., objects are detected using bounding boxes followed by prediction of their
pairwise relationships. We argue that such a paradigm causes several problems
that impede the progress of the field. For instance, bounding box-based labels
in current datasets usually contain redundant classes like hairs, and leave out
background information that is crucial to the understanding of context. In this
work, we introduce panoptic scene graph generation (PSG), a new task
that requires the model to generate a more comprehensive scene graph
representation based on panoptic segmentations rather than rigid bounding
boxes. A high-quality PSG dataset, which contains 49k well-annotated
overlapping images from COCO and Visual Genome, is created for the community to
keep track of its progress. For benchmarking, we build four two-stage
baselines, which are modified from classic methods in SGG, and two one-stage
baselines called PSGTR and PSGFormer, which are based on the efficient
Transformer-based detector, i.e., DETR. While PSGTR uses a set of queries to
directly learn triplets, PSGFormer separately models the objects and relations
in the form of queries from two Transformer decoders, followed by a
prompting-like relation-object matching mechanism. In the end, we share
insights on open challenges and future directions.
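The PSGTR design described above (a single set of queries decoded directly into triplets) is concrete enough to sketch in code. Below is a minimal, hypothetical PyTorch sketch of such a query-based triplet decoder; it is not the authors' implementation, and every name, layer size, and class count (133 object classes and 56 predicates, roughly matching the PSG dataset) is an illustrative assumption.

```python
# Hypothetical sketch of a PSGTR-style one-stage decoder: each learnable
# "triplet query" is decoded against image features, then read out as
# subject/object/predicate logits plus two masks obtained by dot products
# with per-pixel embeddings. Sizes and class counts are assumptions.
import torch
import torch.nn as nn

class TripletQueryDecoder(nn.Module):
    def __init__(self, d_model=256, num_queries=100,
                 num_obj_classes=133, num_rel_classes=56):
        super().__init__()
        self.queries = nn.Embedding(num_queries, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=6)
        self.sub_cls = nn.Linear(d_model, num_obj_classes + 1)  # +1: "no object"
        self.obj_cls = nn.Linear(d_model, num_obj_classes + 1)
        self.rel_cls = nn.Linear(d_model, num_rel_classes + 1)  # +1: "no relation"
        self.sub_emb = nn.Linear(d_model, d_model)              # mask embeddings
        self.obj_emb = nn.Linear(d_model, d_model)

    def forward(self, feats, pixel_emb):
        # feats: (B, HW, d) flattened encoder features
        # pixel_emb: (B, d, H, W) per-pixel embeddings for mask prediction
        q = self.queries.weight.unsqueeze(0).expand(feats.size(0), -1, -1)
        hs = self.decoder(q, feats)                             # (B, N, d)
        sub_m = torch.einsum('bnd,bdhw->bnhw', self.sub_emb(hs), pixel_emb)
        obj_m = torch.einsum('bnd,bdhw->bnhw', self.obj_emb(hs), pixel_emb)
        return {'sub_logits': self.sub_cls(hs),                 # (B, N, C+1)
                'obj_logits': self.obj_cls(hs),
                'rel_logits': self.rel_cls(hs),
                'sub_masks': sub_m, 'obj_masks': obj_m}         # pre-sigmoid

# Toy usage on random tensors:
model = TripletQueryDecoder()
out = model(torch.randn(2, 64, 256), torch.randn(2, 256, 8, 8))
print(out['rel_logits'].shape)  # torch.Size([2, 100, 57])
```

As in DETR, training such a model would additionally require Hungarian matching between predicted and ground-truth triplets, which is omitted here.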
Related papers
- DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation [13.058196732927135]
Scene graph generation aims to capture detailed spatial and semantic relationships between objects in an image.
Existing Transformer-based methods either employ distinct queries for objects and predicates or utilize holistic queries for relation triplets.
We present a new Transformer-based method, called DSGG, that views scene graph detection as a direct graph prediction problem.
(arXiv, 2024-03-21)
- TextPSG: Panoptic Scene Graph Generation from Textual Descriptions [78.1140391134517]
We study a new problem of Panoptic Scene Graph Generation from Purely Textual Descriptions (Caption-to-PSG).
The key idea is to leverage the large collection of free image-caption data on the Web alone to generate panoptic scene graphs.
We propose a new framework TextPSG consisting of four modules, i.e., a region grouper, an entity grounder, a segment merger, and a label generator.
(arXiv, 2023-10-10)
- Pair then Relation: Pair-Net for Panoptic Scene Graph Generation [54.92476119356985]
Panoptic Scene Graph (PSG) aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
Current PSG methods have limited performance, which hinders downstream tasks or applications.
We present a novel framework, Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects (a minimal sketch of this idea follows the list).
(arXiv, 2023-07-17)
- 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop [1.5362025549031049]
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
We propose GRNet, a Global Relation Network in a two-stage paradigm, where pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings.
We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard.
(arXiv, 2023-02-06)
- Image Semantic Relation Generation [0.76146285961466]
Scene graphs can distil complex image information and correct the bias of visual models using semantic-level relations.
In this work, we introduce image semantic relation generation (ISRG), a simple but effective image-to-text model.
(arXiv, 2022-10-19)
- Graph Reasoning Transformer for Image Parsing [67.76633142645284]
We propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern.
Compared to the conventional transformer, GReaT has higher interaction efficiency and a more purposeful interaction pattern.
Results show that GReaT achieves consistent performance gains over state-of-the-art transformer baselines with only slight computational overhead.
(arXiv, 2022-09-20)
- Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
(arXiv, 2022-09-15)
- Segmentation-grounded Scene Graph Generation [47.34166260639392]
We propose a framework for pixel-level segmentation-grounded scene graph generation.
Our framework is agnostic to the underlying scene graph generation method.
It is learned in a multi-task manner with both target and auxiliary datasets.
(arXiv, 2021-04-29)
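Since the Pair-Net entry above names a concrete mechanism (propose subject-object pairs first, then classify their relations), here is the sketch promised there. It is a hypothetical illustration of the pair-filtering idea under assumed names and sizes, not the paper's code: all pairs of decoded object queries are scored, only the top-K sparse pairs survive, and a predicate is classified for each survivor.

```python
# Hypothetical "pair then relation" sketch: score all subject-object query
# pairs, keep the top-K, classify a predicate per surviving pair.
import torch
import torch.nn as nn

class PairProposal(nn.Module):
    def __init__(self, d_model=256, num_rel_classes=56, top_k=100):
        super().__init__()
        self.top_k = top_k
        self.sub_proj = nn.Linear(d_model, d_model)
        self.obj_proj = nn.Linear(d_model, d_model)
        self.rel_cls = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, num_rel_classes + 1))  # +1: "no relation"

    def forward(self, obj_queries):
        # obj_queries: (B, N, d) object queries from any panoptic segmenter
        s = self.sub_proj(obj_queries)                    # (B, N, d)
        o = self.obj_proj(obj_queries)                    # (B, N, d)
        scores = torch.einsum('bnd,bmd->bnm', s, o)       # (B, N, N) pair logits
        N = scores.size(1)
        top = scores.flatten(1).topk(self.top_k, dim=1).indices  # (B, K)
        sub_idx, obj_idx = top // N, top % N
        # Gather features of the surviving pairs and classify predicates.
        d = s.size(-1)
        sub_feat = torch.gather(s, 1, sub_idx.unsqueeze(-1).expand(-1, -1, d))
        obj_feat = torch.gather(o, 1, obj_idx.unsqueeze(-1).expand(-1, -1, d))
        rel_logits = self.rel_cls(torch.cat([sub_feat, obj_feat], dim=-1))
        return sub_idx, obj_idx, rel_logits               # (B,K), (B,K), (B,K,R+1)

# Toy usage: 50 object queries per image, keep the 100 best pairs.
ppn = PairProposal()
sub_idx, obj_idx, rel_logits = ppn(torch.randn(2, 50, 256))
print(rel_logits.shape)  # torch.Size([2, 100, 57])
```

The design point this illustrates is sparsity: classifying predicates for only K proposed pairs instead of all N² combinations keeps the relation head cheap and focused.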