OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
- URL: http://arxiv.org/abs/2407.11213v1
- Date: Mon, 15 Jul 2024 19:56:42 GMT
- Title: OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
- Authors: Zijian Zhou, Zheng Zhu, Holger Caesar, Miaojing Shi
- Abstract summary: Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling the structured understanding of an image.
Previous methods focus on predicting predefined object and relation categories, which limits their application in open-world scenarios.
In this paper, we focus on the task of open-set relation prediction integrated with a pretrained open-set panoptic segmentation model.
- Score: 28.742671870397757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Panoptic Scene Graph Generation (PSG) aims to segment objects and recognize their relations, enabling the structured understanding of an image. Previous methods focus on predicting predefined object and relation categories, which limits their application in open-world scenarios. With the rapid development of large multimodal models (LMMs), significant progress has been made in open-set object detection and segmentation, yet open-set relation prediction in PSG remains unexplored. In this paper, we focus on the task of open-set relation prediction integrated with a pretrained open-set panoptic segmentation model to achieve true open-set panoptic scene graph generation (OpenPSG). Our OpenPSG leverages LMMs to achieve open-set relation prediction in an autoregressive manner. We introduce a relation query transformer to efficiently extract visual features of object pairs and estimate the existence of relations between them. The latter can enhance the prediction efficiency by filtering irrelevant pairs. Finally, we design the generation and judgement instructions to perform open-set relation prediction in PSG autoregressively. To our knowledge, we are the first to propose the open-set PSG task. Extensive experiments demonstrate that our method achieves state-of-the-art performance in open-set relation prediction and panoptic scene graph generation. Code is available at https://github.com/franciszzj/OpenPSG.
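The abstract notes that estimating whether a relation exists between two objects lets the model filter irrelevant pairs before the more expensive autoregressive prediction. The following is only a minimal sketch of that filtering idea, not the authors' implementation: the function name `filter_pairs`, the `threshold` and `top_k` parameters, and the example scores are all hypothetical, and the scores are assumed to come from some upstream existence estimator.

```python
from itertools import permutations


def filter_pairs(existence, threshold=0.5, top_k=None):
    """Keep ordered (subject, object) index pairs likely to hold a relation.

    `existence[s][o]` is an assumed relation-existence score in [0, 1] for
    subject s and object o. Names and defaults are illustrative only.
    """
    n = len(existence)
    # Consider every ordered pair of distinct objects and drop low scores.
    candidates = [
        (existence[s][o], s, o)
        for s, o in permutations(range(n), 2)
        if existence[s][o] >= threshold
    ]
    # Highest-scoring pairs first, so an optional top_k cap keeps the best ones.
    candidates.sort(key=lambda item: item[0], reverse=True)
    if top_k is not None:
        candidates = candidates[:top_k]
    return [(s, o) for _, s, o in candidates]


# Hypothetical scores for three segmented objects (diagonal is unused).
scores = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.7],
    [0.05, 0.6, 0.0],
]
kept = filter_pairs(scores, threshold=0.5)
print(kept)
```

In this toy case only three of the six ordered pairs survive, so a downstream relation predictor would be queried half as often.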
Related papers
- Enhancing Scene Graph Generation with Hierarchical Relationships and Commonsense Knowledge [7.28830964611216]
This work introduces an enhanced approach to generating scene graphs by incorporating both a relationship hierarchy and commonsense knowledge.
We implement a robust commonsense validation pipeline that harnesses foundation models to critique the results from the scene graph prediction system.
Experiments on Visual Genome and OpenImage V6 datasets demonstrate that the proposed modules can be seamlessly integrated as plug-and-play enhancements to existing scene graph generation algorithms.
arXiv Detail & Related papers (2023-11-21T06:03:20Z) - Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z) - Relational Prior Knowledge Graphs for Detection and Instance
Segmentation [24.360473253478112]
We propose a graph that enhances object features using priors.
Experimental evaluations on COCO show that the utilization of scene graphs, augmented with relational priors, offers benefits for object detection and instance segmentation.
arXiv Detail & Related papers (2023-10-11T15:15:05Z) - Pair then Relation: Pair-Net for Panoptic Scene Graph Generation [54.92476119356985]
Panoptic Scene Graph (PSG) aims to create a more comprehensive scene graph representation using panoptic segmentation instead of boxes.
Current PSG methods have limited performance, which hinders downstream tasks or applications.
We present a novel framework: Pair then Relation (Pair-Net), which uses a Pair Proposal Network (PPN) to learn and filter sparse pair-wise relationships between subjects and objects.
arXiv Detail & Related papers (2023-07-17T17:58:37Z) - 1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop [1.5362025549031049]
Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes.
We propose GRNet, a Global Relation Network in a two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings.
We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard.
arXiv Detail & Related papers (2023-02-06T09:47:46Z) - Scene Graph Modification as Incremental Structure Expanding [61.84291817776118]
We focus on scene graph modification (SGM), where the system is required to learn how to update an existing scene graph based on a natural language query.
We frame SGM as a graph expansion task by introducing incremental structure expanding (ISE).
We construct a challenging dataset that contains more complicated queries and larger scene graphs than existing datasets.
arXiv Detail & Related papers (2022-09-15T16:26:14Z) - Ordinal Graph Gamma Belief Network for Social Recommender Systems [54.9487910312535]
We develop a hierarchical Bayesian model termed ordinal graph factor analysis (OGFA), which jointly models user-item and user-user interactions.
OGFA not only achieves good recommendation performance, but also extracts interpretable latent factors corresponding to representative user preferences.
We extend OGFA to ordinal graph gamma belief network, which is a multi-stochastic-layer deep probabilistic model.
arXiv Detail & Related papers (2022-09-12T09:19:22Z) - Panoptic Scene Graph Generation [41.534209967051645]
Panoptic scene graph generation (PSG) is a new task that requires the model to generate a more comprehensive scene graph representation.
A high-quality PSG dataset contains 49k well-annotated overlapping images from COCO and Visual Genome.
arXiv Detail & Related papers (2022-07-22T17:59:53Z) - Compact Graph Structure Learning via Mutual Information Compression [79.225671302689]
Graph Structure Learning (GSL) has attracted considerable attention for its capacity to optimize graph structure and learn the parameters of Graph Neural Networks (GNNs).
We propose a Compact GSL architecture by MI compression, named CoGSL.
We conduct extensive experiments on several datasets under clean and attacked conditions, which demonstrate the effectiveness and robustness of CoGSL.
arXiv Detail & Related papers (2022-01-14T16:22:33Z) - Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization [77.21951145754065]
We propose a novel Cross- and Self-Modal Graph Attention Network (CSMGAN) that recasts this task as a process of iterative message passing over a joint graph.
Our CSMGAN is able to effectively capture high-order interactions between the two modalities, thus enabling more precise localization.
arXiv Detail & Related papers (2020-08-04T08:25:24Z)