Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
- URL: http://arxiv.org/abs/2503.13947v2
- Date: Fri, 11 Apr 2025 03:03:26 GMT
- Title: Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
- Authors: Sayak Nag, Udita Ghosh, Calvin-Khang Ta, Sarosij Bose, Jiachen Li, Amit K Roy Chowdhury
- Abstract summary: Scene Graph Generation (SGG) aims to represent visual scenes by identifying objects and their pairwise relationships. We introduce a novel Conformal Prediction (CP) based framework, adaptive to any existing SGG method, for quantifying their predictive uncertainty. We show that our proposed approach can produce diverse possible scene graphs from an image, assess the reliability of SGG methods, and improve overall SGG performance.
- Score: 24.006445329554452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene Graph Generation (SGG) aims to represent visual scenes by identifying objects and their pairwise relationships, providing a structured understanding of image content. However, inherent challenges like long-tailed class distributions and prediction variability necessitate uncertainty quantification in SGG for its practical viability. In this paper, we introduce a novel Conformal Prediction (CP) based framework, adaptive to any existing SGG method, for quantifying their predictive uncertainty by constructing well-calibrated prediction sets over their generated scene graphs. These scene graph prediction sets are designed to achieve statistically rigorous coverage guarantees. Additionally, to ensure these prediction sets contain the most practically interpretable scene graphs, we design an effective MLLM-based post-processing strategy for selecting the most visually and semantically plausible scene graphs within these prediction sets. We show that our proposed approach can produce diverse possible scene graphs from an image, assess the reliability of SGG methods, and improve overall SGG performance.
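To make the idea of a calibrated prediction set concrete, here is a minimal sketch of generic split conformal prediction for a classifier. It is not the paper's SGG-specific procedure: the function names `calibrate_threshold` and `prediction_set`, the nonconformity score (one minus the probability of the true class), and the toy inputs are all assumptions chosen for illustration.

```python
import math
import numpy as np

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal calibration (illustrative sketch, hypothetical helper).

    cal_probs:  (n, num_classes) predicted class probabilities on held-out data.
    cal_labels: (n,) integer ground-truth labels for the same data.
    Returns the score threshold q_hat targeting 1 - alpha marginal coverage.
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return 1.0
    return float(scores[k - 1])

def prediction_set(probs, q_hat):
    """Indices of all classes whose nonconformity score is within q_hat."""
    return np.flatnonzero(1.0 - probs <= q_hat)
```

With exchangeable calibration and test data, sets built this way contain the true label with probability at least 1 - alpha on average; applying the same recipe to object and predicate classifiers in an SGG model would be one possible reading of the abstract, not a confirmed description of the authors' method.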
Related papers
- PRISM-0: A Predicate-Rich Scene Graph Generation Framework for Zero-Shot Open-Vocabulary Tasks [51.31903029903904]
In Scene Graph Generation (SGG), one extracts a structured representation from visual inputs in the form of object nodes and the predicates connecting them.
PRISM-0 is a framework for zero-shot open-vocabulary SGG that bootstraps foundation models in a bottom-up approach.
PRISM-0 generates semantically meaningful graphs that improve downstream tasks such as Image Captioning and Sentence-to-Graph Retrieval.
arXiv Detail & Related papers (2025-04-01T14:29:51Z) - Graph Sparsification for Enhanced Conformal Prediction in Graph Neural Networks [5.896352342095999]
Conformal Prediction is a robust framework that ensures reliable coverage across machine learning tasks.
SparGCP incorporates graph sparsification and a conformal prediction-specific objective into GNN training.
Experiments on real-world graph datasets demonstrate that SparGCP outperforms existing methods.
arXiv Detail & Related papers (2024-10-28T23:53:51Z) - Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction [12.319354506916547]
We propose a novel Sample-Level Bias Prediction (SBP) method for fine-grained Scene Graph Generation (SGG).
Firstly, we train a classic SGG model and construct a correction bias set.
Then, we devise a Bias-Oriented Generative Adversarial Network (BGAN) that learns to predict the constructed correction biases.
arXiv Detail & Related papers (2024-07-27T13:49:06Z) - Improving the interpretability of GNN predictions through conformal-based graph sparsification [9.550589670316523]
Graph Neural Networks (GNNs) have achieved state-of-the-art performance in solving graph classification tasks.
We propose a GNN training approach that finds the most predictive subgraph by removing edges and/or nodes.
We rely on reinforcement learning to solve the resulting bi-level optimization with a reward function based on conformal predictions.
arXiv Detail & Related papers (2024-04-18T17:34:47Z) - Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement [6.8754535229258975]
Scene Graph Generation (SGG) provides a basic language representation of visual scenes. Some triplet labels are rare or even unseen during training, resulting in imprecise predictions. We propose integrating pretrained Vision-language Models to enhance representation.
arXiv Detail & Related papers (2024-03-24T15:02:24Z) - Uncertainty Quantification over Graph with Conformalized Graph Neural Networks [52.20904874696597]
Graph Neural Networks (GNNs) are powerful machine learning prediction models on graph-structured data.
GNNs lack rigorous uncertainty estimates, limiting their reliable deployment in settings where the cost of errors is significant.
We propose conformalized GNN (CF-GNN), extending conformal prediction (CP) to graph-based models for guaranteed uncertainty estimates.
arXiv Detail & Related papers (2023-05-23T21:38:23Z) - Self-Supervised Relation Alignment for Scene Graph Generation [44.3983804479146]
We introduce a self-supervised relational alignment regularization to improve scene graph generation performance.
The proposed alignment is general and can be combined with any existing scene graph generation framework.
We illustrate the effectiveness of this self-supervised relational alignment in conjunction with two scene graph generation architectures.
arXiv Detail & Related papers (2023-02-02T20:34:13Z) - Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning [84.39787427288525]
Scene graph generation (SGG) is a fundamental task aimed at detecting visual relations between objects in an image.
We introduce open-vocabulary scene graph generation, a novel, realistic and challenging setting in which a model is trained on a set of base object classes.
Our method can support inference over completely unseen object classes, which existing methods are incapable of handling.
arXiv Detail & Related papers (2022-08-17T09:05:38Z) - Iterative Scene Graph Generation [55.893695946885174]
Scene graph generation involves identifying object entities and their corresponding interaction predicates in a given image (or video).
Existing approaches to scene graph generation assume a certain factorization of the joint distribution to make the estimation feasible.
We propose a novel framework that addresses this limitation, as well as introduces dynamic conditioning on the image.
arXiv Detail & Related papers (2022-07-27T10:37:29Z) - Adaptive Fine-Grained Predicates Learning for Scene Graph Generation [122.4588401267544]
General Scene Graph Generation (SGG) models tend to predict head predicates and re-balancing strategies prefer tail categories.
We propose Adaptive Fine-Grained Predicates Learning (FGPL-A), which aims to differentiate hard-to-distinguish predicates for SGG.
Our proposed model-agnostic strategy significantly boosts performance of benchmark models on VG-SGG and GQA-SGG datasets by up to 175% and 76% on Mean Recall@100, achieving new state-of-the-art performance.
arXiv Detail & Related papers (2022-07-11T03:37:57Z) - Bayesian Graph Contrastive Learning [55.36652660268726]
We propose a novel perspective on graph contrastive learning methods, showing that random augmentations lead to stochastic encoders.
Our proposed method represents each node by a distribution in the latent space in contrast to existing techniques which embed each node to a deterministic vector.
We show a considerable improvement in performance compared to existing state-of-the-art methods on several benchmark datasets.
arXiv Detail & Related papers (2021-12-15T01:45:32Z)