Rethinking the Evaluation of Unbiased Scene Graph Generation
- URL: http://arxiv.org/abs/2208.01909v1
- Date: Wed, 3 Aug 2022 08:23:51 GMT
- Title: Rethinking the Evaluation of Unbiased Scene Graph Generation
- Authors: Xingchen Li, Long Chen, Jian Shao, Shaoning Xiao, Songyang Zhang and
Jun Xiao
- Abstract summary: Scene Graph Generation (SGG) methods tend to predict frequent predicate categories and fail to recognize rare ones.
Recent research has focused on unbiased SGG and adopted mean Recall@K as the main evaluation metric.
We propose two complementary evaluation metrics for unbiased SGG: Independent Mean Recall (IMR) and weighted IMR (wIMR).
- Score: 31.041074897404236
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Owing to the severely imbalanced predicate distributions in common subject-object
relations, current Scene Graph Generation (SGG) methods tend to predict
frequent predicate categories and fail to recognize rare ones. To improve the
robustness of SGG models on different predicate categories, recent research has
focused on unbiased SGG and adopted mean Recall@K (mR@K) as the main evaluation
metric. However, we discovered two overlooked issues with this de facto
standard metric mR@K, which make current unbiased SGG evaluation vulnerable
and unfair: 1) mR@K neglects the correlations among predicates and
unintentionally breaks category independence when ranking all the triplet
predictions together regardless of the predicate categories, leading to the
performance of some predicates being underestimated. 2) mR@K neglects the
compositional diversity of different predicates and assigns excessively high
weights to some oversimple category samples with limited composable relation
triplet types. This conflicts with the goal of the SGG task, which encourages
models to detect more types of visual relationship triplets. In addition, we
investigate the under-explored correlation between objects and predicates,
which can serve as a simple but strong baseline for unbiased SGG. In this
paper, we refine mR@K and propose two complementary evaluation metrics for
unbiased SGG: Independent Mean Recall (IMR) and weighted IMR (wIMR). These two
metrics are designed by considering the category independence and diversity of
composable relation triplets, respectively. We compare the proposed metrics
with the de facto standard metrics through extensive experiments and discuss
the solutions to evaluate unbiased SGG in a more trustworthy way.
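As a rough illustration of the baseline metric the abstract discusses, the sketch below computes mean Recall@K under simplifying assumptions: triplets are matched by label only (no bounding-box overlap check), recall for each predicate is averaged over the images that contain it, and the function name `mean_recall_at_k` and the toy data are hypothetical. It is not the authors' implementation and does not cover IMR or wIMR. Note that all triplets of an image are ranked together regardless of predicate category, which is the behavior the abstract's first issue refers to.

```python
from collections import defaultdict


def mean_recall_at_k(images, k):
    """Simplified mR@K (label-only matching; an assumption of this sketch).

    images: list of (predictions, ground_truth) pairs, where
      predictions is a list of (score, (subject, predicate, object)) tuples and
      ground_truth is a set of (subject, predicate, object) triplets.
    Returns (mean recall over predicates, per-predicate recall dict).
    """
    per_predicate = defaultdict(list)  # predicate -> per-image recall values
    for predictions, ground_truth in images:
        # Rank all predicted triplets of the image together, then keep the top k.
        ranked = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
        top_k = {triplet for _, triplet in ranked}

        hits = defaultdict(int)
        totals = defaultdict(int)
        for triplet in ground_truth:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1

        for predicate, total in totals.items():
            per_predicate[predicate].append(hits[predicate] / total)

    # Average over images for each predicate, then over predicates.
    recalls = {p: sum(v) / len(v) for p, v in per_predicate.items()}
    if not recalls:
        return 0.0, recalls
    return sum(recalls.values()) / len(recalls), recalls


# Toy data (hypothetical): "riding" is ranked below two other triplets in image 1.
toy_images = [
    (
        [(0.9, ("man", "on", "horse")),
         (0.8, ("man", "near", "horse")),
         (0.3, ("man", "riding", "horse"))],
        {("man", "on", "horse"), ("man", "riding", "horse")},
    ),
    (
        [(0.7, ("cup", "on", "table"))],
        {("cup", "on", "table")},
    ),
]

mr, per_pred = mean_recall_at_k(toy_images, k=2)
print(mr, per_pred)
```

In this toy case the frequent predicate "on" is always recovered, while "riding" falls outside the shared top-2 ranking, so the two predicates contribute recalls of 1.0 and 0.0 to the mean.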
Related papers
- Mitigating Spurious Correlations via Disagreement Probability [4.8884049398279705]
Models trained with empirical risk minimization (ERM) are prone to be biased towards spurious correlations between target labels and bias attributes.
We introduce a training objective designed to robustly enhance model performance across all data samples.
We then derive a debiasing method, Disagreement Probability based Resampling for debiasing (DPR), which does not require bias labels.
(2024-11-04T02:44:04Z)
- Unsupervised Concept Discovery Mitigates Spurious Correlations [45.48778210340187]
Models prone to spurious correlations in training data often produce brittle predictions and introduce unintended biases.
In this paper, we establish a novel connection between unsupervised object-centric learning and mitigation of spurious correlations.
We introduce CoBalT: a concept balancing technique that effectively mitigates spurious correlations without requiring human labeling of subgroups.
(2024-02-20T20:48:00Z)
- TIC-TAC: A Framework for Improved Covariance Estimation in Deep Heteroscedastic Regression [109.69084997173196]
Deep heteroscedastic regression involves jointly optimizing the mean and covariance of the predicted distribution using the negative log-likelihood.
Recent works show that this may result in sub-optimal convergence due to the challenges associated with covariance estimation.
We study two questions: (1) Does the predicted covariance truly capture the randomness of the predicted mean?
Our results show that not only does TIC accurately learn the covariance, it additionally facilitates an improved convergence of the negative log-likelihood.
(2023-10-29T09:54:03Z)
- Compositional Feature Augmentation for Unbiased Scene Graph Generation [28.905732042942066]
Scene Graph Generation (SGG) aims to detect all the visual relation triplets <sub, pred, obj> in a given image.
Due to the ubiquitous long-tailed predicate distributions, today's SGG models are still easily biased to the head predicates.
We propose a novel Compositional Feature Augmentation (CFA) strategy, which is the first unbiased SGG work to mitigate the bias issue.
(2023-08-13T08:02:14Z)
- Balanced Classification: A Unified Framework for Long-Tailed Object Detection [74.94216414011326]
Conventional detectors suffer from performance degradation when dealing with long-tailed data due to a classification bias towards the majority head categories.
We introduce a unified framework called BAlanced CLassification (BACL), which enables adaptive rectification of inequalities caused by disparities in category distribution.
BACL consistently achieves performance improvements across various datasets with different backbones and architectures.
(2023-08-04T09:11:07Z)
- NICEST: Noisy Label Correction and Training for Robust Scene Graph Generation [65.78472854070316]
We propose a novel NoIsy label CorrEction and Sample Training strategy for SGG: NICEST.
NICE first detects noisy samples and then reassigns higher-quality soft predicate labels to them.
NICEST can be seamlessly incorporated into any SGG architecture to boost its performance on different predicate categories.
(2022-07-27T06:25:47Z)
- CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator [60.799183326613395]
We propose an unbiased estimator for categorical random variables based on multiple mutually negatively correlated (jointly antithetic) samples.
CARMS combines REINFORCE with copula based sampling to avoid duplicate samples and reduce its variance, while keeping the estimator unbiased using importance sampling.
We evaluate CARMS on several benchmark datasets on a generative modeling task, as well as a structured output prediction task, and find it to outperform competing methods including a strong self-control baseline.
(2021-10-26T20:14:30Z)
- PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation [58.98802062945709]
We propose a novel Predicate-Correlation Perception Learning scheme to adaptively seek out appropriate loss weights.
Our PCPL framework is further equipped with a graph encoder module to better extract context features.
(2020-09-02T08:30:09Z)
- Tackling the Unannotated: Scene Graph Generation with Bias-Reduced Models [8.904910414410855]
State-of-the-art results are still far from satisfactory, e.g. models can obtain 31% in overall recall R@100.
We propose a novel SGG training scheme that capitalizes on self-learned knowledge.
(2020-08-18T10:04:51Z)
- Learning from Aggregate Observations [82.44304647051243]
We study the problem of learning from aggregate observations where supervision signals are given to sets of instances.
We present a general probabilistic framework that accommodates a variety of aggregate observations.
Simple maximum likelihood solutions can be applied to various differentiable models.
(2020-04-14T06:18:50Z)
- AMR Similarity Metrics from Principles [21.915057426589748]
We establish criteria that enable researchers to perform a principled assessment of metrics comparing meaning representations like AMR.
We propose a novel metric S$^2$match that is more benevolent to only very slight meaning deviations and targets the fulfilment of all established criteria.
(2020-01-29T16:19:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.