Automatic Generation of Contrast Sets from Scene Graphs: Probing the
Compositional Consistency of GQA
- URL: http://arxiv.org/abs/2103.09591v1
- Date: Wed, 17 Mar 2021 12:19:25 GMT
- Title: Automatic Generation of Contrast Sets from Scene Graphs: Probing the
Compositional Consistency of GQA
- Authors: Yonatan Bitton, Gabriel Stanovsky, Roy Schwartz, Michael Elhadad
- Abstract summary: supervised models often exploit data artifacts to achieve good test scores while their performance severely degrades on samples outside their training distribution.
We present a novel method which leverages rich semantic input representation to automatically generate contrast sets for the visual question answering task.
We find that, despite GQA's compositionality and carefully balanced label distribution, two high-performing models drop 13-17% in accuracy compared to the original test set.
- Score: 16.95631509102115
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent works have shown that supervised models often exploit data artifacts
to achieve good test scores while their performance severely degrades on
samples outside their training distribution. Contrast sets (Gardneret al.,
2020) quantify this phenomenon by perturbing test samples in a minimal way such
that the output label is modified. While most contrast sets were created
manually, requiring intensive annotation effort, we present a novel method
which leverages rich semantic input representation to automatically generate
contrast sets for the visual question answering task. Our method computes the
answer of perturbed questions, thus vastly reducing annotation cost and
enabling thorough evaluation of models' performance on various semantic aspects
(e.g., spatial or relational reasoning). We demonstrate the effectiveness of
our approach on the GQA dataset and its semantic scene graph image
representation. We find that, despite GQA's compositionality and carefully
balanced label distribution, two high-performing models drop 13-17% in accuracy
compared to the original test set. Finally, we show that our automatic
perturbation can be applied to the training set to mitigate the degradation in
performance, opening the door to more robust models.
Related papers
- Vision-Language Models are Strong Noisy Label Detectors [76.07846780815794]
This paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models.
DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels.
Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.
arXiv Detail & Related papers (2024-09-29T12:55:17Z) - Unsupervised Contrastive Analysis for Salient Pattern Detection using Conditional Diffusion Models [13.970483987621135]
Contrastive Analysis (CA) aims to identify patterns in images that allow distinguishing between a background (BG) dataset and a target (TG) dataset (i.e. unhealthy subjects)
Recent works on this topic rely on variational autoencoders (VAE) or contrastive learning strategies to learn the patterns that separate TG samples from BG samples in a supervised manner.
We employ a self-supervised contrastive encoder to learn a latent representation encoding only common patterns from input images, using samples exclusively from the BG dataset during training, and approximating the distribution of the target patterns by leveraging data augmentation techniques.
arXiv Detail & Related papers (2024-06-02T15:19:07Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - Mitigating Exposure Bias in Discriminator Guided Diffusion Models [4.5349436061325425]
We propose SEDM-G++, which incorporates a modified sampling approach, combining Discriminator Guidance and Epsilon Scaling.
Our proposed approach outperforms the current state-of-the-art, by achieving an FID score of 1.73 on the unconditional CIFAR-10 dataset.
arXiv Detail & Related papers (2023-11-18T20:49:50Z) - Counterfactual Image Generation for adversarially robust and
interpretable Classifiers [1.3859669037499769]
We propose a unified framework leveraging image-to-image translation Generative Adrial Networks (GANs) to produce counterfactual samples.
This is achieved by combining the classifier and discriminator into a single model that attributes real images to their respective classes and flags generated images as "fake"
We show how the model exhibits improved robustness to adversarial attacks, and we show how the discriminator's "fakeness" value serves as an uncertainty measure of the predictions.
arXiv Detail & Related papers (2023-10-01T18:50:29Z) - Semi-Supervised Learning for hyperspectral images by non parametrically
predicting view assignment [25.198550162904713]
Hyperspectral image (HSI) classification is gaining a lot of momentum in present time because of high inherent spectral information within the images.
Recently, to effectively train the deep learning models with minimal labelled samples, the unlabeled samples are also being leveraged in self-supervised and semi-supervised setting.
In this work, we leverage the idea of semi-supervised learning to assist the discriminative self-supervised pretraining of the models.
arXiv Detail & Related papers (2023-06-19T14:13:56Z) - Masked Images Are Counterfactual Samples for Robust Fine-tuning [77.82348472169335]
Fine-tuning deep learning models can lead to a trade-off between in-distribution (ID) performance and out-of-distribution (OOD) robustness.
We propose a novel fine-tuning method, which uses masked images as counterfactual samples that help improve the robustness of the fine-tuning model.
arXiv Detail & Related papers (2023-03-06T11:51:28Z) - Test-time Adaptation with Slot-Centric Models [63.981055778098444]
Slot-TTA is a semi-supervised scene decomposition model that at test time is adapted per scene through gradient descent on reconstruction or cross-view synthesis objectives.
We show substantial out-of-distribution performance improvements against state-of-the-art supervised feed-forward detectors, and alternative test-time adaptation methods.
arXiv Detail & Related papers (2022-03-21T17:59:50Z) - Active Learning by Feature Mixing [52.16150629234465]
We propose a novel method for batch active learning called ALFA-Mix.
We identify unlabelled instances with sufficiently-distinct features by seeking inconsistencies in predictions.
We show that inconsistencies in these predictions help discovering features that the model is unable to recognise in the unlabelled instances.
arXiv Detail & Related papers (2022-03-14T12:20:54Z) - Evaluating Models' Local Decision Boundaries via Contrast Sets [119.38387782979474]
We propose a new annotation paradigm for NLP that helps to close systematic gaps in the test data.
We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets.
Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets.
arXiv Detail & Related papers (2020-04-06T14:47:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.