Attribute Diversity Determines the Systematicity Gap in VQA
- URL: http://arxiv.org/abs/2311.08695v2
- Date: Mon, 24 Jun 2024 15:51:13 GMT
- Title: Attribute Diversity Determines the Systematicity Gap in VQA
- Authors: Ian Berlot-Attwell, Kumar Krishna Agrawal, A. Michael Carrell, Yash Sharma, Naomi Saphra,
- Abstract summary: We study the systematicity gap in visual question answering.
We find that increased quantity of training data does not reduce the systematicity gap.
In all, our experiments suggest that the more distinct attribute type combinations are seen during training, the more systematic we can expect the resulting model to be.
- Score: 7.433031036510163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The degree to which neural networks can generalize to new combinations of familiar concepts, and the conditions under which they are able to do so, has long been an open question. In this work, we study the systematicity gap in visual question answering: the performance difference between reasoning on previously seen and unseen combinations of object attributes. To test, we introduce a novel diagnostic dataset, CLEVR-HOPE. We find that while increased quantity of training data does not reduce the systematicity gap, increased training data diversity of the attributes in the unseen combination does. In all, our experiments suggest that the more distinct attribute type combinations are seen during training, the more systematic we can expect the resulting model to be.
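As a reading aid for the definition in the abstract, the systematicity gap can be sketched as a difference of two accuracies. This is a minimal illustration, not the authors' evaluation code: the function names `accuracy` and `systematicity_gap`, the exact-match scoring, and the sign convention (seen minus unseen) are all assumptions made here.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def systematicity_gap(
    seen_predictions: Sequence[str],
    seen_answers: Sequence[str],
    unseen_predictions: Sequence[str],
    unseen_answers: Sequence[str],
) -> float:
    """Accuracy on seen attribute combinations minus accuracy on unseen ones.

    Assumed convention: a positive value means the model does worse on
    attribute combinations that were held out of training.
    """
    seen_acc = accuracy(seen_predictions, seen_answers)
    unseen_acc = accuracy(unseen_predictions, unseen_answers)
    return seen_acc - unseen_acc


# Hypothetical toy data: 4/4 correct on seen, 2/4 correct on unseen.
gap = systematicity_gap(
    ["yes", "no", "cube", "red"], ["yes", "no", "cube", "red"],
    ["yes", "no", "sphere", "blue"], ["yes", "no", "cube", "red"],
)
```

Under this convention, a gap near zero would indicate that the model answers questions about unseen attribute combinations about as well as it answers questions about seen ones.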
Related papers
- Bayesian Joint Additive Factor Models for Multiview Learning [7.254731344123118]
A motivating application arises in the context of precision medicine where multi-omics data are collected to correlate with clinical outcomes.
We propose a joint additive factor regression model (JAFAR) with a structured additive design, accounting for shared and view-specific components.
Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains against state-of-the-art competitors.
arXiv Detail & Related papers (2024-06-02T15:35:45Z)
- D3: Data Diversity Design for Systematic Generalization in Visual Question Answering [6.392972407599867]
We show that the diversity of simple tasks plays a key role in achieving systematic generalization.
This implies that it may not be essential to gather a large and varied number of complex tasks, which could be costly to obtain.
arXiv Detail & Related papers (2023-09-15T22:45:02Z)
- Rethinking Mitosis Detection: Towards Diverse Data and Feature Representation [30.882319057927052]
We propose a novel generalizable framework (MitDet) for mitosis detection.
Our proposed model outperforms all the SOTA approaches in several popular mitosis detection datasets.
arXiv Detail & Related papers (2023-07-12T03:33:11Z)
- Equivariance Allows Handling Multiple Nuisance Variables When Analyzing Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z)
- A Comprehensive Survey and Performance Analysis of Activation Functions in Deep Learning [23.83339228535986]
Various types of neural networks have been introduced to deal with different types of problems.
The main goal of any neural network is to transform the non-linearly separable input data into more linearly separable abstract features.
The most popular and common non-linearity layers are activation functions (AFs), such as Logistic Sigmoid, Tanh, ReLU, ELU, Swish and Mish.
arXiv Detail & Related papers (2021-09-29T16:41:19Z)
- Learning Compositional Representation for Few-shot Visual Question Answering [93.4061107793983]
Current Visual Question Answering methods perform well on answers with ample training data but have limited accuracy on novel answers with few examples.
We propose to extract the attributes from the answers with enough data, which are later composed to constrain the learning of the few-shot ones.
Experimental results on the VQA v2.0 validation dataset demonstrate the effectiveness of our proposed attribute network.
arXiv Detail & Related papers (2021-02-21T10:16:24Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem: identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z)
- Dataset Bias in Few-shot Image Recognition [57.25445414402398]
We first investigate the impact of transferable capabilities learned from base categories.
Second, we investigate performance differences on different datasets from dataset structures and different few-shot learning methods.
arXiv Detail & Related papers (2020-08-18T14:46:23Z)
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
- Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally different examples with different labels, a.k.a. counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)