Perceptual Score: What Data Modalities Does Your Model Perceive?
- URL: http://arxiv.org/abs/2110.14375v1
- Date: Wed, 27 Oct 2021 12:19:56 GMT
- Title: Perceptual Score: What Data Modalities Does Your Model Perceive?
- Authors: Itai Gat, Idan Schwartz, Alexander Schwing
- Abstract summary: We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
- Score: 73.75255606437808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning advances in the last decade have relied significantly on
large-scale datasets that continue to grow in size. Increasingly, those
datasets also contain different data modalities. However, large multi-modal
datasets are hard to annotate, and annotations may contain biases that we are
often unaware of. Deep-net-based classifiers, in turn, are prone to exploit
those biases and to find shortcuts. To study and quantify this concern, we
introduce the perceptual score, a metric that assesses the degree to which a
model relies on the different subsets of the input features, i.e., modalities.
Using the perceptual score, we find a surprisingly consistent trend across four
popular datasets: recent, more accurate state-of-the-art multi-modal models for
visual question-answering or visual dialog tend to perceive the visual data
less than their predecessors. This trend is concerning as answers are hence
increasingly inferred from textual cues only. Using the perceptual score also
helps to analyze model biases by decomposing the score into data subset
contributions. We hope to spur a discussion on the perceptiveness of
multi-modal models and also hope to encourage the community working on
multi-modal classifiers to start quantifying perceptiveness via the proposed
perceptual score.
Related papers
- Corpus Considerations for Annotator Modeling and Scaling [9.263562546969695]
We show that the commonly used user token model consistently outperforms more complex models.
Our findings shed light on the relationship between corpus statistics and annotator modeling performance.
arXiv Detail & Related papers (2024-04-02T22:27:24Z) - Debiasing Multimodal Models via Causal Information Minimization [65.23982806840182]
We study bias arising from confounders in a causal graph for multimodal data.
Robust predictive features contain diverse information that helps a model generalize to out-of-distribution data.
We use these features as confounder representations and use them via methods motivated by causal theory to remove bias from models.
arXiv Detail & Related papers (2023-11-28T16:46:14Z) - What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models [22.0839948292609]
We introduce a novel dataset, C-VQA, specifically designed to test the counterfactual reasoning capabilities of modern language models.
This dataset is constructed by infusing original questions with various types such as numerical and counter-language queries.
Our evaluations of contemporary vision models using this dataset have revealed substantial performance drops, with some models showing up to a 40% decrease.
arXiv Detail & Related papers (2023-10-10T13:45:59Z) - The Trade-off between Universality and Label Efficiency of
Representations from Contrastive Learning [32.15608637930748]
We show that there exists a trade-off between the two desiderata so that one may not be able to achieve both simultaneously.
We provide analysis using a theoretical data model and show that, while more diverse pre-training data result in more diverse features for different tasks, it puts less emphasis on task-specific features.
arXiv Detail & Related papers (2023-02-28T22:14:33Z) - Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis [20.316056261749946]
We propose an end-to-end vision and language model incorporating explicit knowledge graphs.
We also introduce an interactive out-of-distribution layer using implicit network operator.
In practice, we apply our model on several vision and language downstream tasks including visual question answering, visual reasoning, and image-text retrieval.
arXiv Detail & Related papers (2023-02-11T05:46:21Z) - A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information for the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z) - Visualising Deep Network's Time-Series Representations [93.73198973454944]
Despite the popularisation of machine learning models, more often than not they still operate as black boxes with no insight into what is happening inside the model.
In this paper, a method that addresses that issue is proposed, with a focus on visualising multi-dimensional time-series data.
Experiments on a high-frequency stock market dataset show that the method provides fast and discernible visualisations.
arXiv Detail & Related papers (2021-03-12T09:53:34Z) - Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
arXiv Detail & Related papers (2020-11-07T22:20:03Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.