Are Convolutional Neural Networks or Transformers more like human vision?
- URL: http://arxiv.org/abs/2105.07197v1
- Date: Sat, 15 May 2021 10:33:35 GMT
- Title: Are Convolutional Neural Networks or Transformers more like human vision?
- Authors: Shikhar Tuli, Ishita Dasgupta, Erin Grant, Thomas L. Griffiths
- Abstract summary: Attention-based networks have previously been shown to achieve higher accuracy than CNNs on vision tasks; using new, finer-grained metrics of error consistency, we show that their errors are also more consistent with those of humans.
These results have implications both for building more human-like vision models and for understanding visual object recognition in humans.
- Score: 9.83454308668432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern machine learning models for computer vision exceed humans in accuracy
on specific visual recognition tasks, notably on datasets like ImageNet.
However, high accuracy can be achieved in many ways. The particular decision
function found by a machine learning system is determined not only by the data
to which the system is exposed, but also by the inductive biases of the model,
which are typically harder to characterize. In this work, we follow a recent
trend of in-depth behavioral analyses of neural network models that go beyond
accuracy as an evaluation metric by looking at patterns of errors. Our focus is
on comparing a suite of standard Convolutional Neural Networks (CNNs) and a
recently-proposed attention-based network, the Vision Transformer (ViT), which
relaxes the translation-invariance constraint of CNNs and therefore represents
a model with a weaker set of inductive biases. Attention-based networks have
previously been shown to achieve higher accuracy than CNNs on vision tasks, and
we demonstrate, using new metrics for examining error consistency with more
granularity, that their errors are also more consistent with those of humans.
These results have implications both for building more human-like vision
models and for understanding visual object recognition in humans.
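
The error-consistency analysis described in the abstract can be made concrete with a small sketch. Assuming the base metric is the Cohen's-kappa-style error consistency of Geirhos et al. (2020) computed over per-image correctness, and that a finer-grained variant simply breaks this statistic down per class (an illustrative assumption, not necessarily the exact metrics introduced in the paper), it might look like:

```python
import numpy as np

def error_consistency_kappa(correct_a, correct_b):
    """Cohen's-kappa-style error consistency over per-image correctness.

    correct_a, correct_b: boolean arrays with one entry per image,
    True where that observer (human or model) classified the image correctly.
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)

    # Observed agreement: both observers right, or both wrong, on the same image.
    c_obs = np.mean(correct_a == correct_b)

    # Agreement expected by chance from the two accuracies alone.
    p_a, p_b = correct_a.mean(), correct_b.mean()
    c_exp = p_a * p_b + (1 - p_a) * (1 - p_b)

    if c_exp == 1.0:  # degenerate case: both observers at 0% or 100% accuracy
        return 1.0
    return (c_obs - c_exp) / (1 - c_exp)


def per_class_error_consistency(correct_a, correct_b, labels):
    """Illustrative finer-grained variant: the same statistic computed per class.

    This is one plausible way to add granularity, not necessarily the exact
    metric proposed in the paper.
    """
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    labels = np.asarray(labels)
    return {
        c: error_consistency_kappa(correct_a[labels == c], correct_b[labels == c])
        for c in np.unique(labels)
    }
```

Comparing, for example, a ViT's and a human observer's correctness vectors on the same stimuli with error_consistency_kappa yields a value near 0 when their errors overlap only as much as chance predicts, and approaches 1 as they fail on the same images.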
Related papers
- Biased Attention: Do Vision Transformers Amplify Gender Bias More than Convolutional Neural Networks? [2.8391805742728553]
Deep neural networks used in computer vision have been shown to exhibit many social biases such as gender bias.
Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification.
This research found that ViTs amplified gender bias to a greater extent than CNNs.
arXiv Detail & Related papers (2023-09-15T20:59:12Z)
- Scale Alone Does not Improve Mechanistic Interpretability in Vision Models [16.020535763297175]
Machine vision has seen remarkable progress by scaling neural networks to unprecedented levels in dataset and model size.
We quantify one form of mechanistic interpretability for a diverse suite of nine models.
None of the investigated state-of-the-art models are easier to interpret than the GoogLeNet model from almost a decade ago.
arXiv Detail & Related papers (2023-07-11T17:56:22Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle and susceptible to changes in an image that do not cause humans to misclassify it.
Part of this difference in behavior may be explained by the types of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
- NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration [66.22668336495175]
Without proper consideration of calibration, neural networks will not gain trust from humans.
We introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models.
arXiv Detail & Related papers (2022-11-29T15:03:05Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Deep Reinforcement Learning Models Predict Visual Responses in the Brain: A Preliminary Result [1.0323063834827415]
We use reinforcement learning to train neural network models to play a 3D computer game.
We find that these reinforcement learning models achieve better neural response prediction accuracy in the early visual areas.
In contrast, the supervised neural network models yield better neural response predictions in the higher visual areas.
arXiv Detail & Related papers (2021-06-18T13:10:06Z)
- Leveraging Sparse Linear Layers for Debuggable Deep Networks [86.94586860037049]
We show how fitting sparse linear models over learned deep feature representations can lead to more debuggable neural networks.
The resulting sparse explanations can help to identify spurious correlations, explain misclassifications, and diagnose model biases in vision and language tasks.
arXiv Detail & Related papers (2021-05-11T08:15:25Z)
- Malicious Network Traffic Detection via Deep Learning: An Information Theoretic View [0.0]
We study how homeomorphism affects the learned representation of a malware traffic dataset.
Our results suggest that although the details of learned representations and the specific coordinate system defined over the manifold of all parameters differ slightly, the functional approximations are the same.
arXiv Detail & Related papers (2020-09-16T15:37:44Z)
- Vulnerability Under Adversarial Machine Learning: Bias or Variance? [77.30759061082085]
We investigate the effect of adversarial machine learning on the bias and variance of a trained deep neural network.
Our analysis sheds light on why deep neural networks perform poorly under adversarial perturbation.
We introduce a new adversarial machine learning algorithm with lower computational complexity than well-known adversarial machine learning strategies.
arXiv Detail & Related papers (2020-08-01T00:58:54Z)
- Neural Additive Models: Interpretable Machine Learning with Neural Nets [77.66871378302774]
Deep neural networks (DNNs) are powerful black-box predictors that have achieved impressive performance on a wide variety of tasks.
We propose Neural Additive Models (NAMs) which combine some of the expressivity of DNNs with the inherent intelligibility of generalized additive models.
NAMs learn a linear combination of neural networks that each attend to a single input feature; a minimal sketch of this architecture appears after this list.
arXiv Detail & Related papers (2020-04-29T01:28:32Z)
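
The last entry above gives a compact description of the Neural Additive Model architecture. A minimal PyTorch sketch of that idea (layer sizes and activations here are illustrative choices rather than the configuration used in the NAM paper, which also explores specialized "ExU" hidden units omitted here) could look like:

```python
import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    """Small MLP applied to a single scalar input feature (a 'shape function')."""

    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):  # x: (batch, 1)
        return self.net(x)


class NeuralAdditiveModel(nn.Module):
    """Sums one learned shape function per input feature, plus a bias term."""

    def __init__(self, num_features, hidden=32):
        super().__init__()
        self.feature_nets = nn.ModuleList(
            [FeatureNet(hidden) for _ in range(num_features)]
        )
        self.bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):  # x: (batch, num_features)
        # Each sub-network sees only its own feature column.
        contributions = [
            net(x[:, i : i + 1]) for i, net in enumerate(self.feature_nets)
        ]
        # The prediction is a sum of per-feature contributions, so each
        # feature's effect can be read off in isolation.
        return self.bias + torch.cat(contributions, dim=1).sum(dim=1)
```

For a binary task the summed output would typically be passed through a sigmoid; plotting each feature_nets[i] over its feature's range recovers that feature's learned contribution, which is where the interpretability claim comes from.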
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.