On the Bias Against Inductive Biases
- URL: http://arxiv.org/abs/2105.14077v1
- Date: Fri, 28 May 2021 19:41:48 GMT
- Title: On the Bias Against Inductive Biases
- Authors: George Cazenavette, Simon Lucey
- Abstract summary: Self-supervised feature learning for visual tasks has seen state-of-the-art success using extremely deep, isotropic networks.
In this work, we analyze the effect of inductive biases on small to moderately-sized isotropic networks used for unsupervised visual feature learning.
- Score: 34.10348216388905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Borrowing from the transformer models that revolutionized the field of
natural language processing, self-supervised feature learning for visual tasks
has also seen state-of-the-art success using these extremely deep, isotropic
networks. However, the typical AI researcher does not have the resources to
evaluate, let alone train, a model with several billion parameters and
quadratic self-attention activations. To facilitate further research, it is
necessary to understand the features of these huge transformer models that can
be adequately studied by the typical researcher. One interesting characteristic
of these transformer models is that they remove most of the inductive biases
present in classical convolutional networks. In this work, we analyze the
effect of these and more inductive biases on small to moderately-sized
isotropic networks used for unsupervised visual feature learning and show that
their removal is not always ideal.
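The abstract contrasts isotropic networks with classical convolutional ones. As a minimal sketch of what "isotropic" means in practice (all dimensions, names, and weights below are illustrative, not taken from the paper): after a patch-embedding stem, every block operates on the same token count and the same width, with no convolutional locality or downsampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(img, p):
    # Split an HxWxC image into non-overlapping p x p patches,
    # each flattened to a vector: result is (num_patches, p*p*C).
    H, W, C = img.shape
    patches = img.reshape(H // p, p, W // p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * C)
    return patches

def isotropic_net(img, p=8, d=64, depth=4):
    # The patch-embedding stem discards the convolutional inductive
    # bias; every subsequent block keeps the same shape (isotropy).
    x = patchify(img, p) @ (rng.standard_normal((p * p * img.shape[2], d)) * 0.02)
    for _ in range(depth):
        W1 = rng.standard_normal((d, 4 * d)) * 0.02
        W2 = rng.standard_normal((4 * d, d)) * 0.02
        x = x + np.maximum(x @ W1, 0.0) @ W2  # residual MLP block
    return x

img = rng.standard_normal((32, 32, 3))
tokens = isotropic_net(img)
print(tokens.shape)  # (16, 64): 16 patch tokens, constant width throughout
```

Reintroducing an inductive bias would amount to replacing the patch-embedding stem or the per-block MLPs with convolutions, which is the kind of design axis the paper studies.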
Related papers
- MAGIC: Modular Auto-encoder for Generalisable Model Inversion with Bias Corrections [0.19241821314180374]
We replace the decoder stage of a standard autoencoder with a physical model followed by a bias-correction layer.
This generalisable approach simultaneously inverts the model and corrects its biases in an end-to-end manner without making strong assumptions about the nature of the biases.
Our method matches or surpasses results from classical approaches without requiring biases to be explicitly filtered out.
arXiv Detail & Related papers (2024-05-29T10:11:10Z)
- Multi-Dimensional Hyena for Spatial Inductive Bias [69.3021852589771]
We present a data-efficient vision transformer that does not rely on self-attention.
Instead, it employs a novel generalization to multiple axes of the very recent Hyena layer.
We show that a hybrid approach that is based on Hyena N-D for the first layers in ViT, followed by layers that incorporate conventional attention, consistently boosts the performance of various vision transformer architectures.
arXiv Detail & Related papers (2023-09-24T10:22:35Z)
- Exploring Model Transferability through the Lens of Potential Energy [78.60851825944212]
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models.
Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels.
We present an insightful physics-inspired approach named PED to address these challenges.
arXiv Detail & Related papers (2023-08-29T07:15:57Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- Beyond Transformers for Function Learning [0.6768558752130311]
The ability to learn and predict simple functions is a key aspect of human intelligence.
Recent works have started to explore this ability using transformer architectures.
We propose to address this gap by augmenting the transformer architecture with two simple inductive learning biases.
arXiv Detail & Related papers (2023-04-19T21:33:06Z)
- Attention Enables Zero Approximation Error [22.110336842946555]
We show that a single-head self-attention transformer with a fixed number of transformer encoder blocks and free parameters is able to generate any desired encoder of the input with no error.
As a direct consequence, we show that the single-head self-attention transformer with increasing numbers of free parameters is universal.
arXiv Detail & Related papers (2022-02-24T16:06:01Z)
- Human Interpretation and Exploitation of Self-attention Patterns in Transformers: A Case Study in Extractive Summarization [9.42402875164615]
This paper synergizes two lines of research in a human-in-the-loop pipeline to first find important task-specific attention patterns.
Then those patterns are applied, not only to the original model, but also to smaller models.
Experiments indicate that when we inject such patterns, both the original and the smaller model show improvements in performance and arguably interpretability.
arXiv Detail & Related papers (2021-12-10T07:15:09Z)
- Counterfactual Generative Networks [59.080843365828756]
We propose to decompose the image generation process into independent causal mechanisms that we train without direct supervision.
By exploiting appropriate inductive biases, these mechanisms disentangle object shape, object texture, and background.
We show that the counterfactual images can improve out-of-distribution robustness with a marginal drop in performance on the original classification task.
arXiv Detail & Related papers (2021-01-15T10:23:12Z)
- A Survey on Visual Transformer [126.56860258176324]
Transformer is a type of deep neural network mainly based on the self-attention mechanism.
In this paper, we review these vision transformer models by categorizing them in different tasks and analyzing their advantages and disadvantages.
arXiv Detail & Related papers (2020-12-23T09:37:54Z)
- Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent [44.44543743806831]
We study the tendency for transformer parameters to grow in magnitude during training.
As the parameters grow in magnitude, we prove that the network approximates a discretized network with saturated activation functions.
Our results suggest saturation is a new characterization of an inductive bias implicit in GD of particular interest for NLP.
arXiv Detail & Related papers (2020-10-19T17:40:38Z)
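Several of the entries above (the zero-approximation-error result, the attention-pattern study, the visual-transformer survey) center on single-head self-attention. As a minimal NumPy sketch of that building block, with illustrative sizes and random weights rather than anything from the papers:

```python
import numpy as np

rng = np.random.default_rng(0)

def single_head_attention(X, Wq, Wk, Wv):
    # Standard single-head self-attention: softmax(Q K^T / sqrt(d)) V.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)  # rows of A are attention weights
    return A @ V

n, d = 5, 8  # 5 tokens of width 8 (illustrative sizes)
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = single_head_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8): one output token per input token
```

Note the quadratic cost the main abstract mentions: the score matrix is n x n, so activation memory grows with the square of the token count.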
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.