On Privileged and Convergent Bases in Neural Network Representations
- URL: http://arxiv.org/abs/2307.12941v1
- Date: Mon, 24 Jul 2023 17:11:39 GMT
- Title: On Privileged and Convergent Bases in Neural Network Representations
- Authors: Davis Brown, Nikhil Vyas, Yamini Bansal
- Abstract summary: We show that even in wide networks such as WideResNets, neural networks do not converge to a unique basis.
We also analyze Linear Mode Connectivity, which has been studied as a measure of basis correlation.
- Score: 7.888192939262696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this study, we investigate whether the representations learned by neural
networks possess a privileged and convergent basis. Specifically, we examine
the significance of feature directions represented by individual neurons.
First, we establish that arbitrary rotations of neural representations cannot
be inverted (unlike linear networks), indicating that they do not exhibit
complete rotational invariance. Subsequently, we explore the possibility of
multiple bases achieving identical performance. To do this, we compare the
bases of networks trained with the same hyperparameters but different random
initializations. Our study reveals two findings: (1) Even in wide networks such
as WideResNets, neural networks do not converge to a unique basis; (2) Basis
correlation increases significantly when a few early layers of the network are
frozen identically.
Furthermore, we analyze Linear Mode Connectivity, which has been studied as a
measure of basis correlation. Our findings give evidence that while Linear Mode
Connectivity improves with increased network width, this improvement is not due
to an increase in basis correlation.
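To make the basis-comparison experiment concrete, the following is a minimal sketch (not the authors' code) of one standard way to quantify how well the neuron bases of two independently trained networks line up: record the activations of the same layer in both networks on a shared batch of inputs, correlate every neuron in one network with every neuron in the other, and find the one-to-one matching that maximizes total correlation. The activation arrays `acts_a` and `acts_b` are hypothetical placeholders for whichever layer is extracted.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def basis_correlation(acts_a, acts_b):
    """Match neurons across two networks and report the mean matched correlation.

    acts_a, acts_b: arrays of shape (n_examples, n_neurons) holding the
    activations of the same layer in two independently trained networks,
    evaluated on the same batch of inputs.
    """
    # Standardize each neuron so Pearson correlation reduces to a dot product.
    za = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    zb = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = za.T @ zb / acts_a.shape[0]            # (n_neurons, n_neurons)

    # Best one-to-one neuron matching, maximizing total absolute correlation.
    row, col = linear_sum_assignment(-np.abs(corr))
    matched = np.abs(corr[row, col])
    return matched.mean(), col                    # mean correlation, permutation

# Example: two unrelated random "layers" share no basis, so the score stays low.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(4096, 64)), rng.normal(size=(4096, 64))
print(basis_correlation(a, b)[0])
```

A high mean matched correlation would indicate that the two networks learned the same neuron basis up to permutation; the paper's finding is that this does not happen even in wide networks unless a few early layers are frozen identically.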
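Similarly, the Linear Mode Connectivity measurement mentioned in the last paragraph can be sketched as follows: interpolate linearly between the weights of two trained networks and record how much the loss rises along that path (the "barrier"). This is the standard interpolation recipe rather than the paper's exact protocol; `model`, `loss_on`, and the two state dicts are assumed to exist.

```python
import copy
import torch

def loss_barrier(model, state_a, state_b, loss_on, steps=11):
    """Loss along the straight line between two weight settings of one architecture.

    model     : a network whose architecture matches both state dicts
    state_a/b : state_dicts of two independently trained copies of `model`
    loss_on   : callable taking a model and returning its scalar loss on a
                fixed evaluation batch (assumed to be defined elsewhere)
    """
    alphas = torch.linspace(0.0, 1.0, steps).tolist()
    losses = []
    for alpha in alphas:
        # Interpolate every parameter and buffer between the two solutions.
        mixed = {k: (1 - alpha) * state_a[k].float() + alpha * state_b[k].float()
                 for k in state_a}
        probe = copy.deepcopy(model)
        probe.load_state_dict(mixed)
        probe.eval()
        with torch.no_grad():
            losses.append(float(loss_on(probe)))
    # Barrier: worst loss on the path minus the linear interpolation of endpoint losses.
    return max(l - ((1 - a) * losses[0] + a * losses[-1])
               for l, a in zip(losses, alphas))
```

A small barrier (good Linear Mode Connectivity) is often read as evidence that two networks lie in the same basin; the abstract's point is that wider networks show better connectivity even though their bases remain uncorrelated.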
Related papers
- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH).
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction.
These findings strongly indicate that interpretability research should not be confined to the LRH.
arXiv Detail & Related papers (2024-08-20T15:04:37Z) - Deep Neural Networks Can Learn Generalizable Same-Different Visual
Relations [22.205838756057314]
We study whether deep neural networks can acquire and generalize same-different relations both in-distribution and out-of-distribution.
We find that certain pretrained transformers can learn a same-different relation that generalizes with near perfect accuracy to out-of-distribution stimuli.
arXiv Detail & Related papers (2023-10-14T16:28:57Z) - Decorrelating neurons using persistence [29.25969187808722]
We present two regularisation terms computed from the weights of a minimum spanning tree of a clique.
We demonstrate that naive minimisation of all correlations between neurons obtains lower accuracies than our regularisation terms.
We include a proof of differentiability of our regularisers, thus developing the first effective topological persistence-based regularisation terms.
arXiv Detail & Related papers (2023-08-09T11:09:14Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
They exploit higher-order statistics only later during training.
We discuss the relation of this distributional simplicity bias (DSB) to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Similarity and Matching of Neural Network Representations [0.0]
We employ a toolset -- dubbed Dr. Frankenstein -- to analyse the similarity of representations in deep neural networks.
We aim to match the activations on given layers of two trained neural networks by joining them with a stitching layer (a minimal sketch of this idea follows the list).
arXiv Detail & Related papers (2021-10-27T17:59:46Z) - Redundant representations help generalization in wide neural networks [71.38860635025907]
We study the last hidden layer representations of various state-of-the-art convolutional neural networks.
We find that if the last hidden representation is wide enough, its neurons tend to split into groups that carry identical information, and differ from each other only by statistically independent noise.
arXiv Detail & Related papers (2021-06-07T10:18:54Z) - Convergence Analysis of Over-parameterized Deep Linear Networks, and the
Principal Components Bias [14.46779433267854]
Different convolutional neural networks learn to classify images in the same order.
We show how the principal components bias (PC-bias) streamlines the order of learning of both linear and non-linear networks.
arXiv Detail & Related papers (2021-05-12T10:08:42Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts readily to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
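As referenced in the "Similarity and Matching of Neural Network Representations" entry above, here is a minimal sketch of the stitching-layer idea under assumed helpers: the frozen front half of one trained network feeds a small trainable map (typically a 1x1 convolution for conv nets), whose output is passed to the frozen back half of a second trained network. Only the stitching map is trained, and the accuracy of the stitched model indicates how compatible the two learned representations are. `front_of_a`, `back_of_b`, and the channel counts are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    """Front half of network A -> trainable 1x1 stitching map -> back half of network B."""

    def __init__(self, front_of_a, back_of_b, channels_a, channels_b):
        super().__init__()
        self.front = front_of_a              # frozen feature extractor from network A
        self.back = back_of_b                # frozen remainder (head) of network B
        # A 1x1 convolution is the usual stitching layer for convolutional nets.
        self.stitch = nn.Conv2d(channels_a, channels_b, kernel_size=1)
        for module in (self.front, self.back):
            for p in module.parameters():
                p.requires_grad_(False)      # only the stitching map is trained

    def forward(self, x):
        with torch.no_grad():
            h = self.front(x)                # activations at the chosen layer of A
        return self.back(self.stitch(h))     # mapped into the basis B expects

# Usage sketch: optimize only the stitching parameters on the original task.
# stitched = StitchedModel(front_of_a, back_of_b, channels_a=256, channels_b=256)
# optimizer = torch.optim.Adam(stitched.stitch.parameters(), lr=1e-3)
```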