Task structure and nonlinearity jointly determine learned
representational geometry
- URL: http://arxiv.org/abs/2401.13558v1
- Date: Wed, 24 Jan 2024 16:14:38 GMT
- Title: Task structure and nonlinearity jointly determine learned
representational geometry
- Authors: Matteo Alleman, Jack W Lindsey, Stefano Fusi
- Abstract summary: We show that Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs.
Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The utility of a learned neural representation depends on how well its
geometry supports performance in downstream tasks. This geometry depends on the
structure of the inputs, the structure of the target outputs, and the
architecture of the network. By studying the learning dynamics of networks with
one hidden layer, we discovered that the network's activation function has an
unexpectedly strong impact on the representational geometry: Tanh networks tend
to learn representations that reflect the structure of the target outputs,
while ReLU networks retain more information about the structure of the raw
inputs. This difference is consistently observed across a broad class of
parameterized tasks in which we modulated the degree of alignment between the
geometry of the task inputs and that of the task labels. We analyzed the
learning dynamics in weight space and show how the differences between the
networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic
behavior of ReLU, which leads feature neurons to specialize for different
regions of input space. By contrast, feature neurons in Tanh networks tend to
inherit the task label structure. Consequently, when the target outputs are low
dimensional, Tanh networks generate neural representations that are more
disentangled than those obtained with a ReLU nonlinearity. Our findings shed
light on the interplay between input-output geometry, nonlinearity, and learned
representations in neural networks.
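The abstract's central claim can be probed numerically. Below is a minimal sketch (not the authors' code; the task, network sizes, and alignment metric are illustrative assumptions): train one-hidden-layer networks with Tanh and ReLU by full-batch gradient descent, then measure how well the hidden-layer representational similarity matrix (RSM) correlates with the RSM of the inputs versus that of the labels.

```python
# Sketch: compare hidden-layer geometry under Tanh vs. ReLU in a
# one-hidden-layer network. Alignment is measured as the correlation
# between off-diagonal entries of representational similarity matrices.
import numpy as np

rng = np.random.default_rng(0)

def train(nonlin, d_nonlin, X, Y, hidden=64, lr=0.05, steps=1500):
    """Full-batch gradient descent on squared error for y = f(x W1) W2."""
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    W2 = rng.standard_normal((hidden, k)) / np.sqrt(hidden)
    for _ in range(steps):
        pre = X @ W1
        H = nonlin(pre)                 # hidden representation
        err = H @ W2 - Y                # prediction error
        gW2 = H.T @ err / n
        gW1 = X.T @ (err @ W2.T * d_nonlin(pre)) / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    return nonlin(X @ W1)

def rsm(Z):
    """Representational similarity matrix (cosine of centered patterns)."""
    Zc = Z - Z.mean(0)
    G = Zc @ Zc.T
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)

def alignment(A, B):
    """Correlation between off-diagonal entries of two RSMs."""
    mask = ~np.eye(len(A), dtype=bool)
    return np.corrcoef(A[mask], B[mask])[0, 1]

# Toy task (an assumption): 8 random input patterns with binary labels
# that cut across the input geometry.
X = rng.standard_normal((8, 20))
Y = np.array([[1.0] if i % 2 == 0 else [-1.0] for i in range(8)])

H_tanh = train(np.tanh, lambda z: 1 - np.tanh(z) ** 2, X, Y)
H_relu = train(lambda z: np.maximum(z, 0),
               lambda z: (z > 0).astype(float), X, Y)

for name, H in [("tanh", H_tanh), ("relu", H_relu)]:
    print(name,
          "input-align:", round(alignment(rsm(H), rsm(X)), 3),
          "label-align:", round(alignment(rsm(H), rsm(Y)), 3))
```

On toy problems like this one, the paper's finding would predict higher label alignment for the Tanh network and higher input alignment for the ReLU network, though a single small run need not reproduce the trend.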
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Graph Metanetworks for Processing Diverse Neural Architectures [33.686728709734105]
Graph Metanetworks (GMNs) generalize to neural architectures where competing methods struggle.
We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network's function unchanged.
arXiv Detail & Related papers (2023-12-07T18:21:52Z)
- Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression [28.851519959657466]
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks.
A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces.
This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks.
arXiv Detail & Related papers (2023-05-25T23:32:10Z)
- Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation [55.80128181112308]
We show that the dimensionality and quasi-orthogonality of neural networks' feature spaces may jointly serve as discriminants of a network's performance.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, reflecting the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z)
- Internal representation dynamics and geometry in recurrent neural networks [10.016265742591674]
We show how a vanilla RNN implements a simple classification task by analysing the dynamics of the network.
We find that early internal representations are evocative of the true labels of the data, but this information is not directly accessible to the output layer.
arXiv Detail & Related papers (2020-01-09T23:19:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.