Task structure and nonlinearity jointly determine learned
representational geometry
- URL: http://arxiv.org/abs/2401.13558v1
- Date: Wed, 24 Jan 2024 16:14:38 GMT
- Title: Task structure and nonlinearity jointly determine learned
representational geometry
- Authors: Matteo Alleman, Jack W Lindsey, Stefano Fusi
- Abstract summary: We show that Tanh networks tend to learn representations that reflect the structure of the target outputs, while ReLU networks retain more information about the structure of the raw inputs.
Our findings shed light on the interplay between input-output geometry, nonlinearity, and learned representations in neural networks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The utility of a learned neural representation depends on how well its
geometry supports performance in downstream tasks. This geometry depends on the
structure of the inputs, the structure of the target outputs, and the
architecture of the network. By studying the learning dynamics of networks with
one hidden layer, we discovered that the network's activation function has an
unexpectedly strong impact on the representational geometry: Tanh networks tend
to learn representations that reflect the structure of the target outputs,
while ReLU networks retain more information about the structure of the raw
inputs. This difference is consistently observed across a broad class of
parameterized tasks in which we modulated the degree of alignment between the
geometry of the task inputs and that of the task labels. We analyzed the
learning dynamics in weight space and show how the differences between the
networks with Tanh and ReLU nonlinearities arise from the asymmetric asymptotic
behavior of ReLU, which leads feature neurons to specialize for different
regions of input space. By contrast, feature neurons in Tanh networks tend to
inherit the task label structure. Consequently, when the target outputs are low
dimensional, Tanh networks generate neural representations that are more
disentangled than those obtained with a ReLU nonlinearity. Our findings shed
light on the interplay between input-output geometry, nonlinearity, and learned
representations in neural networks.
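The abstract's central claim can be probed numerically. Below is a minimal sketch (not the authors' code; the task, network sizes, and alignment metric are illustrative assumptions): train one-hidden-layer networks with Tanh and ReLU by full-batch gradient descent, then measure how well the hidden-layer representational similarity matrix (RSM) correlates with the RSM of the inputs versus that of the labels.

```python
# Sketch: compare hidden-layer geometry under Tanh vs. ReLU in a
# one-hidden-layer network. Alignment is measured as the correlation
# between off-diagonal entries of representational similarity matrices.
import numpy as np

rng = np.random.default_rng(0)

def train(nonlin, d_nonlin, X, Y, hidden=64, lr=0.05, steps=1500):
    """Full-batch gradient descent on squared error for y = f(x W1) W2."""
    n, d = X.shape
    k = Y.shape[1]
    W1 = rng.standard_normal((d, hidden)) / np.sqrt(d)
    W2 = rng.standard_normal((hidden, k)) / np.sqrt(hidden)
    for _ in range(steps):
        pre = X @ W1
        H = nonlin(pre)                 # hidden representation
        err = H @ W2 - Y                # prediction error
        gW2 = H.T @ err / n
        gW1 = X.T @ (err @ W2.T * d_nonlin(pre)) / n
        W1 -= lr * gW1
        W2 -= lr * gW2
    return nonlin(X @ W1)

def rsm(Z):
    """Representational similarity matrix (cosine of centered patterns)."""
    Zc = Z - Z.mean(0)
    G = Zc @ Zc.T
    norms = np.sqrt(np.diag(G))
    return G / np.outer(norms, norms)

def alignment(A, B):
    """Correlation between off-diagonal entries of two RSMs."""
    mask = ~np.eye(len(A), dtype=bool)
    return np.corrcoef(A[mask], B[mask])[0, 1]

# Toy task (an assumption): 8 random input patterns with binary labels
# that cut across the input geometry.
X = rng.standard_normal((8, 20))
Y = np.array([[1.0] if i % 2 == 0 else [-1.0] for i in range(8)])

H_tanh = train(np.tanh, lambda z: 1 - np.tanh(z) ** 2, X, Y)
H_relu = train(lambda z: np.maximum(z, 0),
               lambda z: (z > 0).astype(float), X, Y)

for name, H in [("tanh", H_tanh), ("relu", H_relu)]:
    print(name,
          "input-align:", round(alignment(rsm(H), rsm(X)), 3),
          "label-align:", round(alignment(rsm(H), rsm(Y)), 3))
```

On toy problems like this one, the paper's finding would predict higher label alignment for the Tanh network and higher input alignment for the ReLU network, though a single small run need not reproduce the trend.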
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Graph Metanetworks for Processing Diverse Neural Architectures [33.686728709734105]
Graph Metanetworks (GMNs) generalize to neural architectures where competing methods struggle.
We prove that GMNs are expressive and equivariant to parameter permutation symmetries that leave the input neural network's function unchanged.
arXiv Detail & Related papers (2023-12-07T18:21:52Z)
- Variation Spaces for Multi-Output Neural Networks: Insights on Multi-Task Learning and Network Compression [28.851519959657466]
This paper introduces a novel theoretical framework for the analysis of vector-valued neural networks.
A key contribution of this work is the development of a representer theorem for the vector-valued variation spaces.
This observation reveals that the norm associated with these vector-valued variation spaces encourages the learning of features that are useful for multiple tasks.
arXiv Detail & Related papers (2023-05-25T23:32:10Z)
- Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation [55.80128181112308]
We show that the dimensionality and quasi-orthogonality of neural networks' feature spaces may jointly serve as discriminants of a network's performance.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, reflecting the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z)
- Internal representation dynamics and geometry in recurrent neural networks [10.016265742591674]
We show how a vanilla RNN implements a simple classification task by analysing the dynamics of the network.
We find that early internal representations are evocative of the true labels of the data, but this information is not directly accessible to the output layer.
arXiv Detail & Related papers (2020-01-09T23:19:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.