Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations
- URL: http://arxiv.org/abs/2410.20107v2
- Date: Tue, 29 Oct 2024 07:52:19 GMT
- Title: Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations
- Authors: Amir Joudaki, Thomas Hofmann
- Abstract summary: We introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representations of two different inputs.
For nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to similar representations depending on the activation and network architecture.
This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
- Score: 24.052411316664017
- Abstract: Understanding how neural networks transform input data across layers is fundamental to unraveling their learning and generalization capabilities. Although prior work has used insights from kernel methods to study neural networks, a global analysis of how the similarity between hidden representations evolves across layers remains underexplored. In this paper, we introduce a theoretical framework for the evolution of the kernel sequence, which measures the similarity between the hidden representations of two different inputs. Operating under the mean-field regime, we show that the kernel sequence evolves deterministically via a kernel map, which depends only on the activation function. By expanding the activation in Hermite polynomials and using their algebraic properties, we derive an explicit form for the kernel map and fully characterize its fixed points. Our analysis reveals that for nonlinear activations, the kernel sequence converges globally to a unique fixed point, which can correspond to orthogonal or similar representations depending on the activation and network architecture. We further extend our results to networks with residual connections and normalization layers, demonstrating similar convergence behaviors. This work provides new insights into the implicit biases of deep neural networks and how architectural choices influence the evolution of representations across layers.
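As a concrete illustration of the abstract's construction, here is a minimal numerical sketch, assuming the standard mean-field form of the kernel map: the activation is expanded in normalized probabilists' Hermite polynomials with coefficients a_k, which gives the normalized kernel map kappa(rho) = sum_k a_k^2 rho^k / sum_k a_k^2, and that map is iterated to its fixed point. This is not the authors' code; the function names, the choice of tanh, and the iteration depth are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the kernel-map iteration from the
# abstract. Under the mean-field regime, the similarity rho between the hidden
# representations of two inputs evolves layer by layer as
#   rho -> kappa(rho) = sum_k a_k^2 rho^k / sum_k a_k^2,
# where a_k are the Hermite coefficients of the activation.
import numpy as np
from math import factorial, sqrt

def hermite_coeffs(phi, max_degree=30, quad_points=200):
    """a_k = E[phi(Z) he_k(Z)] for Z ~ N(0, 1), where he_k is the normalized
    probabilists' Hermite polynomial, computed via Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite_e.hermegauss(quad_points)
    w = w / np.sqrt(2.0 * np.pi)   # quadrature weights for the standard Gaussian
    vals = phi(x)
    a = np.empty(max_degree + 1)
    for k in range(max_degree + 1):
        c = np.zeros(k + 1)
        c[k] = 1.0
        he_k = np.polynomial.hermite_e.hermeval(x, c) / sqrt(factorial(k))
        a[k] = np.sum(w * vals * he_k)
    return a

def kernel_map(rho, a):
    """One layer of the normalized kernel sequence."""
    return np.dot(a**2, rho ** np.arange(len(a))) / np.dot(a, a)

a = hermite_coeffs(np.tanh)        # any nonlinear activation can be plugged in
rho = 0.3                          # initial similarity of the two inputs
for _ in range(200):               # iterate the kernel map across 200 layers
    rho = kernel_map(rho, a)
print(f"kernel sequence fixed point: rho* ~ {rho:.6f}")
```

In this sketch, np.tanh contracts the sequence toward rho* = 0 (orthogonal representations), while substituting a ReLU-like activation pushes it toward rho* = 1 (similar representations), matching the dichotomy the abstract describes.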
Related papers
- Mechanism of feature learning in convolutional neural networks [14.612673151889615]
We identify the mechanism by which convolutional neural networks learn from image data.
We present empirical evidence for our ansatz, including a high correlation between the covariances of filters and patch-based AGOPs (average gradient outer products).
We then demonstrate the generality of our result by using the patch-based AGOP to enable deep feature learning in convolutional kernel machines.
arXiv Detail & Related papers (2023-09-01T16:30:02Z)
- Structure Embedded Nucleus Classification for Histopathology Images [51.02953253067348]
Most neural-network-based methods are limited by the local receptive field of convolutions.
We propose a novel polygon-structure feature learning mechanism that transforms a nucleus contour into a sequence of points sampled in order.
Next, we convert a histopathology image into a graph structure with nuclei as nodes, and build a graph neural network to embed the spatial distribution of nuclei into their representations.
arXiv Detail & Related papers (2023-02-22T14:52:06Z)
- Graph Convolutional Networks from the Perspective of Sheaves and the Neural Tangent Kernel [0.0]
Graph convolutional networks are a popular class of deep neural network algorithms.
Despite their success, graph convolutional networks exhibit a number of peculiar features, including a bias towards learning oversmoothed and homophilic functions.
We propose to bridge this gap by studying the neural tangent kernel of sheaf convolutional networks.
arXiv Detail & Related papers (2022-08-19T12:46:49Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Entangled Residual Mappings [59.02488598557491]
We introduce entangled residual mappings to generalize the structure of the residual connections.
An entangled residual mapping replaces the identity skip connections with specialized entangled mappings.
We show that while entangled mappings can preserve the iterative refinement of features across various deep models, they influence the representation learning process in convolutional networks.
arXiv Detail & Related papers (2022-06-02T19:36:03Z)
- Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks [18.27510863075184]
We analyze feature learning in infinite width neural networks trained with gradient flow through a self-consistent dynamical field theory.
We construct a collection of deterministic dynamical order parameters which are inner-product kernels for hidden unit activations and gradients in each layer at pairs of time points.
arXiv Detail & Related papers (2022-05-19T16:10:10Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks [18.377136391055327]
This paper theoretically analyzes the implicit regularization in hierarchical tensor factorization.
This regularization translates into an implicit bias towards locality in the associated convolutional networks.
Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
arXiv Detail & Related papers (2022-01-27T18:48:30Z)
- Defensive Tensorization [113.96183766922393]
We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network.
We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks.
We validate the versatility of our approach across domains and low-precision architectures by considering an audio task and binary networks.
arXiv Detail & Related papers (2021-10-26T17:00:16Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Investigating the Compositional Structure Of Deep Neural Networks [1.8899300124593645]
We introduce a novel theoretical framework based on the compositional structure of piecewise linear activation functions.
This framework makes it possible to characterize input instances with respect to both the predicted label and the specific (linear) transformation used to produce the prediction.
Preliminary tests on the MNIST dataset show that our method can group input instances with regard to their similarity in the internal representation of the neural network.
arXiv Detail & Related papers (2020-02-17T14:16:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.