Related papers: An Analysis Framework for Understanding Deep Neural Networks Based on Network Dynamics

URL: http://arxiv.org/abs/2501.02436v1
Date: Sun, 05 Jan 2025 04:23:21 GMT
Title: An Analysis Framework for Understanding Deep Neural Networks Based on Network Dynamics
Authors: Yuchen Lin, Yong Zhang, Sihan Feng, Hong Zhao
Abstract summary: Deep neural networks (DNNs) maximize information extraction by rationally allocating the proportion of neurons in different modes across deep layers. This framework provides a unified explanation for fundamental DNN behaviors such as the "flat minima effect," "grokking," and double descent phenomena.
Score: 11.44947569206928
Abstract: Advancing artificial intelligence demands a deeper understanding of the mechanisms underlying deep learning. Here, we propose a straightforward analysis framework based on the dynamics of learning models. Neurons are categorized into two modes based on whether their transformation functions preserve order. This categorization reveals how deep neural networks (DNNs) maximize information extraction by rationally allocating the proportion of neurons in different modes across deep layers. We further introduce the attraction basins of the training samples in both the sample vector space and the weight vector space to characterize the generalization ability of DNNs. This framework allows us to identify optimal depth and width configurations, providing a unified explanation for fundamental DNN behaviors such as the "flat minima effect," "grokking," and double descent phenomena. Our analysis extends to networks with depths up to 100 layers.

Related papers

Convergence Analysis for Deep Sparse Coding via Convolutional Neural Networks [7.956678963695681]
We explore intersections between sparse coding and deep learning to enhance our understanding of feature extraction capabilities. We derive convergence rates for convolutional neural networks (CNNs) in their ability to extract sparse features. Inspired by the strong connection between sparse coding and CNNs, we explore training strategies to encourage neural networks to learn more sparse features.
arXiv Detail & Related papers (2024-08-10T12:43:55Z)
Unveiling the Unseen: Identifiable Clusters in Trained Depthwise Convolutional Kernels [56.69755544814834]
Recent advances in depthwise-separable convolutional neural networks (DS-CNNs) have led to novel architectures. This paper reveals another striking property of DS-CNN architectures: discernible and explainable patterns emerge in their trained depthwise convolutional kernels in all layers.
arXiv Detail & Related papers (2024-01-25T19:05:53Z)
Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence. We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers. This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
Deep Neural Networks as Complex Networks [1.704936863091649]
We use Complex Network Theory to represent Deep Neural Networks (DNNs) as directed weighted graphs. We introduce metrics to study DNNs as dynamical systems, with a granularity that spans from weights to layers, including neurons. We show that our metrics discriminate low vs. high performing networks.
arXiv Detail & Related papers (2022-09-12T16:26:04Z)
Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs. By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
Characterizing Learning Dynamics of Deep Neural Networks via Complex Networks [1.0869257688521987]
Complex Network Theory (CNT) represents Deep Neural Networks (DNNs) as directed weighted graphs to study them as dynamical systems. We introduce metrics for nodes/neurons and layers, namely Nodes Strength and Layers Fluctuation. Our framework distills trends in the learning dynamics and separates low- from high-accuracy networks.
arXiv Detail & Related papers (2021-10-06T10:03:32Z)
A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation. Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
SGD Distributional Dynamics of Three Layer Neural Networks [7.025709586759655]
In this paper, we seek to extend the mean field results of Mei et al. from two-layer neural networks with one hidden layer to three-layer neural networks with two hidden layers. We show that SGD is captured by a set of non-linear differential equations, and prove that the distributions of dynamics in the two layers are independent.
arXiv Detail & Related papers (2020-12-30T04:37:09Z)
Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs). In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit. We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
Complexity for deep neural networks and other characteristics of deep feature representations [0.0]
We define a notion of complexity, which quantifies the nonlinearity of the computation of a neural network. We investigate these observables for trained networks and explore their dynamics during training.
arXiv Detail & Related papers (2020-06-08T17:59:30Z)
Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) use temporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation. This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective on the design of future DeepSNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.