Data-driven effective model shows a liquid-like deep learning
- URL: http://arxiv.org/abs/2007.08093v2
- Date: Wed, 28 Jul 2021 07:36:57 GMT
- Title: Data-driven effective model shows a liquid-like deep learning
- Authors: Wenxuan Zou and Haiping Huang
- Abstract summary: It remains unknown what the landscape looks like for deep networks of binary synapses.
We propose a statistical mechanics framework by directly building a least structured model of the high-dimensional weight space.
Our data-driven model thus provides a statistical mechanics insight about why deep learning is unreasonably effective in terms of the high-dimensional weight space.
- Score: 2.0711789781518752
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The geometric structure of an optimization landscape is argued to be
fundamentally important to support the success of deep neural network learning.
A direct computation of the landscape beyond two layers is hard. Therefore, to
capture the global view of the landscape, an interpretable model of the
network-parameter (or weight) space must be established. However, such a model
is lacking so far. Furthermore, it remains unknown what the landscape looks like
for deep networks of binary synapses, which play a key role in robust and
energy-efficient neuromorphic computation. Here, we propose a statistical
mechanics framework by directly building a least structured model of the
high-dimensional weight space, considering realistic structured data,
stochastic gradient descent training, and the computational depth of neural
networks. We also consider whether the network parameters outnumber the
supplied training data, namely, over- or under-parametrization.
Our least structured model reveals that the weight spaces of the
under-parametrization and over-parametrization cases belong to the same class,
in the sense that these weight spaces are well-connected without any
hierarchical clustering structure. In contrast, a shallow network has a
broken weight space, characterized by a discontinuous phase transition, thereby
clarifying the benefit of depth in deep learning from the angle of
high-dimensional geometry. Our effective model also reveals that inside a deep
network, there exists a liquid-like central part of the architecture in the
sense that the weights in this part behave as randomly as possible, providing
algorithmic implications. Our data-driven model thus provides a statistical
mechanics insight about why deep learning is unreasonably effective in terms of
the high-dimensional weight space, and how deep networks are different from
shallow ones.
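The abstract does not spell out the form of the "least structured model". One common reading is a maximum-entropy model constrained only by low-order statistics of sampled weights. The sketch below is an illustration under that assumption, not the authors' implementation: it fits a pairwise model P(w) ∝ exp(Σ_i h_i w_i + Σ_{i<j} J_ij w_i w_j) to binary (±1) weight samples by matching means and pairwise correlations. The function names `fit_pairwise_maxent` and `model_moments`, the learning rate, and the step count are all hypothetical, and the exact enumeration over 2^n states only works for a handful of weights.

```python
import itertools

import numpy as np


def _all_states(n):
    # Enumerate all 2^n configurations of n binary (+1/-1) weights.
    return np.array(list(itertools.product([-1.0, 1.0], repeat=n)))


def model_moments(h, J):
    """Return the means and pairwise correlations of the model defined by (h, J)."""
    states = _all_states(len(h))
    # Log-weight of each state; the 0.5 factor compensates for J being symmetric.
    log_w = states @ h + 0.5 * np.einsum("si,ij,sj->s", states, J, states)
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    mean = p @ states
    corr = states.T @ (states * p[:, None])
    return mean, corr


def fit_pairwise_maxent(samples, lr=0.1, steps=2000):
    """Fit fields h and symmetric couplings J by gradient ascent on the likelihood.

    Assumption: samples is an array of shape (num_samples, n) with entries +1/-1,
    and n is small enough for exact enumeration.
    """
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[1]
    data_mean = samples.mean(axis=0)
    data_corr = samples.T @ samples / len(samples)
    h = np.zeros(n)
    J = np.zeros((n, n))
    for _ in range(steps):
        mean, corr = model_moments(h, J)
        # The likelihood gradient is the gap between data and model moments.
        h += lr * (data_mean - mean)
        dJ = lr * (data_corr - corr)
        np.fill_diagonal(dJ, 0.0)  # no self-couplings
        J += dJ
    return h, J
```

In this reading, the weight samples would come from repeated training runs, and the fitted `h` and `J` are then the object of study. Whether the paper uses exactly these constraints or this fitting procedure is not stated in the abstract.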
Related papers
- Improved Generalization of Weight Space Networks via Augmentations [53.87011906358727]
Learning in deep weight spaces (DWS) is an emerging research direction, with applications to 2D and 3D neural fields (INRs, NeRFs).
We empirically analyze the reasons for this overfitting and find that a key reason is the lack of diversity in DWS datasets.
To address this, we explore strategies for data augmentation in weight spaces and propose a MixUp method adapted for weight spaces.
arXiv Detail & Related papers (2024-02-06T15:34:44Z)
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized Visual Prompts are mounted to upgrade neural Network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z)
- Engineering flexible machine learning systems by traversing functionally-invariant paths [1.4999444543328289]
We introduce a differential geometry framework that provides flexible and continuous adaptation of neural networks.
We formalize adaptation as movement along a geodesic path in weight space while searching for networks that accommodate secondary objectives.
With modest computational resources, the FIP algorithm achieves performance comparable to the state of the art on continual learning and sparsification tasks.
arXiv Detail & Related papers (2022-04-30T19:44:56Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to addressing the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Renormalization Group [4.56877715768796]
We study the statistical mechanics of learning in Deep Linear Neural Networks (DLNNs) in which the input-output function of an individual unit is linear.
We solve exactly the network properties following supervised learning using an equilibrium Gibbs distribution in the weight space.
Our numerical simulations reveal that despite the nonlinearity, the predictions of our theory are largely shared by ReLU networks with modest depth.
arXiv Detail & Related papers (2020-12-07T20:08:31Z)
- KShapeNet: Riemannian network on Kendall shape space for Skeleton based Action Recognition [7.183483982542308]
We propose a geometry-aware deep learning approach for skeleton-based action recognition.
Skeletons are first modeled as trajectories on Kendall's shape space and then mapped to the linear tangent space.
The resulting structured data are then fed to a deep learning architecture, which includes a layer that optimizes over rigid and non-rigid transformations.
arXiv Detail & Related papers (2020-11-24T10:14:07Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges, which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Gradients as Features for Deep Representation Learning [26.996104074384263]
We address the problem of deep representation learning: the efficient adaptation of a pre-trained deep network to different tasks.
Our key innovation is the design of a linear model that incorporates both gradient and activation of the pre-trained network.
We present an efficient algorithm for the training and inference of our model without computing the actual gradient.
arXiv Detail & Related papers (2020-04-12T02:57:28Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.