Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions
- URL: http://arxiv.org/abs/2404.13404v2
- Date: Fri, 29 Nov 2024 09:39:27 GMT
- Title: Solution space and storage capacity of fully connected two-layer neural networks with generic activation functions
- Authors: Sota Nishiyama, Masayuki Ohzeki
- Abstract summary: The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. We analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions.
- Score: 0.552480439325792
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. It is one of the indicators of the expressive power of machine learning models and is important for comparing the performance of various models. In this study, we analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions using the replica method from statistical physics. Our results demonstrate that the storage capacity per parameter remains finite even with infinite width and that the weights of the network exhibit negative correlations, leading to a 'division of labor'. In addition, we find that increasing the dataset size triggers a phase transition at a certain transition point where the permutation symmetry of weights is broken, resulting in the solution space splitting into disjoint regions. We identify the dependence of this transition point and the storage capacity on the choice of activation function. These findings contribute to understanding the influence of activation functions and the number of parameters on the structure of the solution space, potentially offering insights for selecting appropriate architectures based on specific objectives.
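For readers unfamiliar with the quantity in question, the following display is a minimal sketch of the standard setup, assuming a fully connected two-layer architecture with N-dimensional inputs, K hidden units, a generic activation g, and a sign readout; the exact normalization and output weights used in the paper may differ.
```latex
\[
  \hat{y}(\mathbf{x})
  = \operatorname{sign}\!\left(\frac{1}{\sqrt{K}}\sum_{k=1}^{K}
      g\!\left(\frac{\mathbf{w}_k\cdot\mathbf{x}}{\sqrt{N}}\right)\right),
  \qquad
  \alpha_c \;=\; \lim_{N,K\to\infty}\frac{P_{\max}}{NK},
\]
where \(P_{\max}\) is the largest number of random pairs \((\mathbf{x}^{\mu}, y^{\mu})\),
\(\mu = 1,\dots,P\), for which first-layer weights \(\{\mathbf{w}_k\}\) exist satisfying
\(y^{\mu}\,\hat{y}(\mathbf{x}^{\mu}) > 0\) for every \(\mu\).
```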
Related papers
- Quantum Convolutional Neural Network with Flexible Stride [7.362858964229726]
We propose a novel quantum convolutional neural network algorithm.
It can flexibly adjust the stride to accommodate different tasks.
It achieves exponential acceleration in data scale while using less memory than its classical counterpart.
arXiv Detail & Related papers (2024-12-01T02:37:06Z)
- Heterogeneous quantization regularizes spiking neural network activity [0.0]
We present a data-blind neuromorphic signal conditioning strategy whereby analog data are normalized and quantized into spike phase representations.
We extend this mechanism by adding a data-aware calibration step whereby the range and density of the quantization weights adapt to accumulated input statistics.
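As a rough illustration of this kind of pipeline (not the paper's implementation), the sketch below normalizes analog data, quantizes it into a fixed number of spike-phase bins, and recalibrates the quantization range from accumulated input statistics; the class name, bin count, and update rule are hypothetical.
```python
import numpy as np

class PhaseQuantizer:
    """Illustrative spike-phase quantizer with data-aware range calibration."""

    def __init__(self, n_phases=8):
        self.n_phases = n_phases      # number of discrete spike phases
        self.running_min = np.inf     # accumulated input statistics
        self.running_max = -np.inf

    def calibrate(self, batch):
        """Data-aware step: adapt the quantization range to observed inputs."""
        self.running_min = min(self.running_min, batch.min())
        self.running_max = max(self.running_max, batch.max())

    def encode(self, batch):
        """Normalize to [0, 1] and map each sample to one of n_phases spike phases."""
        span = max(self.running_max - self.running_min, 1e-12)
        normalized = np.clip((batch - self.running_min) / span, 0.0, 1.0)
        return np.minimum((normalized * self.n_phases).astype(int), self.n_phases - 1)

# Toy usage: stream two batches of analog data through the quantizer.
rng = np.random.default_rng(0)
q = PhaseQuantizer(n_phases=8)
for _ in range(2):
    batch = rng.normal(size=100)
    q.calibrate(batch)                # update the range from running statistics
    spikes = q.encode(batch)          # integer phase index per sample
print(spikes[:10])
```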
arXiv Detail & Related papers (2024-09-27T02:25:44Z)
- Multilayer Multiset Neuronal Networks -- MMNNs [55.2480439325792]
The present work describes multilayer multiset neuronal networks incorporating two or more layers of coincidence similarity neurons.
The work also explores the utilization of counter-prototype points, which are assigned to the image regions to be avoided.
arXiv Detail & Related papers (2023-08-28T12:55:13Z)
- ENN: A Neural Network with DCT Adaptive Activation Functions [2.2713084727838115]
We present Expressive Neural Network (ENN), a novel model in which the non-linear activation functions are modeled using the Discrete Cosine Transform (DCT).
This parametrization keeps the number of trainable parameters low, is appropriate for gradient-based schemes, and adapts to different learning tasks.
ENN outperforms state-of-the-art benchmarks, providing an accuracy gap of more than 40% in some scenarios.
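To make the idea concrete, here is a small sketch of an activation function parameterized by a truncated cosine (DCT-like) basis; the number of coefficients, input range, and initialization are illustrative assumptions rather than the ENN specification.
```python
import numpy as np

class DCTActivation:
    """Activation f(x) = sum_k c_k * cos(pi * k * t(x)), where t maps x into [0, 1].

    Only the K coefficients c_k are trainable, so the nonlinearity stays cheap
    while its shape can adapt to the task.
    """

    def __init__(self, n_coeffs=8, x_range=(-3.0, 3.0)):
        self.c = np.zeros(n_coeffs)
        self.c[1] = 1.0                    # one nonzero coefficient: a simple monotone start
        self.lo, self.hi = x_range

    def __call__(self, x):
        t = np.clip((x - self.lo) / (self.hi - self.lo), 0.0, 1.0)
        k = np.arange(len(self.c))
        basis = np.cos(np.pi * np.outer(t.ravel(), k))   # cosine basis evaluated at t
        return (basis @ self.c).reshape(np.shape(x))

act = DCTActivation(n_coeffs=8)
print(act(np.linspace(-3.0, 3.0, 5)))      # activation values on a small grid
```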
arXiv Detail & Related papers (2023-07-02T21:46:30Z)
- Expand-and-Cluster: Parameter Recovery of Neural Networks [9.497862562614666]
We show that the incoming weight vector of each neuron is identifiable up to sign or scaling, depending on the activation function.
Our novel method 'Expand-and-Cluster' can identify weights of a target network for all commonly used activation functions.
arXiv Detail & Related papers (2023-04-25T13:14:20Z)
- Capacity Studies for a Differential Growing Neural Gas [0.0]
This study evaluates the capacity of a two-layered DGNG grid cell model on the Fashion-MNIST dataset.
It is concluded that the DGNG model is able to obtain a meaningful and plausible representation of the input space.
arXiv Detail & Related papers (2022-12-23T13:19:48Z)
- A Solvable Model of Neural Scaling Laws [72.8349503901712]
Large language models with a huge number of parameters, when trained on a near internet-scale number of tokens, have been empirically shown to obey neural scaling laws.
We propose a statistical model -- a joint generative data model and random feature model -- that captures this neural scaling phenomenology.
A key finding is the manner in which the power laws occurring in the statistics of natural datasets are extended by nonlinear random feature maps.
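The flavor of this setup can be reproduced in a few lines: data with a power-law covariance spectrum is passed through a nonlinear random feature map, and the test loss of a ridge fit falls off as the number of features grows. The exponents, feature map, and regularization below are arbitrary illustrative choices, not the paper's model.
```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 200, 2000, 500

# Inputs with a power-law covariance spectrum, a stand-in for "natural" data statistics.
spectrum = np.arange(1, d + 1) ** -1.5
X = rng.normal(size=(n_train + n_test, d)) * np.sqrt(spectrum)
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n_train + n_test)

def test_loss(n_features):
    """Ridge regression on a nonlinear (ReLU) random feature map of a given width."""
    W = rng.normal(size=(d, n_features)) / np.sqrt(d)
    Phi = np.maximum(X @ W, 0.0)
    Ptr, Pte = Phi[:n_train], Phi[n_train:]
    ytr, yte = y[:n_train], y[n_train:]
    coef = np.linalg.solve(Ptr.T @ Ptr + 1e-3 * np.eye(n_features), Ptr.T @ ytr)
    return np.mean((Pte @ coef - yte) ** 2)

for n in (50, 100, 200, 400, 800):
    print(n, test_loss(n))   # loss decays roughly like a power law in the feature count
```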
arXiv Detail & Related papers (2022-10-30T15:13:18Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates the detailed spatial information captured by CNNs with the global context provided by transformers for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Deep recurrent networks predicting the gap evolution in adiabatic quantum computing [0.0]
We explore the potential of deep learning for discovering a mapping from the parameters that fully identify a problem Hamiltonian to the parametric dependence of the gap.
We show that a long short-term memory network succeeds in predicting the gap when the parameter space scales linearly with system size.
Remarkably, we show that once this architecture is combined with a convolutional neural network to deal with the spatial structure of the model, the gap evolution can even be predicted for system sizes larger than the ones seen by the neural network during training.
arXiv Detail & Related papers (2021-09-17T12:08:57Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
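A rough numpy sketch of the mechanism is given below: a Sinkhorn iteration computes an entropy-regularized transport plan between the input set and a fixed-size reference, and the plan's columns define the aggregation weights. The regularization strength, iteration count, and uniform marginals are assumptions; the paper's exact formulation and its relation to attention are not reproduced here.
```python
import numpy as np

def sinkhorn_plan(cost, eps=0.5, n_iters=200):
    """Entropy-regularized optimal transport plan between two uniform measures."""
    n, m = cost.shape
    K = np.exp(-cost / eps)
    a, b = np.ones(n) / n, np.ones(m) / m      # uniform marginals
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]          # plan of shape (n, m)

def ot_embed(x_set, reference):
    """Aggregate a variable-size set into len(reference) slots via the transport plan."""
    cost = ((x_set[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
    plan = sinkhorn_plan(cost)
    weights = plan / plan.sum(axis=0, keepdims=True)   # each slot's weights sum to 1
    return weights.T @ x_set                           # fixed-size (m, d) embedding

rng = np.random.default_rng(0)
x_set = rng.normal(size=(17, 4))        # a set of 17 elements in R^4
reference = rng.normal(size=(5, 4))     # the trainable reference (random here)
print(ot_embed(x_set, reference).shape) # (5, 4), independent of the set size
```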
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
- Spatially Adaptive Inference with Stochastic Feature Sampling and Interpolation [72.40827239394565]
We propose to compute features only at sparsely sampled locations.
We then densely reconstruct the feature map with an efficient procedure.
The presented network is experimentally shown to save substantial computation while maintaining accuracy over a variety of computer vision tasks.
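The general compute-at-sparse-locations-then-interpolate idea can be sketched in a few lines; the example below uses random sampling and nearest-neighbor filling purely for illustration, whereas the paper uses stochastic sampling and a more efficient reconstruction procedure.
```python
import numpy as np

def expensive_feature(values):
    """Stand-in for a costly per-pixel feature computation."""
    return np.tanh(values) ** 2

def sparse_then_interpolate(image, keep_ratio=0.1, seed=0):
    """Evaluate the feature only at sparse locations, then fill in the rest
    of the map by nearest-neighbor interpolation from the sampled points."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    n_keep = max(1, int(keep_ratio * h * w))
    ys = rng.integers(0, h, n_keep)
    xs = rng.integers(0, w, n_keep)
    sparse_vals = expensive_feature(image[ys, xs])      # computed only at samples

    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    d2 = (yy[..., None] - ys) ** 2 + (xx[..., None] - xs) ** 2
    return sparse_vals[d2.argmin(axis=-1)]              # dense (h, w) reconstruction

img = np.random.default_rng(1).normal(size=(32, 32))
print(sparse_then_interpolate(img).shape)               # (32, 32) feature map
```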
arXiv Detail & Related papers (2020-03-19T15:36:31Z)
- Supervised Learning for Non-Sequential Data: A Canonical Polyadic Decomposition Approach [85.12934750565971]
Efficient modelling of feature interactions underpins supervised learning for non-sequential tasks.
To alleviate the combinatorial cost of modelling such interactions explicitly, it has been proposed to implicitly represent the model parameters as a tensor.
For enhanced expressiveness, we generalize the framework to allow feature mapping to arbitrarily high-dimensional feature vectors.
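As a toy sketch of the tensor idea (not the paper's generalized formulation), the snippet below represents an order-D weight tensor implicitly through CP (canonical polyadic) factors, one small matrix per input feature, and scores an input by multiplying per-feature projections; the polynomial feature map and the rank are arbitrary choices.
```python
import numpy as np

rng = np.random.default_rng(0)
n_features, map_dim, rank = 6, 4, 3

# One (map_dim x rank) CP factor per input feature; together they implicitly
# represent an order-6 weight tensor with map_dim**6 entries.
factors = [0.1 * rng.normal(size=(map_dim, rank)) for _ in range(n_features)]

def feature_map(value):
    """Map a scalar feature to a short vector (here a polynomial map)."""
    return np.array([1.0, value, value ** 2, value ** 3])

def cp_score(x):
    """Multilinear score: sum over the rank of products of per-feature projections."""
    proj = np.ones(rank)
    for d in range(n_features):
        proj = proj * (feature_map(x[d]) @ factors[d])   # shape (rank,)
    return proj.sum()

x = rng.normal(size=n_features)
print(cp_score(x))
```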
arXiv Detail & Related papers (2020-01-27T22:38:40Z)