Neural networks behave as hash encoders: An empirical study
- URL: http://arxiv.org/abs/2101.05490v1
- Date: Thu, 14 Jan 2021 07:50:40 GMT
- Title: Neural networks behave as hash encoders: An empirical study
- Authors: Fengxiang He, Shiye Lei, Jianmin Ji, Dacheng Tao
- Abstract summary: The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions.
We demonstrate that this partition exhibits the following encoding properties across a variety of deep learning models.
Simple algorithms, such as K-Means, K-NN, and logistic regression, can achieve fairly good performance on both training and test data.
- Score: 79.38436088982283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The input space of a neural network with ReLU-like activations is partitioned
into multiple linear regions, each corresponding to a specific activation
pattern of the included ReLU-like activations. We demonstrate that this
partition exhibits the following encoding properties across a variety of deep
learning models: (1) determinism: almost every linear region contains at most
one training example, so almost every training example can be represented by a
unique activation pattern, which is parameterized by a neural code; and (2)
categorization: according to the neural code, simple algorithms such as
K-Means, K-NN, and logistic regression can achieve fairly good performance on
both training and test data. These encoding properties surprisingly suggest
that normal neural networks well-trained for classification behave as hash
encoders without any extra effort. In addition, the encoding properties vary
across scenarios. Further experiments demonstrate that model size, training
time, training sample size, regularization, and label noise all contribute to
shaping the encoding properties, with the first three having the dominant
impact. We then define an activation hash phase chart to represent the space
spanned by model size, training time, training sample size, and the encoding
properties; it is divided into three canonical regimes: the under-expressive
regime, the critically-expressive regime, and the sufficiently-expressive
regime. The source code package is available at
https://github.com/LeavesLei/activation-code.
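To make the two encoding properties concrete, here is a minimal sketch of the neural-code idea: read off each sample's binary ReLU activation pattern, treat it as a hash code, then check (1) how many samples share a code and (2) how well a simple classifier does on the codes alone. This is an illustrative reconstruction in PyTorch/scikit-learn, not the authors' released implementation; the toy data, the architecture, and the helper name activation_code are assumptions made for this example.

```python
# Illustrative sketch only: extract binary ReLU activation patterns
# ("neural codes") from a small trained MLP and reuse them as hash codes.
# The data, architecture, and helper names below are assumptions for the
# example, not the authors' code from github.com/LeavesLei/activation-code.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

torch.manual_seed(0)

# Toy two-class data as a stand-in for a real classification dataset.
X = torch.randn(512, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 2))

# Short training run so the codes come from a (roughly) well-trained network.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

@torch.no_grad()
def activation_code(net, x):
    """Concatenate the 0/1 firing pattern of every ReLU unit for each sample."""
    codes, h = [], x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            codes.append((h > 0).float())
    return torch.cat(codes, dim=1)

codes = activation_code(model, X).numpy()

# Determinism: count distinct codes (occupied linear regions); the paper
# reports that almost every region holds at most one training example.
n_codes = len({tuple(c) for c in codes.astype(np.uint8)})
print(f"{n_codes} distinct codes for {len(X)} samples")

# Categorization: a simple classifier on the codes alone performs well.
knn = KNeighborsClassifier(n_neighbors=5).fit(codes, y.numpy())
print("K-NN accuracy on the codes:", knn.score(codes, y.numpy()))
```

In the paper the codes are read off from full-scale trained classifiers and fed to K-Means, K-NN, or logistic regression; the toy setup above only illustrates the mechanics of extracting and reusing the activation pattern.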
Related papers
- Codebook Features: Sparse and Discrete Interpretability for Neural Networks [43.06828312515959]
We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable.
Codebook features are produced by finetuning neural networks with vector quantization bottlenecks at each layer.
We find that neural networks can operate under this extreme bottleneck with only modest degradation in performance.
arXiv Detail & Related papers (2023-10-26T08:28:48Z)
- Distributive Pre-Training of Generative Modeling Using Matrix-Product States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z)
- A Self-Encoder for Learning Nearest Neighbors [5.297261090056809]
The self-encoder learns to distribute the data samples in the embedding space so that they are linearly separable from one another.
Unlike regular nearest neighbors, the predictions resulting from this encoding of data are invariant to any scaling of features.
arXiv Detail & Related papers (2023-06-25T14:30:31Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Learning Smooth Neural Functions via Lipschitz Regularization [92.42667575719048]
We introduce a novel regularization designed to encourage smooth latent spaces in neural fields.
Compared with prior Lipschitz regularized networks, ours is computationally fast and can be implemented in four lines of code.
arXiv Detail & Related papers (2022-02-16T21:24:54Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
arXiv Detail & Related papers (2020-03-07T18:38:46Z)
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- Prediction of wall-bounded turbulence from wall quantities using convolutional neural networks [0.0]
A fully-convolutional neural-network model is used to predict the streamwise velocity fields at several wall-normal locations.
Various networks are trained for predictions at three inner-scaled locations.
arXiv Detail & Related papers (2019-12-30T15:34:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.