Neural networks behave as hash encoders: An empirical study
- URL: http://arxiv.org/abs/2101.05490v1
- Date: Thu, 14 Jan 2021 07:50:40 GMT
- Title: Neural networks behave as hash encoders: An empirical study
- Authors: Fengxiang He, Shiye Lei, Jianmin Ji, Dacheng Tao
- Abstract summary: The input space of a neural network with ReLU-like activations is partitioned into multiple linear regions.
We demonstrate that this partition exhibits the following encoding properties across a variety of deep learning models.
Simple algorithms, such as K-Means, K-NN, and logistic regression, can achieve fairly good performance on both training and test data.
- Score: 79.38436088982283
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The input space of a neural network with ReLU-like activations is partitioned
into multiple linear regions, each corresponding to a specific activation
pattern of the included ReLU-like activations. We demonstrate that this
partition exhibits the following encoding properties across a variety of deep
learning models: (1) determinism: almost every linear region contains at most
one training example, so almost every training example can be represented by a
unique activation pattern, which is parameterized by a neural code; and (2)
categorization: according to the neural code, simple algorithms such as
K-Means, K-NN, and logistic regression can achieve fairly good performance on
both training and test data. These encoding properties surprisingly suggest
that normal neural networks well-trained for classification behave as hash
encoders without any extra effort. In addition, the encoding properties vary
across scenarios. Further experiments demonstrate that model size, training
time, training sample size, regularization, and label noise all contribute to
shaping the encoding properties, with the first three having the dominant
impact. We then define an activation hash phase chart to represent the space
spanned by model size, training time, training sample size, and the encoding
properties; it is divided into three canonical regimes: the under-expressive
regime, the critically-expressive regime, and the sufficiently-expressive
regime. The source code package is available at
https://github.com/LeavesLei/activation-code.
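To make the two encoding properties concrete, here is a minimal sketch of the neural-code idea: read off each sample's binary ReLU activation pattern, treat it as a hash code, then check (1) how many samples share a code and (2) how well a simple classifier does on the codes alone. This is an illustrative reconstruction in PyTorch/scikit-learn, not the authors' released implementation; the toy data, the architecture, and the helper name activation_code are assumptions made for this example.

```python
# Illustrative sketch only: extract binary ReLU activation patterns
# ("neural codes") from a small trained MLP and reuse them as hash codes.
# The data, architecture, and helper names below are assumptions for the
# example, not the authors' code from github.com/LeavesLei/activation-code.
import numpy as np
import torch
import torch.nn as nn
from sklearn.neighbors import KNeighborsClassifier

torch.manual_seed(0)

# Toy two-class data as a stand-in for a real classification dataset.
X = torch.randn(512, 20)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).long()

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 2))

# Short training run so the codes come from a (roughly) well-trained network.
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()
for _ in range(200):
    opt.zero_grad()
    loss_fn(model(X), y).backward()
    opt.step()

@torch.no_grad()
def activation_code(net, x):
    """Concatenate the 0/1 firing pattern of every ReLU unit for each sample."""
    codes, h = [], x
    for layer in net:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            codes.append((h > 0).float())
    return torch.cat(codes, dim=1)

codes = activation_code(model, X).numpy()

# Determinism: count distinct codes (occupied linear regions); the paper
# reports that almost every region holds at most one training example.
n_codes = len({tuple(c) for c in codes.astype(np.uint8)})
print(f"{n_codes} distinct codes for {len(X)} samples")

# Categorization: a simple classifier on the codes alone performs well.
knn = KNeighborsClassifier(n_neighbors=5).fit(codes, y.numpy())
print("K-NN accuracy on the codes:", knn.score(codes, y.numpy()))
```

In the paper the codes are read off from full-scale trained classifiers and fed to K-Means, K-NN, or logistic regression; the toy setup above only illustrates the mechanics of extracting and reusing the activation pattern.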
Related papers
- Codebook Features: Sparse and Discrete Interpretability for Neural Networks [43.06828312515959]
We explore whether we can train neural networks to have hidden states that are sparse, discrete, and more interpretable.
Codebook features are produced by finetuning neural networks with vector quantization bottlenecks at each layer.
We find that neural networks can operate under this extreme bottleneck with only modest degradation in performance.
arXiv Detail & Related papers (2023-10-26T08:28:48Z)
- Distributive Pre-Training of Generative Modeling Using Matrix-Product States [0.0]
We consider an alternative training scheme utilizing basic tensor network operations, e.g., summation and compression.
The training algorithm is based on compressing the superposition state constructed from all the training data in product state representation.
We benchmark the algorithm on the MNIST dataset and show reasonable results for generating new images and classification tasks.
arXiv Detail & Related papers (2023-06-26T15:46:08Z)
- A Self-Encoder for Learning Nearest Neighbors [5.297261090056809]
The self-encoder learns to distribute the data samples in the embedding space so that they are linearly separable from one another.
Unlike regular nearest neighbors, the predictions resulting from this encoding of data are invariant to any scaling of features.
arXiv Detail & Related papers (2023-06-25T14:30:31Z)
- Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z)
- Learning Smooth Neural Functions via Lipschitz Regularization [92.42667575719048]
We introduce a novel regularization designed to encourage smooth latent spaces in neural fields.
Compared with prior Lipschitz regularized networks, ours is computationally fast and can be implemented in four lines of code.
arXiv Detail & Related papers (2022-02-16T21:24:54Z)
- Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
arXiv Detail & Related papers (2021-02-20T23:26:58Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks [12.14537824884951]
We propose a novel regularization method that progressively penalizes the magnitude of activations during training.
Our method's effect on generalization is analyzed with label randomization tests and cumulative ablations.
arXiv Detail & Related papers (2020-03-07T18:38:46Z)
- Learning to Hash with Graph Neural Networks for Recommender Systems [103.82479899868191]
Graph representation learning has attracted much attention in supporting high quality candidate search at scale.
Despite its effectiveness in learning embedding vectors for objects in the user-item interaction network, the computational costs to infer users' preferences in continuous embedding space are tremendous.
We propose a simple yet effective discrete representation learning framework to jointly learn continuous and discrete codes.
arXiv Detail & Related papers (2020-03-04T06:59:56Z)
- Prediction of wall-bounded turbulence from wall quantities using convolutional neural networks [0.0]
A fully-convolutional neural-network model is used to predict the streamwise velocity fields at several wall-normal locations.
Various networks are trained for predictions at three inner-scaled locations.
arXiv Detail & Related papers (2019-12-30T15:34:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.