Exploring Linear Feature Disentanglement For Neural Networks
- URL: http://arxiv.org/abs/2203.11700v1
- Date: Tue, 22 Mar 2022 13:09:17 GMT
- Title: Exploring Linear Feature Disentanglement For Neural Networks
- Authors: Tiantian He, Zhibin Li, Yongshun Gong, Yazhou Yao, Xiushan Nie, Yilong
Yin
- Abstract summary: Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Because samples have complex non-linear characteristics, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This observation motivates us to explore whether all features need to be transformed by all non-linear functions in current typical NNs.
- Score: 63.20827189693117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved
great success in neural networks (NNs). Because samples have complex non-linear
characteristics, the objective of these activation functions is to project samples from
their original feature space into a linearly separable feature space. This observation
motivates us to explore whether all features need to be transformed by all non-linear
functions in current typical NNs, i.e., whether some features already reach the linearly
separable feature space in the intermediate layers and therefore require only an affine
transformation rather than further non-linear transformation. To validate this
hypothesis, we explore the problem of linear feature disentanglement for neural networks
in this paper. Specifically, we devise a learnable mask module to distinguish between
linear and non-linear features. Through our designed experiments, we find that some
features reach the linearly separable space earlier than others and can be partly
detached from the NNs. The proposed method also provides a readily applicable pruning
strategy that barely affects the performance of the original model. We conduct our
experiments on four datasets and present promising results.
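The abstract does not include code, so the following is only a minimal sketch of the idea of a learnable mask that routes some intermediate features through an affine branch and the rest through a non-linear branch. The module name LinearFeatureMask, the sigmoid gating, and the temperature tau are illustrative assumptions, not the authors' exact design.
```python
# Minimal sketch (assumed design, not the paper's implementation): a per-feature
# learnable gate decides whether a feature takes an affine path or a non-linear path.
import torch
import torch.nn as nn


class LinearFeatureMask(nn.Module):
    def __init__(self, dim: int, tau: float = 1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(dim))  # one gate per feature
        self.tau = tau
        self.affine = nn.Linear(dim, dim)             # branch for "already linear" features
        self.nonlinear = nn.Sequential(               # branch for features needing non-linearity
            nn.Linear(dim, dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # m close to 1 keeps a feature on the affine path; m close to 0 keeps it non-linear.
        m = torch.sigmoid(self.logits / self.tau)
        return m * self.affine(x) + (1.0 - m) * self.nonlinear(x)


if __name__ == "__main__":
    layer = LinearFeatureMask(dim=16)
    out = layer(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 16])
```
Under this reading, features whose gates saturate near 1 rely only on the affine branch, which matches the abstract's suggestion that such features can be partly detached from the network and used as a pruning strategy.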
Related papers
- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH).
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction.
These findings strongly indicate that interpretability research should not be confined to the LRH.
arXiv Detail & Related papers (2024-08-20T15:04:37Z)
- A Novel Explanation Against Linear Neural Networks [1.223779595809275]
Linear regression and neural networks are widely used to model data.
We show that neural networks without activation functions, i.e., linear neural networks (LNNs), actually reduce both training and testing performance.
We prove this through an analysis of LNN optimization and rigorous testing comparing the performance of LNNs and linear regression on noisy datasets.
arXiv Detail & Related papers (2023-12-30T09:44:51Z)
- Function-Space Optimality of Neural Architectures With Multivariate Nonlinearities [30.762063524541638]
We prove a representer theorem that states that the solution sets to learning problems posed over Banach spaces are completely characterized by neural architectures with nonlinearities.
Our results shed light on the regularity of functions learned by neural networks trained on data, and provide new theoretical motivation for several architectural choices found in practice.
arXiv Detail & Related papers (2023-10-05T17:13:16Z)
- Points of non-linearity of functions generated by random neural networks [0.0]
We consider functions from the real numbers to the real numbers, output by a neural network with one hidden layer, arbitrary width, and the ReLU activation function.
We compute the expected distribution of the points of non-linearity (see the minimal numerical sketch after this list).
arXiv Detail & Related papers (2023-04-19T17:40:19Z)
- Approximation of Nonlinear Functionals Using Deep ReLU Networks [7.876115370275732]
We investigate the approximation power of functional deep neural networks associated with the rectified linear unit (ReLU) activation function.
In addition, we establish rates of approximation of the proposed functional deep ReLU networks under mild regularity conditions.
arXiv Detail & Related papers (2023-04-10T08:10:11Z)
- The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks [19.899987851661354]
We study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension.
Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting.
A key tool is a new "dimension-free" dynamics approximation that applies to functions defined on a latent low-dimensional subspace.
arXiv Detail & Related papers (2022-02-17T13:43:06Z)
- Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non-commutative convolutional neural networks.
We show that non-commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
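Regarding the "Points of non-linearity of functions generated by random neural networks" entry above: the snippet below is a minimal numerical illustration, not that paper's analysis. For a one-hidden-layer ReLU network $f(x)=\sum_i v_i\,\mathrm{ReLU}(w_i x + b_i)$, the function can fail to be linear only at the kink locations $x=-b_i/w_i$ (for $w_i \neq 0$); the sketch samples a random network, lists those points, and checks that the function is affine between consecutive ones.
```python
# Illustrative sketch (not the referenced paper's analysis): kink locations of a
# random one-hidden-layer ReLU network f(x) = sum_i v_i * relu(w_i * x + b_i).
import numpy as np

rng = np.random.default_rng(0)
width = 8
w = rng.normal(size=width)   # input-to-hidden weights
b = rng.normal(size=width)   # hidden biases
v = rng.normal(size=width)   # hidden-to-output weights


def f(x: np.ndarray) -> np.ndarray:
    # Evaluate the network on a 1-D grid of inputs.
    return np.maximum(w * x[:, None] + b, 0.0) @ v


# Non-linearity can only occur where some hidden unit switches on/off.
kinks = np.sort(-b[w != 0] / w[w != 0])
print("candidate points of non-linearity:", kinks)

# Sanity check: strictly between consecutive kinks the function is affine,
# so its second finite difference on a fine grid is numerically zero.
i = int(np.argmax(np.diff(kinks)))                 # widest gap between kinks
xs = np.linspace(kinks[i], kinks[i + 1], 103)[1:-1]
print("max |second difference| inside one segment:",
      np.abs(np.diff(f(xs), n=2)).max())
```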
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.