Exploring Linear Feature Disentanglement For Neural Networks
- URL: http://arxiv.org/abs/2203.11700v1
- Date: Tue, 22 Mar 2022 13:09:17 GMT
- Title: Exploring Linear Feature Disentanglement For Neural Networks
- Authors: Tiantian He, Zhibin Li, Yongshun Gong, Yazhou Yao, Xiushan Nie, Yilong
Yin
- Abstract summary: Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved great success in neural networks (NNs).
Because samples have complex non-linear characteristics, the objective of these activation functions is to project samples from their original feature space into a linearly separable feature space.
This observation motivates us to explore whether all features need to be transformed by all non-linear functions in current typical NNs.
- Score: 63.20827189693117
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Non-linear activation functions, e.g., Sigmoid, ReLU, and Tanh, have achieved
great success in neural networks (NNs). Because samples have complex non-linear
characteristics, the objective of these activation functions is to project samples from
their original feature space into a linearly separable feature space. This observation
motivates us to explore whether all features need to be transformed by all non-linear
functions in current typical NNs, i.e., whether some features already reach the linearly
separable feature space in the intermediate layers and therefore require only an affine
transformation rather than further non-linear transformation. To validate this
hypothesis, we explore the problem of linear feature disentanglement for neural networks
in this paper. Specifically, we devise a learnable mask module to distinguish between
linear and non-linear features. Through our designed experiments, we find that some
features reach the linearly separable space earlier than others and can be partly
detached from the NNs. The proposed method also provides a readily applicable pruning
strategy that barely affects the performance of the original model. We conduct our
experiments on four datasets and present promising results.
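The abstract does not include code, so the following is only a minimal sketch of the idea of a learnable mask that routes some intermediate features through an affine branch and the rest through a non-linear branch. The module name LinearFeatureMask, the sigmoid gating, and the temperature tau are illustrative assumptions, not the authors' exact design.
```python
# Minimal sketch (assumed design, not the paper's implementation): a per-feature
# learnable gate decides whether a feature takes an affine path or a non-linear path.
import torch
import torch.nn as nn


class LinearFeatureMask(nn.Module):
    def __init__(self, dim: int, tau: float = 1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(dim))  # one gate per feature
        self.tau = tau
        self.affine = nn.Linear(dim, dim)             # branch for "already linear" features
        self.nonlinear = nn.Sequential(               # branch for features needing non-linearity
            nn.Linear(dim, dim),
            nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # m close to 1 keeps a feature on the affine path; m close to 0 keeps it non-linear.
        m = torch.sigmoid(self.logits / self.tau)
        return m * self.affine(x) + (1.0 - m) * self.nonlinear(x)


if __name__ == "__main__":
    layer = LinearFeatureMask(dim=16)
    out = layer(torch.randn(8, 16))
    print(out.shape)  # torch.Size([8, 16])
```
Under this reading, features whose gates saturate near 1 rely only on the affine branch, which matches the abstract's suggestion that such features can be partly detached from the network and used as a pruning strategy.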
Related papers
- Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations [54.17275171325324]
We present a counterexample to the Linear Representation Hypothesis (LRH).
When trained to repeat an input token sequence, neural networks learn to represent the token at each position with a particular order of magnitude, rather than a direction.
These findings strongly indicate that interpretability research should not be confined to the LRH.
arXiv Detail & Related papers (2024-08-20T15:04:37Z)
- A Novel Explanation Against Linear Neural Networks [1.223779595809275]
Linear regression and neural networks are widely used to model data.
We show that neural networks without activation functions, i.e., linear neural networks (LNNs), actually reduce both training and testing performance.
We prove this through an analysis of LNN optimization and rigorous testing comparing the performance of LNNs and linear regression on noisy datasets.
arXiv Detail & Related papers (2023-12-30T09:44:51Z)
- Function-Space Optimality of Neural Architectures With Multivariate Nonlinearities [30.762063524541638]
We prove a representer theorem that states that the solution sets to learning problems posed over Banach spaces are completely characterized by neural architectures with nonlinearities.
Our results shed light on the regularity of functions learned by neural networks trained on data, and provide new theoretical motivation for several architectural choices found in practice.
arXiv Detail & Related papers (2023-10-05T17:13:16Z)
- Points of non-linearity of functions generated by random neural networks [0.0]
We consider functions from the real numbers to the real numbers, output by a neural network with one hidden layer, arbitrary width, and the ReLU activation function.
We compute the expected distribution of the points of non-linearity (see the minimal numerical sketch after this list).
arXiv Detail & Related papers (2023-04-19T17:40:19Z)
- Approximation of Nonlinear Functionals Using Deep ReLU Networks [7.876115370275732]
We investigate the approximation power of functional deep neural networks associated with the rectified linear unit (ReLU) activation function.
In addition, we establish rates of approximation of the proposed functional deep ReLU networks under mild regularity conditions.
arXiv Detail & Related papers (2023-04-10T08:10:11Z)
- The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks [19.899987851661354]
We study SGD-learnability with $O(d)$ sample complexity in a large ambient dimension.
Our main results characterize a hierarchical property, the "merged-staircase property", that is both necessary and nearly sufficient for learning in this setting.
A key tool is a new "dimension-free" dynamics approximation that applies to functions defined on a latent low-dimensional subspace.
arXiv Detail & Related papers (2022-02-17T13:43:06Z)
- Convolutional Filtering and Neural Networks with Non Commutative Algebras [153.20329791008095]
We study the generalization of non-commutative convolutional neural networks.
We show that non-commutative convolutional architectures can be stable to deformations on the space of operators.
arXiv Detail & Related papers (2021-08-23T04:22:58Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation functions.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L_1$ and $L_2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
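Regarding the "Points of non-linearity of functions generated by random neural networks" entry above: the snippet below is a minimal numerical illustration, not that paper's analysis. For a one-hidden-layer ReLU network $f(x)=\sum_i v_i\,\mathrm{ReLU}(w_i x + b_i)$, the function can fail to be linear only at the kink locations $x=-b_i/w_i$ (for $w_i \neq 0$); the sketch samples a random network, lists those points, and checks that the function is affine between consecutive ones.
```python
# Illustrative sketch (not the referenced paper's analysis): kink locations of a
# random one-hidden-layer ReLU network f(x) = sum_i v_i * relu(w_i * x + b_i).
import numpy as np

rng = np.random.default_rng(0)
width = 8
w = rng.normal(size=width)   # input-to-hidden weights
b = rng.normal(size=width)   # hidden biases
v = rng.normal(size=width)   # hidden-to-output weights


def f(x: np.ndarray) -> np.ndarray:
    # Evaluate the network on a 1-D grid of inputs.
    return np.maximum(w * x[:, None] + b, 0.0) @ v


# Non-linearity can only occur where some hidden unit switches on/off.
kinks = np.sort(-b[w != 0] / w[w != 0])
print("candidate points of non-linearity:", kinks)

# Sanity check: strictly between consecutive kinks the function is affine,
# so its second finite difference on a fine grid is numerically zero.
i = int(np.argmax(np.diff(kinks)))                 # widest gap between kinks
xs = np.linspace(kinks[i], kinks[i + 1], 103)[1:-1]
print("max |second difference| inside one segment:",
      np.abs(np.diff(f(xs), n=2)).max())
```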
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.