A Novel Explanation Against Linear Neural Networks
- URL: http://arxiv.org/abs/2401.00186v1
- Date: Sat, 30 Dec 2023 09:44:51 GMT
- Title: A Novel Explanation Against Linear Neural Networks
- Authors: Anish Lakkapragada
- Abstract summary: Linear Regression and neural networks are widely used to model data.
We show that neural networks without activation functions, or linear neural networks (LNNs), actually reduce both training and testing performance relative to linear regression.
We prove this hypothesis through an analysis of the optimization of an LNN and rigorous testing comparing the performance of LNNs and linear regression on noisy datasets.
- Score: 1.223779595809275
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linear regression and neural networks are widely used to model data. Neural networks distinguish themselves from linear regression with their use of activation functions that enable modeling nonlinear functions. The standard argument for these activation functions is that without them, neural networks can only model a line. However, a novel explanation we propose in this paper for the impracticality of neural networks without activation functions, or linear neural networks (LNNs), is that they actually reduce both training and testing performance. Having more parameters makes LNNs harder to optimize, and thus they require more training iterations than linear regression to even potentially converge to the optimal solution. We prove this hypothesis through an analysis of the optimization of an LNN and rigorous testing comparing the performance of LNNs and linear regression on synthetic, noisy datasets.
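The claim is easy to illustrate: without activations, stacked layers compute y = W2 W1 x, a single linear map with no extra expressive power but more parameters to optimize. A minimal sketch of such a comparison in PyTorch (sizes, learning rate, and iteration budget are illustrative, not the paper's experimental setup):

import torch

torch.manual_seed(0)

# Synthetic noisy linear data: y = X w + noise
n, d = 200, 5
X = torch.randn(n, d)
w_true = torch.randn(d, 1)
y = X @ w_true + 0.1 * torch.randn(n, 1)

# Linear regression: one weight matrix (bias omitted for brevity)
linreg = torch.nn.Linear(d, 1, bias=False)

# "Linear neural network": two stacked linear layers with NO activation,
# mathematically still a single linear map W2 @ W1, but with d*h + h
# parameters instead of d.
h = 16
lnn = torch.nn.Sequential(
    torch.nn.Linear(d, h, bias=False),
    torch.nn.Linear(h, 1, bias=False),
)

for name, model in [("linear regression", linreg), ("LNN", lnn)]:
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    for _ in range(500):
        opt.zero_grad()
        loss = torch.mean((model(X) - y) ** 2)
        loss.backward()
        opt.step()
    print(f"{name}: MSE after 500 steps = {loss.item():.5f}")

On a fixed iteration budget the stacked model typically lags behind plain regression, which is the optimization effect the paper attributes to the extra parameters.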
Related papers
- Linearization of ReLU Activation Function for Neural Network-Embedded Optimization: Optimal Day-Ahead Energy Scheduling [0.2900810893770134]
In some applications, such as neural-network-based day-ahead energy scheduling for microgrids with battery degradation, the input features of the trained learning model are decision variables in the optimization model.
The use of nonlinear activation functions in the neural network makes such problems extremely hard, if not impossible, to solve.
This paper investigates methods for linearizing nonlinear activation functions, with a particular focus on the widely used rectified linear unit (ReLU).
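The abstract does not reproduce the formulations studied; for context, the standard big-M mixed-integer encoding of y = max(0, x) that such linearizations build on can be sketched as follows (PuLP and the bound M = 100 are assumptions for illustration):

import pulp  # pip install pulp

M = 100.0  # assumed box bound on the pre-activation: x in [-M, M]

prob = pulp.LpProblem("relu_big_m_demo", pulp.LpMinimize)
x = pulp.LpVariable("x", lowBound=-M, upBound=M)
y = pulp.LpVariable("y", lowBound=0)       # will equal ReLU(x)
z = pulp.LpVariable("z", cat="Binary")     # 1 iff the neuron is active

# Big-M constraints forcing y == max(0, x) at every feasible point
prob += y >= x
prob += y <= x + M * (1 - z)
prob += y <= M * z

prob += x == 3.5          # pin the input for the demo
prob += 1 * y             # objective: minimize y (constraints fix it anyway)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(y))      # 3.5 == max(0, 3.5)

With z = 0 the constraints force y = 0 and x <= 0; with z = 1 they force y = x and x >= 0, so the integer program reproduces ReLU exactly at the cost of one binary variable per neuron.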
arXiv Detail & Related papers (2023-10-03T02:47:38Z)
- Using Linear Regression for Iteratively Training Neural Networks [4.873362301533824]
We present a simple linear-regression-based approach for learning the weights and biases of a neural network.
The approach is intended to scale to larger, more complex architectures.
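The summary leaves the procedure abstract; one concrete (hypothetical, not necessarily the authors') way to use linear regression inside network training is to solve for a layer's weights in closed form against the features feeding into it:

import numpy as np

rng = np.random.default_rng(0)

# Toy regression data
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Fixed (or previously fitted) first layer with a nonlinearity
W1 = rng.normal(size=(5, 32))
H = np.tanh(X @ W1)                      # hidden features, shape (200, 32)

# Fit the output weights by ordinary least squares instead of gradient steps
w2, *_ = np.linalg.lstsq(H, y, rcond=None)
print("train MSE:", np.mean((H @ w2 - y) ** 2))

Alternating such closed-form solves over the layers gives an iterative scheme with no learning rate to tune for the regressed layer.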
arXiv Detail & Related papers (2023-07-11T11:53:25Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness [172.61581010141978]
Certifiable robustness is a desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios.
We propose a novel solution to strategically manipulate neurons, by "grafting" appropriate levels of linearity.
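As a loose sketch of the grafting idea (hypothetical code, not the authors' implementation), selected neurons have their ReLU replaced by a learnable linear function, which propagates certification bounds without relaxation:

import torch

class GraftedReLU(torch.nn.Module):
    # ReLU layer in which masked neurons are "grafted" to alpha*x + beta
    def __init__(self, num_features, grafted):
        super().__init__()
        self.register_buffer("mask", grafted.float())   # 1 = grafted neuron
        self.alpha = torch.nn.Parameter(torch.ones(num_features))
        self.beta = torch.nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        linear = self.alpha * x + self.beta   # exact under bound propagation
        return self.mask * linear + (1.0 - self.mask) * torch.relu(x)

# Graft two of four neurons (indices chosen arbitrarily for the demo)
layer = GraftedReLU(4, grafted=torch.tensor([1, 0, 1, 0]))
print(layer(torch.randn(3, 4)).shape)   # torch.Size([3, 4])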
arXiv Detail & Related papers (2022-06-15T22:42:29Z)
- Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data.
The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges.
We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
arXiv Detail & Related papers (2021-11-01T20:49:30Z)
- Going Beyond Linear RL: Sample Efficient Neural Function Approximation [76.57464214864756]
We study function approximation with two-layer neural networks.
Our results significantly improve upon what can be attained with linear (or eluder dimension) methods.
arXiv Detail & Related papers (2021-07-14T03:03:56Z)
- LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
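Those modifications are not spelled out above; the key ingredient, a first-order Taylor expansion of the network in its parameters, can be sketched with torch.func (architecture and perturbation scale are illustrative):

import torch
from torch.func import functional_call, jvp

model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 2)
)
params0 = {k: v.detach() for k, v in model.named_parameters()}  # w0
x = torch.randn(1, 4)

# Linearized model: f_lin(x; w0 + dw) = f(x; w0) + J_w f(x; w0) @ dw,
# evaluated as a Jacobian-vector product without materializing J.
dw = {k: 0.01 * torch.randn_like(v) for k, v in params0.items()}
f0, jf_dw = jvp(lambda p: functional_call(model, p, (x,)), (params0,), (dw,))
f_lin = f0 + jf_dw
print(f0, f_lin)

Because f_lin is linear in dw, pairing it with a quadratic (MSE) loss turns fine-tuning into a quadratic problem, which is what the paper's title alludes to.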
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
- How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks [80.55378250013496]
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn outside the support of the training distribution.
Graph Neural Networks (GNNs) have shown some success in more complex tasks.
arXiv Detail & Related papers (2020-09-24T17:48:59Z)
- Nonlinear computations in spiking neural networks through multiplicative synapses [3.1498833540989413]
Nonlinear computations can be implemented successfully in spiking neural networks.
This requires supervised training and the resulting connectivity can be hard to interpret.
We show how to directly derive the required connectivity for several nonlinear dynamical systems.
arXiv Detail & Related papers (2020-09-08T16:47:27Z)
- The Power of Linear Recurrent Neural Networks [1.124958340749622]
We show how autoregressive linear, i.e., linearly activated recurrent neural networks (LRNNs), can approximate any time-dependent function f(t).
LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.
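Since a linearly activated RNN unrolls into a linear recurrence, the flavor of the result can be sketched with a least-squares-fitted autoregression (the two-sine signal and order p are illustrative, not the paper's MSO benchmark setup):

import numpy as np

t = np.arange(400)
f = np.sin(0.2 * t) + np.sin(0.311 * t)   # superimposed oscillators

# A linear recurrence f[t] ~ sum_k a[k] * f[t-k], fit by least squares
p = 8
rows = np.stack([f[i : i + p] for i in range(len(f) - p)])
a, *_ = np.linalg.lstsq(rows, f[p:], rcond=None)

# Roll the fitted recurrence forward to extrapolate past the data
window = list(f[-p:])
for _ in range(50):
    window.append(np.dot(a, window[-p:]))
print("first extrapolated values:", np.round(window[p : p + 5], 4))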
arXiv Detail & Related papers (2018-02-09T15:35:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.