Implicit Regularization in Hierarchical Tensor Factorization and Deep
Convolutional Neural Networks
- URL: http://arxiv.org/abs/2201.11729v1
- Date: Thu, 27 Jan 2022 18:48:30 GMT
- Title: Implicit Regularization in Hierarchical Tensor Factorization and Deep
Convolutional Neural Networks
- Authors: Noam Razin, Asaf Maman, Nadav Cohen
- Abstract summary: This paper theoretically analyzes the implicit regularization in hierarchical tensor factorization.
The resulting bias towards low hierarchical tensor rank translates to an implicit regularization towards locality in the associated convolutional networks.
Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
- Score: 18.377136391055327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the pursuit of explaining implicit regularization in deep learning,
prominent focus was given to matrix and tensor factorizations, which correspond
to simplified neural networks. It was shown that these models exhibit implicit
regularization towards low matrix and tensor ranks, respectively. Drawing
closer to practical deep learning, the current paper theoretically analyzes the
implicit regularization in hierarchical tensor factorization, a model
equivalent to certain deep convolutional neural networks. Through a dynamical
systems lens, we overcome challenges associated with hierarchy, and establish
implicit regularization towards low hierarchical tensor rank. This translates
to an implicit regularization towards locality for the associated convolutional
networks. Inspired by our theory, we design explicit regularization
discouraging locality, and demonstrate its ability to improve performance of
modern convolutional networks on non-local tasks, in defiance of conventional
wisdom by which architectural changes are needed. Our work highlights the
potential of enhancing neural networks via theoretical analysis of their
implicit regularization.
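As a concrete point of reference for the simpler matrix case mentioned in the abstract, the sketch below illustrates implicit regularization towards low rank in a depth-2 matrix factorization trained by gradient descent. It is a minimal illustration, not the paper's hierarchical tensor factorization or its locality-discouraging regularizer; the matrix sizes, observation pattern, learning rate, and near-zero initialization scale are all illustrative choices.
```python
# Minimal sketch: fit a depth-2 factorization W = W2 @ W1 to a few observed
# entries of a low-rank matrix and inspect the singular values of W. All
# hyperparameters here are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, rank_true, n_obs = 20, 2, 120   # matrix size, ground-truth rank, observed entries

# Ground-truth low-rank matrix and a random set of observed entries.
target = rng.normal(size=(d, rank_true)) @ rng.normal(size=(rank_true, d))
mask = np.zeros((d, d), dtype=bool)
mask[np.unravel_index(rng.choice(d * d, n_obs, replace=False), (d, d))] = True

# Depth-2 factorization with near-zero initialization, the regime in which
# the implicit low-rank bias is typically observed.
W1 = 1e-3 * rng.normal(size=(d, d))
W2 = 1e-3 * rng.normal(size=(d, d))
lr = 0.01

for step in range(50_000):
    W = W2 @ W1
    resid = np.where(mask, W - target, 0.0)  # gradient of 0.5 * ||P_obs(W - target)||^2 w.r.t. W
    W2 -= lr * (resid @ W1.T)
    W1 -= lr * (W2.T @ resid)

singular_values = np.linalg.svd(W2 @ W1, compute_uv=False)
print(np.round(singular_values, 3))  # typically only ~rank_true values are non-negligible
```
The near-zero initialization is the regime in which this low-rank bias is most commonly reported; with larger initialization scales the recovered matrix tends to have higher effective rank.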
Related papers
- Feature Contamination: Neural Networks Learn Uncorrelated Features and Fail to Generalize [5.642322814965062]
Learning representations that generalize under distribution shifts is critical for building robust machine learning models.
We show that even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the generalization of the student network.
arXiv Detail & Related papers (2024-06-05T15:04:27Z) - Dynamical stability and chaos in artificial neural network trajectories along training [3.379574469735166]
We study the dynamical properties of this process by analyzing, through this lens, the trajectories of a shallow neural network during training.
We find hints of regular and chaotic behavior depending on the learning rate regime.
This work also contributes to the cross-fertilization of ideas between dynamical systems theory, network theory and machine learning.
arXiv Detail & Related papers (2024-04-08T17:33:11Z) - Implicit Regularization via Spectral Neural Networks and Non-linear
Matrix Sensing [2.171120568435925]
The Spectral Neural Network (SNN) architecture is particularly suitable for matrix learning problems.
We show that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets.
We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.
arXiv Detail & Related papers (2024-02-27T15:28:01Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key to the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - TANGOS: Regularizing Tabular Neural Networks through Gradient
Orthogonalization and Specialization [69.80141512683254]
We introduce Tabular Neural Gradient Orthogonalization and Specialization (TANGOS), a novel framework for regularization in the tabular setting built on latent unit attributions.
We demonstrate that our approach can lead to improved out-of-sample generalization performance, outperforming other popular regularization methods.
arXiv Detail & Related papers (2023-03-09T18:57:13Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and offers adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z) - Input-to-State Representation in linear reservoirs dynamics [15.491286626948881]
Reservoir computing is a popular approach to designing recurrent neural networks.
The working principle of these networks is not fully understood.
A novel analysis of the dynamics of such networks is proposed.
arXiv Detail & Related papers (2020-03-24T00:14:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.