Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing
- URL: http://arxiv.org/abs/2402.17595v1
- Date: Tue, 27 Feb 2024 15:28:01 GMT
- Title: Implicit Regularization via Spectral Neural Networks and Non-linear Matrix Sensing
- Authors: Hong T.M. Chu, Subhro Ghosh, Chi Thanh Lam, Soumendu Sundar Mukherjee
- Abstract summary: The Spectral Neural Network (abbrv. SNN) architecture is particularly suitable for matrix learning problems.
We show that the SNN architecture is inherently much more amenable to theoretical analysis than vanilla neural nets.
We believe that the SNN architecture has the potential to be of wide applicability in a broad class of matrix learning scenarios.
- Score: 2.171120568435925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The phenomenon of implicit regularization has attracted interest in recent
years as a fundamental aspect of the remarkable generalizing ability of neural
networks. In a nutshell, it entails that gradient descent dynamics in many
neural nets, even without any explicit regularizer in the loss function,
converges to the solution of a regularized learning problem. However, known
results attempting to theoretically explain this phenomenon focus
overwhelmingly on the setting of linear neural nets, and the simplicity of the
linear structure is particularly crucial to existing arguments. In this paper,
we explore this problem in the context of more realistic neural networks with a
general class of non-linear activation functions, and rigorously demonstrate
the implicit regularization phenomenon for such networks in the setting of
matrix sensing problems, together with rigorous rate guarantees that ensure
exponentially fast convergence of gradient descent. In this vein, we contribute
a network architecture called Spectral Neural Networks (abbrv. SNN) that is
particularly suitable for matrix learning problems. Conceptually, this entails
coordinatizing the space of matrices by their singular values and singular
vectors, as opposed to by their entries, a potentially fruitful perspective for
matrix learning. We demonstrate that the SNN architecture is inherently much
more amenable to theoretical analysis than vanilla neural nets and confirm its
effectiveness in the context of matrix sensing, via both mathematical
guarantees and empirical investigations. We believe that the SNN architecture
has the potential to be of wide applicability in a broad class of matrix
learning scenarios.
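The implicit regularization phenomenon the abstract describes can be illustrated with a minimal matrix sensing experiment. The sketch below is an assumption-laden toy, not the paper's SNN architecture: it runs plain gradient descent on an unregularized, over-parameterized factorization X = BB^T and then inspects the singular values of the recovered matrix, the spectral quantities by which SNN coordinatizes the space of matrices. All dimensions, step sizes, and iteration counts are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 200  # matrix size, true rank, number of measurements

# Ground-truth low-rank PSD matrix X*, normalized, and Gaussian sensing matrices A_i
U = rng.standard_normal((n, r))
X_star = U @ U.T
X_star /= np.linalg.norm(X_star)
A = rng.standard_normal((m, n, n))
y = np.einsum('mij,ij->m', A, X_star)  # linear measurements y_i = <A_i, X*>

# Over-parameterized factorization X = B B^T with B full n x n,
# initialized near zero. Gradient descent on the UNregularized loss
#   L(B) = (1/2m) * sum_i (<A_i, B B^T> - y_i)^2
B = 1e-4 * rng.standard_normal((n, n))
lr = 0.2
for _ in range(3000):
    X = B @ B.T
    resid = np.einsum('mij,ij->m', A, X) - y
    G = np.einsum('m,mij->ij', resid, A) / m  # gradient dL/dX
    B -= lr * (G + G.T) @ B                   # chain rule through X = B B^T
X_hat = B @ B.T

# No rank penalty appears anywhere in the loss, yet gradient descent from
# small initialization is biased toward a (near-)minimal-rank solution:
# the spectrum of X_hat should be dominated by its first r singular values.
singvals = np.linalg.svd(X_hat, compute_uv=False)
rel_err = np.linalg.norm(X_hat - X_star) / np.linalg.norm(X_star)
```

This mirrors the linear-activation setting that, per the abstract, existing analyses focus on; the paper's contribution is extending such guarantees to a general class of non-linear activations via the SNN parameterization.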
Related papers
- Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks [5.851101657703105]
We take a first step towards theoretically characterizing the conditioning of the Gauss-Newton (GN) matrix in neural networks.
We establish tight bounds on the condition number of the GN matrix in deep linear networks of arbitrary depth and width.
We expand the analysis to further architectural components, such as residual connections and convolutional layers.
arXiv Detail & Related papers (2024-11-04T14:56:48Z)
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z)
- Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks [18.377136391055327]
This paper theoretically analyzes the implicit regularization in hierarchical tensor factorization.
It translates to an implicit regularization towards locality for the associated convolutional networks.
Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
arXiv Detail & Related papers (2022-01-27T18:48:30Z)
- Convergence Analysis and Implicit Regularization of Feedback Alignment for Deep Linear Networks [27.614609336582568]
We theoretically analyze the Feedback Alignment (FA) algorithm, an efficient alternative to backpropagation for training neural networks.
We provide convergence guarantees with rates for deep linear networks for both continuous and discrete dynamics.
arXiv Detail & Related papers (2021-10-20T22:57:03Z)
- A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z)
- Connecting Weighted Automata, Tensor Networks and Recurrent Neural Networks through Spectral Learning [58.14930566993063]
We present connections between three models used in different research fields: weighted finite automata (WFA) from formal languages and linguistics, recurrent neural networks used in machine learning, and tensor networks.
We introduce the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors.
arXiv Detail & Related papers (2020-10-19T15:28:00Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and owns adaptability to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks [43.860358308049044]
In this work, we show that these common perceptions can be completely false in the early phase of learning.
We argue that this surprising simplicity can persist in deeper networks with convolutional architectures.
arXiv Detail & Related papers (2020-06-25T17:42:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.