Spectral Regularization: an Inductive Bias for Sequence Modeling
- URL: http://arxiv.org/abs/2211.02255v1
- Date: Fri, 4 Nov 2022 04:07:05 GMT
- Title: Spectral Regularization: an Inductive Bias for Sequence Modeling
- Authors: Kaiwen Hou and Guillaume Rabusseau
- Abstract summary: This paper presents a spectral regularization technique, which attaches a unique inductive bias to sequence modeling.
From fundamental connections between Hankel matrices and regular grammars, we propose to use the trace norm of the Hankel matrix, the tightest convex relaxation of its rank, as the spectral regularizer.
- Score: 7.365884062005811
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Various forms of regularization in learning tasks strive for different
notions of simplicity. This paper presents a spectral regularization technique,
which attaches a unique inductive bias to sequence modeling based on an
intuitive concept of simplicity defined in the Chomsky hierarchy. From
fundamental connections between Hankel matrices and regular grammars, we
propose to use the trace norm of the Hankel matrix, the tightest convex
relaxation of its rank, as the spectral regularizer. To cope with the fact that
the Hankel matrix is bi-infinite, we propose an unbiased stochastic estimator
for its trace norm. Ultimately, we demonstrate experimental results on Tomita
grammars, which exhibit the potential benefits of spectral regularization and
validate the proposed stochastic estimator.
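A minimal sketch of the central object, under illustrative assumptions: since the true Hankel matrix is bi-infinite, the snippet below builds a finite sub-block from a toy scoring function `f` (standing in for the trained sequence model) and computes its trace (nuclear) norm directly, rather than via the paper's unbiased stochastic estimator. The alphabet, truncation length, and `f` are all hypothetical choices, not the paper's setup.

```python
import itertools
import numpy as np

# Toy stand-in for a trained sequence model: assigns a score f(w) to each
# binary string w. Purely illustrative; not the paper's model.
def f(w: str) -> float:
    return 0.5 ** len(w) * (1.0 + 0.1 * (w.count("0") == w.count("1")))

def hankel_block(f, alphabet=("0", "1"), max_len=4):
    """Finite sub-block of the bi-infinite Hankel matrix, H[p, s] = f(p + s),
    over all prefixes p and suffixes s of length <= max_len."""
    words = [""] + ["".join(t)
                    for n in range(1, max_len + 1)
                    for t in itertools.product(alphabet, repeat=n)]
    return np.array([[f(p + s) for s in words] for p in words])

H = hankel_block(f)
trace_norm = np.linalg.norm(H, ord="nuc")  # trace norm = sum of singular values
print(f"Hankel block {H.shape}, trace norm = {trace_norm:.4f}")
```

In training, a scalar like this would be added to the task loss as a penalty, biasing the model toward low Hankel rank and hence, via the paper's connection between Hankel matrices and regular grammars, toward simpler languages.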
Related papers
- Generalization for Least Squares Regression With Simple Spiked Covariances [3.9134031118910264]
The generalization properties of even two-layer neural networks trained by gradient descent remain poorly understood.
Recent work has made progress by describing the spectrum of the feature matrix at the hidden layer.
Yet, the generalization error for linear models with spiked covariances has not been previously determined.
arXiv Detail & Related papers (2024-10-17T19:46:51Z)
- Entrywise error bounds for low-rank approximations of kernel matrices [55.524284152242096]
We derive entrywise error bounds for low-rank approximations of kernel matrices obtained using the truncated eigen-decomposition.
A key technical innovation is a delocalisation result for the eigenvectors of the kernel matrix corresponding to small eigenvalues.
We validate our theory with an empirical study of a collection of synthetic and real-world datasets.
arXiv Detail & Related papers (2024-05-23T12:26:25Z)
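As a hedged illustration of the quantity bounded in the entry above (not the paper's bounds themselves), the sketch below forms an RBF kernel matrix on synthetic data, truncates its eigendecomposition to rank r, and measures the entrywise error directly; the kernel, data, and rank are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                     # synthetic inputs

# RBF kernel matrix (symmetric positive semi-definite)
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dists)

# Rank-r approximation from the truncated eigen-decomposition
evals, evecs = np.linalg.eigh(K)                  # eigenvalues in ascending order
r = 20
U, lam = evecs[:, -r:], evals[-r:]                # top-r eigenpairs
K_r = (U * lam) @ U.T

print("max entrywise error:", np.abs(K - K_r).max())
print("spectral-norm error :", evals[-(r + 1)])   # largest discarded eigenvalue
```

The entrywise error can be much smaller than the spectral-norm figure suggests when the discarded eigenvectors are delocalised, which is the delocalisation result the entry refers to.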
- Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction [55.57072563835959]
Spectral graph neural networks are characterized by filters that act on the eigenvalues of the graph Laplacian.
We propose an eigenvalue correction strategy that can free filters from the constraints of repeated eigenvalue inputs; a toy illustration of those constraints follows below.
arXiv Detail & Related papers (2024-01-28T08:12:00Z)
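A small sketch, with an invented 4-node graph, of the constraint the entry above alludes to: any spectral filter acts only through the Laplacian eigenvalues, so repeated eigenvalues are forced to share a single filter response. The paper's correction strategy itself is not reproduced here.

```python
import numpy as np

# A 4-cycle, whose normalized Laplacian has a repeated eigenvalue.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(deg, deg))   # normalized Laplacian

lam = np.linalg.eigvalsh(L)
g = lambda x: 1.0 - 0.5 * x + 0.25 * x ** 2       # an arbitrary polynomial filter

print(np.round(lam, 4))     # eigenvalue 1 appears twice
print(np.round(g(lam), 4))  # the repeated eigenvalues get identical responses
```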
- The Inductive Bias of Flatness Regularization for Deep Matrix Factorization [58.851514333119255]
This work takes the first step toward understanding the inductive bias of the minimum trace of the Hessian solutions in deep linear networks.
We show that for all depths greater than one, under the standard Restricted Isometry Property (RIP) on the measurements, minimizing the trace of the Hessian is approximately equivalent to minimizing the Schatten 1-norm of the corresponding end-to-end matrix parameters.
arXiv Detail & Related papers (2023-06-22T23:14:57Z)
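To make the quantity in the entry above concrete, here is a hedged sketch (depth and dimensions are arbitrary) that forms the end-to-end matrix of a deep linear network and evaluates its Schatten 1-norm, the implicit penalty the result identifies; the trace-of-Hessian side is not computed.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, dim = 3, 8
Ws = [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(depth)]

# End-to-end matrix of the deep linear network f(x) = W_depth ... W_1 x
E = Ws[0]
for W in Ws[1:]:
    E = W @ E

# Schatten 1-norm (nuclear norm): sum of singular values of the product
schatten_1 = np.linalg.svd(E, compute_uv=False).sum()
print(f"depth={depth}, ||W_d...W_1||_S1 = {schatten_1:.4f}")
```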
- Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks [8.30897399932868]
A key finding is that the generalization performance of a neural network is associated with the degree of heavy-tailedness in the spectrum of its weight matrices.
We introduce a novel regularization technique, termed Heavy-Tailed Regularization, which explicitly promotes a more heavy-tailed spectrum in the weight matrices.
We empirically show that heavy-tailed regularization outperforms conventional regularization techniques in terms of generalization performance.
arXiv Detail & Related papers (2023-04-06T07:50:14Z)
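The entry above does not spell out the regularizer, so the sketch below only shows one standard way, our choice rather than necessarily the paper's, to quantify how heavy-tailed a weight matrix's spectrum is: a Hill estimate of the power-law tail exponent of the eigenvalues of W^T W (smaller alpha means a heavier tail).

```python
import numpy as np

def hill_alpha(eigs: np.ndarray, k: int) -> float:
    """Hill estimator of the power-law tail exponent from the k largest
    eigenvalues; smaller alpha indicates a heavier tail."""
    xs = np.sort(eigs)[::-1]           # descending
    x_min = xs[k]                      # threshold: (k+1)-th largest eigenvalue
    return 1.0 + k / np.log(xs[:k] / x_min).sum()

rng = np.random.default_rng(0)
W = rng.normal(size=(500, 300))                    # a Gaussian weight matrix
eigs = np.linalg.svd(W, compute_uv=False) ** 2     # eigenvalues of W^T W
print("tail exponent alpha ~", hill_alpha(eigs, k=30))  # large: light-tailed
```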
- Regularized Vector Quantization for Tokenized Image Synthesis [126.96880843754066]
Quantizing images into discrete representations has been a fundamental problem in unified generative modeling.
Deterministic quantization suffers from severe codebook collapse and misalignment with the inference stage, while stochastic quantization suffers from low codebook utilization and a perturbed reconstruction objective.
This paper presents a regularized vector quantization framework that effectively mitigates the above issues by applying regularization from two perspectives.
arXiv Detail & Related papers (2023-03-11T15:20:54Z)
- Third quantization of open quantum systems: new dissipative symmetries and connections to phase-space and Keldysh field theory formulations [77.34726150561087]
We reformulate the technique of third quantization in a way that explicitly connects all three methods.
We first show that our formulation reveals a fundamental dissipative symmetry present in all quadratic bosonic or fermionic Lindbladians.
For bosons, we then show that the Wigner function and the characteristic function can be thought of as "wavefunctions" of the density matrix.
arXiv Detail & Related papers (2023-02-27T18:56:40Z)
- Generalizing and Improving Jacobian and Hessian Regularization [1.926971915834451]
We generalize previous efforts by extending the target matrix from zero to any matrix that admits efficient matrix-vector products.
The proposed paradigm allows us to construct novel regularization terms that enforce symmetry or diagonality on square Jacobian and Hessian matrices.
To tackle the computational difficulty this raises, we introduce Lanczos-based spectral norm minimization; a matrix-vector-product sketch follows below.
arXiv Detail & Related papers (2022-12-01T07:01:59Z)
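A hedged stand-in for the idea in the entry above (power iteration rather than the paper's Lanczos routine, and with a dense test matrix in place of a real Jacobian): estimating a spectral norm from matrix-vector products alone, which is all one has when the Jacobian or Hessian is only available through automatic differentiation.

```python
import numpy as np

def spectral_norm_from_matvecs(matvec, rmatvec, dim, iters=100, seed=0):
    """Estimate ||A||_2 using only v -> A v and u -> A^T u products
    (power iteration on A^T A; a Lanczos scheme would converge faster)."""
    v = np.random.default_rng(seed).normal(size=dim)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = rmatvec(matvec(v))
        v = w / np.linalg.norm(w)
    return np.linalg.norm(matvec(v))

# Sanity check against a dense matrix standing in for a Jacobian
A = np.random.default_rng(1).normal(size=(40, 30))
est = spectral_norm_from_matvecs(lambda v: A @ v, lambda u: A.T @ u, dim=30)
print(est, np.linalg.svd(A, compute_uv=False)[0])  # should nearly agree
```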
- Quantitative deterministic equivalent of sample covariance matrices with a general dependence structure [0.0]
We prove quantitative bounds involving both the dimensions and the spectral parameter, in particular allowing the spectral parameter to approach the real positive semi-line.
As applications, we obtain a new bound for the convergence in Kolmogorov distance of the empirical spectral distributions of these general models.
arXiv Detail & Related papers (2022-11-23T15:50:31Z)
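For intuition about the entry above only (an isotropic Gaussian special case, far simpler than the general dependence structures the paper covers): the sketch below compares the empirical spectral distribution of a sample covariance matrix with its Marchenko-Pastur limit and reports an approximate Kolmogorov distance between the two.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 4000, 1000                         # aspect ratio y = p/n = 1/4
X = rng.normal(size=(n, p))
eigs = np.sort(np.linalg.eigvalsh(X.T @ X / n))

# Marchenko-Pastur CDF (identity population covariance, y < 1)
y = p / n
a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
grid = np.linspace(a, b, 4000)
density = np.sqrt((b - grid) * (grid - a)) / (2 * np.pi * y * grid)
mp_cdf = np.cumsum(density) * (grid[1] - grid[0])

# Kolmogorov distance between empirical and limiting spectral distributions,
# evaluated on the support grid
emp_cdf = np.searchsorted(eigs, grid, side="right") / p
print("Kolmogorov distance:", np.abs(emp_cdf - mp_cdf).max())
```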
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- The semiring of dichotomies and asymptotic relative submajorization [0.0]
We study quantum dichotomies and the resource theory of asymmetric distinguishability using a generalization of Strassen's theorem on preordered semirings.
We find that a variant of relative submajorization, defined on unnormalized dichotomies, is characterized by real-valued monotones that are multiplicative under the tensor product and additive under the direct sum.
arXiv Detail & Related papers (2020-04-22T14:13:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.