Uncovering the Spectral Bias in Diagonal State Space Models
- URL: http://arxiv.org/abs/2508.20441v1
- Date: Thu, 28 Aug 2025 05:39:04 GMT
- Title: Uncovering the Spectral Bias in Diagonal State Space Models
- Authors: Ruben Solozabal, Velibor Bojkovic, Hilal AlQuabeh, Kentaro Inui, Martin Takáč
- Abstract summary: Diagonal alternatives have been shown to reach a similar level of performance while being significantly more efficient due to the simplification in the kernel. Our work seeks to systematically understand how to parameterize these models and uncover the learning biases inherent in such diagonal state-space models.
- Score: 25.245607786730954
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current methods for initializing state space model (SSM) parameters mainly rely on the \textit{HiPPO framework}, which is based on an online approximation of orthogonal polynomials. Recently, diagonal alternatives have been shown to reach a similar level of performance while being significantly more efficient due to the simplification of the kernel computation. However, the \textit{HiPPO framework} does not explicitly study the role of its diagonal variants. In this paper, we take a further step and investigate diagonal SSM initialization schemes from a frequency perspective. Our work seeks to systematically understand how to parameterize these models and to uncover the learning biases inherent in such diagonal state-space models. Based on our observations, we propose \textit{S4D-DFouT}, a diagonal initialization in the discrete Fourier domain. The insights into the role of pole placement in the initialization enable us to scale these models further and achieve state-of-the-art results on the Long Range Arena benchmark, allowing us to train from scratch on very large datasets such as PathX-256.
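The diagonal structure is what makes these models cheap: with a diagonal state matrix, the convolution kernel reduces to a Vandermonde-style sum over the poles. Below is a minimal NumPy sketch of that computation for a generic diagonal SSM (illustrative names and conventions; this is not the paper's S4D-DFouT code):

```python
import numpy as np

def diagonal_ssm_kernel(A, B, C, dt, L):
    """Length-L convolution kernel of a diagonal SSM discretized with step dt.

    State update: x_k = exp(dt*A) * x_{k-1} + dt * B * u_k,  y_k = Re(C . x_k),
    so K[l] = Re(sum_n C_n * exp(dt * A_n)**l * dt * B_n).
    """
    dA = np.exp(dt * A)                       # discrete-time poles, shape (N,)
    V = dA[:, None] ** np.arange(L)[None, :]  # Vandermonde matrix, shape (N, L)
    return ((C * B * dt) @ V).real            # kernel, shape (L,)

def ssm_apply(K, u):
    """Causal convolution y = K * u via FFT, O(L log L)."""
    n = 2 * len(u)
    return np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)[: len(u)]
```

Initialization schemes such as S4D variants (and the proposed DFouT scheme) differ precisely in where the poles `A` are placed in the complex plane, which determines the frequencies the kernel can represent.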
Related papers
- Inertial Quadratic Majorization Minimization with Application to Kernel Regularized Learning [1.0282274843007797]
We introduce the Quadratic Majorization Minimization with Extrapolation (QMME) framework and establish its sequential convergence properties. To demonstrate practical advantages, we apply QMME to large-scale kernel regularized learning problems.
arXiv Detail & Related papers (2025-07-06T05:17:28Z)
- Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling [50.994194925685434]
LrcSSM is a \textit{non-linear} recurrent model that processes long sequences as fast as today's linear state-space layers. By forcing its Jacobian matrix to be diagonal, the full sequence can be solved in parallel. LrcSSM offers a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 do not provide.
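A diagonal Jacobian turns the recurrence into independent scalar relations x_k = a_k x_{k-1} + b_k, which compose associatively and can therefore be computed with a parallel prefix scan in logarithmic depth. A minimal NumPy sketch of that scan trick (Hillis–Steele doubling; an illustration of the general idea, not LrcSSM's implementation):

```python
import numpy as np

def parallel_linear_scan(a, b):
    """Solve x_k = a_k * x_{k-1} + b_k (with x_0 = b_0) via a doubling scan.

    The pairs compose associatively:
    (aL, bL) followed by (aR, bR)  ==  (aR * aL, aR * bL + bR),
    so log2(L) vectorized passes replace the sequential loop.
    """
    a, b = a.astype(float).copy(), b.astype(float).copy()
    shift = 1
    while shift < len(a):
        # shift in the identity element (a=1, b=0) for out-of-range positions
        a_prev = np.concatenate([np.ones(shift), a[:-shift]])
        b_prev = np.concatenate([np.zeros(shift), b[:-shift]])
        a, b = a * a_prev, a * b_prev + b
        shift *= 2
    return b  # b now holds the full trajectory x_0 .. x_{L-1}
```

Each pass is a fully vectorized array operation, which is why diagonal (input-varying) recurrences run as fast as linear state-space layers on parallel hardware.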
arXiv Detail & Related papers (2025-05-27T20:02:59Z)
- Generalized Tensor-based Parameter-Efficient Fine-Tuning via Lie Group Transformations [50.010924231754856]
Adapting pre-trained foundation models for diverse downstream tasks is a core practice in artificial intelligence. To overcome the cost of updating all parameters, parameter-efficient fine-tuning (PEFT) methods like LoRA have emerged and are becoming a growing research focus. We propose a generalization that extends matrix-based PEFT methods to higher-dimensional parameter spaces without compromising their structural properties.
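For context, the matrix-based PEFT idea being generalized fits in a few lines: freeze the pretrained weight W and train only a low-rank update BA. A minimal NumPy sketch of a LoRA-style layer (illustrative names and initialization; not the paper's tensor extension):

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank (LoRA-style) update."""

    def __init__(self, W, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                             # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
        self.B = np.zeros((d_out, r))                          # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        # effective weight is W + scale * B @ A, never materialized explicitly
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Only A and B, i.e. r * (d_in + d_out) numbers, are trained; tensor-based generalizations replace this matrix factorization with a higher-order one.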
arXiv Detail & Related papers (2025-04-01T14:36:45Z)
- Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation [50.26314343851213]
Inferring signed distance functions (SDFs) from sparse point clouds remains a challenge in surface reconstruction. We present a novel approach that learns a dynamic deformation network to predict SDFs in an end-to-end manner. Experimental results on synthetic and real scanned datasets demonstrate that our method significantly outperforms the current state-of-the-art methods.
arXiv Detail & Related papers (2025-03-31T02:27:02Z)
- $\ell_0$-Regularized Quadratic Surface Support Vector Machines [0.0]
Kernel-free quadratic surface support vector machines have recently gained traction due to their flexibility in modeling nonlinear decision boundaries without relying on kernel functions. We propose a sparse variant of the QSVM by enforcing a cardinality constraint on the model parameters. We validate our approach on several real-world datasets, demonstrating its ability to reduce overfitting while maintaining strong classification performance.
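The kernel-free decision function in question is a quadratic surface f(x) = xᵀWx + bᵀx + c evaluated directly in input space; the sparse variant then limits the number of nonzero entries in (W, b). A minimal sketch of the decision function itself (illustrative only; the paper's contribution is the sparse training formulation, not this evaluation):

```python
import numpy as np

def qsvm_decision(X, W, b, c):
    """Quadratic-surface decision values f(x) = x^T W x + b^T x + c, one per row of X."""
    return np.einsum('ni,ij,nj->n', X, W, X) + X @ b + c
```

With W = I, b = 0, c = -1 the surface is the unit circle: points inside score negative, points outside positive, with no kernel function involved.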
arXiv Detail & Related papers (2025-01-20T04:26:34Z)
- GP-FL: Model-Based Hessian Estimation for Second-Order Over-the-Air Federated Learning [52.295563400314094]
Second-order methods are widely adopted to improve the convergence rate of learning algorithms. This paper introduces a novel second-order FL framework tailored for wireless channels.
arXiv Detail & Related papers (2024-12-05T04:27:41Z)
- Inference in Randomized Least Squares and PCA via Normality of Quadratic Forms [19.616162116973637]
We develop a unified methodology for statistical inference via randomized sketching or projections.
The methodology applies to fixed datasets -- i.e., is data-conditional -- and the only randomness is due to the randomized algorithm.
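The basic randomized primitive behind such analyses is sketch-and-solve: project the n rows down to m ≪ n with a random matrix and solve the smaller least-squares problem. A minimal NumPy sketch (illustrative; the paper's contribution is the inference theory around this kind of estimator, not the primitive itself):

```python
import numpy as np

def sketched_lstsq(X, y, m, rng):
    """Gaussian sketch-and-solve least squares: argmin_b ||S X b - S y||."""
    n = X.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)  # random projection, m << n
    beta, *_ = np.linalg.lstsq(S @ X, S @ y, rcond=None)
    return beta
```

Because the only randomness is in S, the sketched solution fluctuates around the full-data solution in a way that can be characterized distributionally, which is the kind of statement the paper makes rigorous via normality of quadratic forms.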
arXiv Detail & Related papers (2024-04-01T04:35:44Z)
- Generative Modeling with Phase Stochastic Bridges [49.4474628881673]
Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs.
We introduce a novel generative modeling framework grounded in \textbf{phase space dynamics}.
Our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation.
arXiv Detail & Related papers (2023-10-11T18:38:28Z)
- Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
Diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
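The "perturb-then-diagonalize" idea can be illustrated in a few lines: a non-normal matrix such as the HiPPO matrix may be nearly defective, so a small random perturbation is added before the eigendecomposition to make it safely diagonalizable. A minimal NumPy sketch under assumed conventions (not the paper's backward-stable algorithm):

```python
import numpy as np

def perturb_then_diagonalize(A, eps=1e-3, rng=None):
    """Diagonalize A + E for a small random complex perturbation with ||E||_F = eps."""
    rng = rng or np.random.default_rng(0)
    E = rng.standard_normal(A.shape) + 1j * rng.standard_normal(A.shape)
    E *= eps / np.linalg.norm(E)       # scale the perturbation to norm eps
    w, V = np.linalg.eig(A + E)        # eigendecomposition of the perturbed matrix
    return w, V, E
```

The perturbed matrix has a well-conditioned eigenbasis with high probability, and the O(eps) bias is acceptable in this setting because the resulting diagonal SSM is subsequently trained anyway.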
arXiv Detail & Related papers (2023-10-02T23:36:13Z)
- On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension.
arXiv Detail & Related papers (2022-06-23T17:58:39Z)
- Kernel-based estimation for partially functional linear model: Minimax rates and randomized sketches [12.799283644502882]
This paper considers the partially functional linear model (PFLM) where all predictive features consist of a functional covariate and a high dimensional scalar vector.
Over an infinite-dimensional reproducing kernel Hilbert space, the proposed estimation for PFLM is a least squares approach with two mixed regularizations of a function-norm and an $\ell_1$-norm.
arXiv Detail & Related papers (2021-10-18T06:27:59Z)
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.