Tuning Frequency Bias of State Space Models
- URL: http://arxiv.org/abs/2410.02035v1
- Date: Wed, 2 Oct 2024 21:04:22 GMT
- Title: Tuning Frequency Bias of State Space Models
- Authors: Annan Yu, Dongwei Lyu, Soon Hoe Lim, Michael W. Mahoney, N. Benjamin Erichson
- Abstract summary: State space models (SSMs) leverage linear, time-invariant (LTI) systems to learn sequences with long-range dependencies.
We find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones.
- Score: 48.60241978021799
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies. By analyzing the transfer functions of LTI systems, we find that SSMs exhibit an implicit bias toward capturing low-frequency components more effectively than high-frequency ones. This behavior aligns with the broader notion of frequency bias in deep learning model training. We show that the initialization of an SSM assigns it an innate frequency bias and that training the model in a conventional way does not alter this bias. Based on our theory, we propose two mechanisms to tune frequency bias: either by scaling the initialization to tune the inborn frequency bias; or by applying a Sobolev-norm-based filter to adjust the sensitivity of the gradients to high-frequency inputs, which allows us to change the frequency bias via training. Using an image-denoising task, we empirically show that we can strengthen, weaken, or even reverse the frequency bias using both mechanisms. By tuning the frequency bias, we can also improve SSMs' performance on learning long-range sequences, averaging an 88.26% accuracy on the Long-Range Arena (LRA) benchmark tasks.
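The two mechanisms can be illustrated concretely. Below is a minimal sketch in NumPy, assuming an S4D-style diagonal initialization with eigenvalues -1/2 + iπn and a Sobolev-style frequency weight (1 + ω²)^(β/2); the function names and exact parameterization are illustrative, not the paper's code.

```python
import numpy as np

def init_eigenvalues(n_state, alpha=1.0):
    """Hypothetical S4D-style diagonal initialization; rescaling the
    imaginary parts by `alpha` shifts which frequencies the SSM favors
    at initialization (mechanism 1: tuning the inborn bias)."""
    n = np.arange(n_state)
    return -0.5 + 1j * alpha * np.pi * n

def sobolev_loss(y_pred, y_true, beta=0.0):
    """L2 loss computed in the frequency domain and reweighted by a
    Sobolev-norm-style filter (mechanism 2): beta > 0 amplifies the
    gradient signal from high-frequency errors, beta < 0 damps it, and
    beta = 0 recovers the plain L2 loss up to a constant (Parseval)."""
    err = np.fft.rfft(y_pred - y_true)
    omega = 2 * np.pi * np.fft.rfftfreq(len(y_pred))
    weight = (1.0 + omega**2) ** (beta / 2)
    return np.mean(np.abs(weight * err) ** 2)
```

Scaling `alpha` changes the bias before training ever starts, while `beta` changes how training itself redistributes attention across frequencies; per the abstract, either knob can strengthen, weaken, or reverse the bias.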
Related papers
- Towards Combating Frequency Simplicity-biased Learning for Domain Generalization [36.777767173275336]
Domain generalization methods aim to learn transferable knowledge from source domains that can generalize well to unseen target domains.
Recent studies show that neural networks frequently suffer from simplicity-biased learning behavior, leading to over-reliance on specific frequency sets.
We propose two effective data augmentation modules designed to collaboratively and adaptively adjust the frequency characteristics of the dataset.
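The abstract does not specify the two modules, but as a flavor of frequency-characteristic augmentation, here is a common amplitude-spectrum mixing transform; treat it as a generic illustration rather than the paper's method.

```python
import numpy as np

def mix_amplitude(img_a, img_b, lam=0.5):
    """Blend the amplitude spectra of two images while keeping the
    phase of the first, altering frequency statistics but not layout."""
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    amp = (1 - lam) * np.abs(fa) + lam * np.abs(fb)
    return np.real(np.fft.ifft2(amp * np.exp(1j * np.angle(fa))))
```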
arXiv Detail & Related papers (2024-10-21T16:17:01Z)
- FreSh: Frequency Shifting for Accelerated Neural Representation Learning [11.175745750843484]
Implicit Neural Representations (INRs) have recently gained attention as a powerful approach for continuously representing signals such as images, videos, and 3D shapes using multilayer perceptrons (MLPs).
MLPs are known to exhibit a low-frequency bias, limiting their ability to capture high-frequency details accurately.
We propose frequency shifting (or FreSh) to align the frequency spectrum of the initial output with that of the target signal.
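A hedged sketch of the underlying idea (the paper's exact matching criterion may differ): estimate where the spectral mass of the target sits relative to the untrained model's output, and rescale the input-embedding frequencies by that ratio before training.

```python
import numpy as np

def dominant_freq(signal):
    """Frequency (cycles/sample) with the largest spectral magnitude."""
    spec = np.abs(np.fft.rfft(signal - signal.mean()))
    return np.fft.rfftfreq(len(signal))[np.argmax(spec)]

def embedding_scale(target, initial_output):
    """Hypothetical helper: factor by which to rescale the MLP's
    positional-embedding frequencies so initial and target spectra align."""
    return dominant_freq(target) / max(dominant_freq(initial_output), 1e-8)
```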
arXiv Detail & Related papers (2024-10-07T14:05:57Z)
- Oscillatory State-Space Models [61.923849241099184]
We propose Linear Oscillatory State-Space models (LinOSS) for efficiently learning on long sequences.
A stable discretization, integrated over time using fast associative parallel scans, yields the proposed state-space model.
We show that LinOSS is universal, i.e., it can approximate any continuous and causal operator mapping between time-varying functions.
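The parallel-scan trick rests on the fact that composing steps of a linear recurrence is associative. A minimal sketch for the diagonal case x_t = a_t x_{t-1} + b_t follows; LinOSS's actual oscillatory, second-order discretization is more involved, so this only shows the scan structure.

```python
import numpy as np

def combine(left, right):
    """Compose two recurrence segments: applying (a1, b1) then (a2, b2)
    equals applying (a2*a1, a2*b1 + b2). Associativity of this rule is
    what permits a logarithmic-depth parallel scan."""
    a1, b1 = left
    a2, b2 = right
    return (a2 * a1, a2 * b1 + b2)

def linear_scan(a, b):
    """Sequential reference; a parallel scan applies `combine` in a tree."""
    acc, out = (a[0], b[0]), [b[0]]
    for t in range(1, len(a)):
        acc = combine(acc, (a[t], b[t]))
        out.append(acc[1])
    return np.array(out)  # states x_1..x_T with x_0 = 0
```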
arXiv Detail & Related papers (2024-10-04T22:00:13Z)
- Exploring Cross-Domain Few-Shot Classification via Frequency-Aware Prompting [37.721042095518044]
Cross-Domain Few-Shot Learning has made great strides with the development of meta-learning.
We propose a Frequency-Aware Prompting method with mutual attention for Cross-Domain Few-Shot classification.
arXiv Detail & Related papers (2024-06-24T08:14:09Z)
- Fredformer: Frequency Debiased Transformer for Time Series Forecasting [8.356290446630373]
The Transformer model has shown leading performance in time series forecasting.
It tends to learn low-frequency features in the data and overlook high-frequency features, showing a frequency bias.
We propose Fredformer, a framework designed to mitigate frequency bias by learning features equally across different frequency bands.
arXiv Detail & Related papers (2024-06-13T11:29:21Z)
- Incremental Spatial and Spectral Learning of Neural Operators for Solving Large-Scale PDEs [86.35471039808023]
We introduce the Incremental Fourier Neural Operator (iFNO), which progressively increases the number of frequency modes used by the model.
We show that iFNO reduces total training time while maintaining or improving generalization performance across various datasets.
Our method achieves a 10% lower testing error while using 20% fewer frequency modes than the existing Fourier Neural Operator, along with 30% faster training.
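The core idea can be sketched in a few lines (illustrative, not the iFNO code): a spectral layer acts only on the lowest `k` Fourier modes, and `k` grows on a schedule so the model fits coarse structure first and adds detail later.

```python
import numpy as np

def spectral_layer(x, weights, k):
    """Keep and transform only the first k Fourier modes of a 1-D signal."""
    xf = np.fft.rfft(x)
    out = np.zeros_like(xf)
    out[:k] = weights[:k] * xf[:k]
    return np.fft.irfft(out, n=len(x))

def mode_schedule(epoch, k0=4, k_max=64, every=10):
    """Double the number of active modes every `every` epochs."""
    return min(k0 * 2 ** (epoch // every), k_max)
```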
arXiv Detail & Related papers (2022-11-28T09:57:15Z)
- Transform Once: Efficient Operator Learning in Frequency Domain [69.74509540521397]
We study deep neural networks designed to harness the structure in frequency domain for efficient learning of long-range correlations in space or time.
This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1).
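The blueprint can be contrasted with the conventional pattern in a short sketch (diagonal frequency-domain layers, nonlinearities omitted; an assumption-laden toy, not the T1 architecture):

```python
import numpy as np

def t1_forward(x, layers):
    """'Transform once': a single FFT in, all layers applied in the
    frequency domain, a single inverse FFT out, instead of an
    FFT/iFFT pair wrapped around every layer."""
    z = np.fft.rfft(x)
    for w in layers:          # each w: per-mode complex weights
        z = w * z
    return np.fft.irfft(z, n=len(x))
```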
arXiv Detail & Related papers (2022-11-26T01:56:05Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
The shaping is performed in the time-frequency domain, keeping the computational cost almost the same as that of conventional DDPM-based neural vocoders.
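A hedged sketch of adaptive noise spectral shaping (not SpecGrad's exact filter design): multiply the STFT of white noise by a time-varying magnitude envelope derived from the conditioning spectrogram, then invert.

```python
import numpy as np
from scipy.signal import stft, istft

def shape_noise(envelope, nperseg=256):
    """`envelope`: (nperseg//2 + 1, frames) target magnitudes.
    Returns noise whose local spectra follow the envelope."""
    hop = nperseg // 2
    noise = np.random.randn(envelope.shape[1] * hop)
    _, _, z = stft(noise, nperseg=nperseg)
    z = z[:, : envelope.shape[1]] * envelope  # impose the envelope
    _, shaped = istft(z, nperseg=nperseg)
    return shaped
```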
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to adaptively learn frequency information in a two-branch detection framework, dubbed AFD.
We liberate the network from fixed frequency transforms and achieve better performance with data- and task-dependent transform layers.
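One way to read "data- and task-dependent transform layers" is a learnable linear transform initialized at a fixed basis such as the DCT; this sketch is an illustration of that reading, not the AFD architecture.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis; a learnable layer initialized here can
    drift away from the fixed transform as training demands."""
    k, i = np.arange(n)[:, None], np.arange(n)[None, :]
    m = np.sqrt(2 / n) * np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] /= np.sqrt(2)
    return m

transform = dct_matrix(64)  # would be a trainable parameter in practice
```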
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
- Robust Learning with Frequency Domain Regularization [1.370633147306388]
We introduce a new regularization method that constrains the frequency spectra of the model's filters.
We demonstrate the effectiveness of our regularization by (1) defending against adversarial perturbations; (2) reducing the generalization gap across different architectures; and (3) improving generalization in transfer learning scenarios without fine-tuning.
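One concrete form such a constraint can take (the paper's exact formulation may differ): zero-pad a convolution kernel, take its 2-D spectrum, and penalize energy outside a low-frequency radius.

```python
import numpy as np

def high_freq_penalty(kernel, pad=32, radius=8):
    """Sum of squared spectral magnitudes outside a centered
    low-frequency disk; add this term to the training loss."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(kernel, s=(pad, pad))))
    yy, xx = np.indices(spec.shape)
    c = pad // 2
    mask = (yy - c) ** 2 + (xx - c) ** 2 > radius ** 2
    return np.sum(spec[mask] ** 2)
```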
arXiv Detail & Related papers (2020-07-07T07:29:20Z)