SaFARi: State-Space Models for Frame-Agnostic Representation
- URL: http://arxiv.org/abs/2505.08977v1
- Date: Tue, 13 May 2025 21:39:40 GMT
- Title: SaFARi: State-Space Models for Frame-Agnostic Representation
- Authors: Hossein Babaei, Mel White, Sina Alemohammad, Richard G. Baraniuk
- Abstract summary: State-Space Models (SSMs) have re-emerged as a powerful tool for online function approximation, and as the backbone of machine learning models for long-range dependent data. We present a method for building an SSM with any frame or basis, rather than being restricted to polynomials. This framework encompasses the approach known as HiPPO, but also permits an infinite diversity of other possible "species" within the SSM architecture.
- Score: 22.697360024988484
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-Space Models (SSMs) have re-emerged as a powerful tool for online function approximation, and as the backbone of machine learning models for long-range dependent data. However, to date, only a few polynomial bases have been explored for this purpose, and the state-of-the-art implementations were built upon the best of a few limited options. In this paper, we present a generalized method for building an SSM with any frame or basis, rather than being restricted to polynomials. This framework encompasses the approach known as HiPPO, but also permits an infinite diversity of other possible "species" within the SSM architecture. We dub this approach SaFARi: SSMs for Frame-Agnostic Representation.
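As a concrete illustration, below is a minimal sketch of the HiPPO-LegS "species" that SaFARi generalizes: the SSM state holds coefficients of the input history in a scaled Legendre basis, and swapping in another frame or basis amounts to swapping the A and B matrices. The function names and the forward-Euler discretization are our own choices for brevity, not the paper's implementation.

```python
import numpy as np
from numpy.polynomial.legendre import legval

def hippo_legs_matrices(N):
    """HiPPO-LegS transition matrices (Gu et al., 2020)."""
    q = np.sqrt(2 * np.arange(N) + 1)                    # (2n+1)^{1/2}
    A = np.outer(q, q) * (np.arange(N)[:, None] > np.arange(N))
    A += np.diag(np.arange(N) + 1)
    B = q.copy()
    return A, B

def hippo_legs_encode(u, N=32):
    """Online encoding via forward Euler: x_k = (I - A/k) x_{k-1} + (B/k) u_k."""
    A, B = hippo_legs_matrices(N)
    x, I = np.zeros(N), np.eye(N)
    for k, u_k in enumerate(u, start=1):
        x = (I - A / k) @ x + (B / k) * u_k
    return x

# Reconstruct the whole history from the final state:
# u(s) ~= sum_n x_n sqrt(2n+1) P_n(2s - 1) for s in [0, 1].
s = np.linspace(0, 1, 500)
u = np.sin(8 * s) + 0.5 * s
x = hippo_legs_encode(u, N=32)
u_hat = legval(2 * s - 1, x * np.sqrt(2 * np.arange(32) + 1))
print("max reconstruction error:", np.abs(u - u_hat).max())
```

Under SaFARi, the same recurrence structure is kept while A and B are derived from a different frame, e.g. a wavelet frame as in the WaLRUS paper listed below.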
Related papers
- WaLRUS: Wavelets for Long-range Representation Using SSMs [22.697360024988484]
State-Space Models (SSMs) have proven to be powerful tools for modeling long-range dependencies in sequential data. We introduce WaLRUS, a new implementation of SaFARi built from Daubechies wavelets.
arXiv Detail & Related papers (2025-05-17T22:41:24Z)
- Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo [90.78001821963008]
A wide range of LM applications require generating text that conforms to syntactic or semantic constraints. We develop an architecture for controlled LM generation based on sequential Monte Carlo (SMC). Our system builds on the framework of Lew et al. (2023) and integrates with its language model probabilistic programming language.
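A toy sketch of the core SMC loop for constrained generation may help fix ideas; `step_logprobs` and `potential` below are hypothetical stand-ins for a next-token model and a constraint score, not the paper's API.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = list("ab ")

def step_logprobs(prefix):
    # Hypothetical stand-in LM: uniform over a tiny vocabulary.
    return np.full(len(VOCAB), -np.log(len(VOCAB)))

def potential(prefix):
    # Toy constraint: softly penalize prefixes containing "aa".
    return -4.0 if "aa" in prefix else 0.0

def smc_generate(n_particles=64, length=10):
    particles = [""] * n_particles
    logw = np.zeros(n_particles)
    for _ in range(length):
        for i, p in enumerate(particles):
            tok = rng.choice(len(VOCAB), p=np.exp(step_logprobs(p)))
            particles[i] = p + VOCAB[tok]
            logw[i] += potential(particles[i])   # reweight by the constraint
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n_particles, size=n_particles, p=w)  # resample
        particles = [particles[i] for i in idx]
        logw = np.zeros(n_particles)
    return particles

print(smc_generate()[:5])
```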
arXiv Detail & Related papers (2025-04-17T17:49:40Z)
- Multi-convex Programming for Discrete Latent Factor Models Prototyping [8.322623345761961]
We propose a generic framework based on CVXPY, which allows users to specify and solve the fitting problem of a wide range of DLFMs. Our framework is flexible and inherently supports the integration of regularization terms and constraints on the DLFM parameters and latent factors.
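As an illustration of the multi-convex pattern, here is a minimal CVXPY sketch that fits a nonnegative two-factor model by alternating between its two convex subproblems; the paper's framework covers a much broader DLFM class, and the model and penalties below are our toy choices.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((30, 20))                     # data matrix
k = 5                                        # latent dimension
W = rng.random((30, k))                      # warm-start one factor

for _ in range(10):
    # Subproblem 1: W fixed -> convex in H.
    H_var = cp.Variable((k, 20), nonneg=True)
    cp.Problem(cp.Minimize(
        cp.sum_squares(X - W @ H_var) + 0.1 * cp.norm1(H_var))).solve()
    H = H_var.value
    # Subproblem 2: H fixed -> convex in W.
    W_var = cp.Variable((30, k), nonneg=True)
    cp.Problem(cp.Minimize(
        cp.sum_squares(X - W_var @ H) + 0.1 * cp.norm1(W_var))).solve()
    W = W_var.value

print("residual:", np.linalg.norm(X - W @ H))
```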
arXiv Detail & Related papers (2025-04-02T07:33:54Z)
- Closed-form merging of parameter-efficient modules for Federated Continual Learning [9.940242741914748]
We introduce LoRM, an alternating optimization strategy that trains one LoRA matrix at a time. We apply our proposed methodology to Federated Class-Incremental Learning (FCIL). Our method demonstrates state-of-the-art performance across a range of FCIL scenarios.
arXiv Detail & Related papers (2024-10-23T15:30:13Z)
- Parameter-Efficient Fine-Tuning of State Space Models [10.817729275974829]
Deep State Space Models (SSMs) have become powerful tools for language modeling, offering high performance and linear scalability with sequence length. This paper investigates the application of parameter-efficient fine-tuning (PEFT) methods to SSM-based models. We propose Sparse Dimension Tuning (SDT), a PEFT method tailored for SSM modules.
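To convey the flavor of dimension-sparse tuning, the hedged PyTorch sketch below freezes a stand-in layer and trains only a small subset of its output dimensions; the random subset selection is a placeholder, and SDT's actual selection criterion and SSM-specific machinery are described in the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(256, 256)                  # stand-in for a pretrained SSM block
for p in layer.parameters():
    p.requires_grad_(False)                  # freeze the backbone

keep = torch.randperm(256)[:8]               # placeholder: tune 8 of 256 dims
mask = torch.zeros(256, 1)
mask[keep] = 1.0
delta = nn.Parameter(torch.zeros(256, 256))  # trainable update, masked to 8 rows

def forward(x):
    # Only the output dimensions selected by `mask` receive updates.
    return x @ (layer.weight + mask * delta).T + layer.bias

opt = torch.optim.Adam([delta], lr=1e-3)
x, y = torch.randn(32, 256), torch.randn(32, 256)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(forward(x), y)
    loss.backward()
    opt.step()
print("effective trainable params:", int(mask.sum().item()) * 256)
```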
arXiv Detail & Related papers (2024-10-11T17:30:28Z)
- High-Performance Few-Shot Segmentation with Foundation Models: An Empirical Study [64.06777376676513]
We develop a few-shot segmentation (FSS) framework based on foundation models.
Specifically, we propose a simple approach to extract implicit knowledge from foundation models to construct coarse correspondence.
Experiments on two widely used datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-10T08:04:11Z)
- SPAN: Unlocking Pyramid Representations for Gigapixel Histopathological Images [8.026588319629528]
Whole slide images (WSIs) present fundamental computational challenges due to their gigapixel-scale resolutions and sparse, irregularly distributed informative regions. We propose a novel sparse-native computational framework that preserves exact spatial relationships. We develop Sparse Pyramid Attention Networks (SPAN), incorporating a hierarchical sparse pyramid attention architecture with shifted windows.
arXiv Detail & Related papers (2024-06-13T17:14:30Z)
- Explaining Modern Gated-Linear RNNs via a Unified Implicit Attention Formulation [54.50526986788175]
Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs.
We present a unified view of these models, formulating such layers as implicit causal self-attention layers.
Our framework compares the underlying mechanisms on similar grounds for different layers and provides a direct means for applying explainability methods.
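The unification admits a short worked example: a gated linear recurrence h_t = a_t h_{t-1} + b_t x_t is exactly a causal attention map y = A x with A[t, s] = (prod_{r=s+1..t} a_r) b_s. The single-channel numpy check below is our own simplification of that formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 6
x = rng.standard_normal(T)
a = rng.uniform(0.5, 1.0, T)     # data-dependent forget gates
b = rng.uniform(0.5, 1.0, T)     # input gates

# Recurrent view.
h, y_rec = 0.0, np.zeros(T)
for t in range(T):
    h = a[t] * h + b[t] * x[t]
    y_rec[t] = h

# Equivalent implicit causal attention matrix.
A = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        A[t, s] = np.prod(a[s + 1:t + 1]) * b[s]
y_att = A @ x

print(np.allclose(y_rec, y_att))  # True: same computation, two views
```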
arXiv Detail & Related papers (2024-05-26T09:57:45Z)
- Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks [50.29356570858905]
We introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation. We provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated. This shows the DSF's potential to guide the systematic development of more efficient and scalable future foundation models.
arXiv Detail & Related papers (2024-05-24T17:19:57Z)
- PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on Predictive State Representations (PSRs), an expressive model class that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
- Deep Conditional Transformation Models [0.0]
Learning the cumulative distribution function (CDF) of an outcome variable conditional on a set of features remains challenging.
Conditional transformation models provide a semi-parametric approach that allows modeling a large class of conditional CDFs.
We propose a novel network architecture, provide details on different model definitions and derive suitable constraints.
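A minimal sketch of the idea, under our own simplifying assumptions: take h(y, x) = a(x) y + b(x) with a(x) > 0 so that F(y|x) = sigmoid(h(y, x)) is a valid conditional CDF, and fit it by maximum likelihood. The paper's architecture uses richer monotone transformations; this linear-in-y model is just the simplest member of the class.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 2))

def nll(x, y):
    raw_a, b = net(x).chunk(2, dim=-1)
    a = nn.functional.softplus(raw_a)          # a(x) > 0 keeps h monotone in y
    h = a * y + b
    # density f(y|x) = sigmoid'(h) * a, so
    # -log f = softplus(h) + softplus(-h) - log a
    return (nn.functional.softplus(h) + nn.functional.softplus(-h)
            - torch.log(a)).mean()

# Toy heteroscedastic data: logistic noise whose scale grows with |x|.
x = torch.rand(512, 1) * 2 - 1
u = torch.rand(512, 1)
y = 2 * x + (0.5 + 0.4 * x.abs()) * torch.log(u / (1 - u))

opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(300):
    opt.zero_grad()
    loss = nll(x, y)
    loss.backward()
    opt.step()
print("final NLL:", float(loss))
```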
arXiv Detail & Related papers (2020-10-15T16:25:45Z)
- S2RMs: Spatially Structured Recurrent Modules [105.0377129434636]
We take a step towards models that are capable of simultaneously exploiting both modular and spatiotemporal structures.
We find our models to be robust to the number of available views and better capable of generalization to novel tasks without additional training.
arXiv Detail & Related papers (2020-07-13T17:44:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.