Related papers: Selection Mechanisms for Sequence Modeling using Linear State Space Models

Selection Mechanisms for Sequence Modeling using Linear State Space Models

URL: http://arxiv.org/abs/2505.17932v1
Date: Fri, 23 May 2025 14:08:56 GMT
Title: Selection Mechanisms for Sequence Modeling using Linear State Space Models
Authors: Umberto Casti, Sandro Zampieri, Fabio Pasqualetti,
Abstract summary: We introduce an alternative selection mechanism inspired by control theory methodologies.<n>We propose a novel residual generator for selection, drawing an analogy to fault detection strategies in Linear Time-Invariant (LTI) systems.<n>Our approach combines multiple LTI systems, preserving their beneficial properties during training while achieving comparable selectivity.
Score: 2.4374097382908477
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in language modeling tasks have been driven by architectures such as Transformers and, more recently, by Selective State Space Models (SSMs). In this paper, we introduce an alternative selection mechanism inspired by control theory methodologies. Specifically, we propose a novel residual generator for selection, drawing an analogy to fault detection strategies in Linear Time-Invariant (LTI) systems. Unlike Mamba, which utilizes Linear Time-Varying (LTV) systems, our approach combines multiple LTI systems, preserving their beneficial properties during training while achieving comparable selectivity. To evaluate the effectiveness of the proposed architecture, we test its performance on synthetic tasks. While these tasks are not inherently critical, they serve as benchmarks to test the selectivity properties of different cores architecture. This work highlights the potential of integrating theoretical insights with experimental advancements, offering a complementary perspective to deep learning innovations at the intersection of control theory and machine learning.

Related papers

Data-Driven Dimensional Synthesis of Diverse Planar Four-bar Function Generation Mechanisms via Direct Parameterization [2.499517394718329]
We propose a data-driven framework that bypasses traditional equation-solving and optimization by leveraging supervised learning.<n>Our method combines a synthetic dataset, an LSTM-based neural network for handling sequential precision points, and a Mixture of Experts architecture tailored to different linkage types.<n> Experiments show our approach produces accurate, defect-free linkages across various configurations.
arXiv Detail & Related papers (2025-07-11T02:32:29Z)
Vehicle Suspension Recommendation System: Multi-Fidelity Neural Network-based Mechanism Design Optimization [4.038368925548051]
Vehicle suspensions are designed to improve driving performance and ride comfort, but different types are available depending on the environment. Traditional design process is multi-step, gradually reducing the number of design candidates while performing costly analyses to meet target performance. Recently, AI models have been used to reduce the computational cost of FEA.
arXiv Detail & Related papers (2024-10-03T23:54:03Z)
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning [113.89327264634984]
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new classes into a model with minimal training samples. Traditional methods widely adopt static adaptation relying on a fixed parameter space to learn from data that arrive sequentially. We propose a dual selective SSM projector that dynamically adjusts the projection parameters based on the intermediate features for dynamic adaptation.
arXiv Detail & Related papers (2024-07-08T17:09:39Z)
Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains.<n>Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states.<n>We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
PASTA: Pretrained Action-State Transformer Agents [10.654719072766495]
Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data. In reinforcement learning, researchers have recently adapted these approaches, developing models pre-trained on expert trajectories.
arXiv Detail & Related papers (2023-07-20T15:09:06Z)
Online simulator-based experimental design for cognitive model selection [74.76661199843284]
We propose BOSMOS: an approach to experimental design that can select between computational models without tractable likelihoods. In simulated experiments, we demonstrate that the proposed BOSMOS technique can accurately select models in up to 2 orders of magnitude less time than existing LFI alternatives.
arXiv Detail & Related papers (2023-03-03T21:41:01Z)
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences [55.25331349436895]
Deep generative models have emerged as a popular machine learning-based approach for inverse problems in the life sciences. These problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution.
arXiv Detail & Related papers (2022-10-19T19:04:45Z)
Gradient-Based Trajectory Optimization With Learned Dynamics [80.41791191022139]
We use machine learning techniques to learn a differentiable dynamics model of the system from data. We show that a neural network can model highly nonlinear behaviors accurately for large time horizons. In our hardware experiments, we demonstrate that our learned model can represent complex dynamics for both the Spot and Radio-controlled (RC) car.
arXiv Detail & Related papers (2022-04-09T22:07:34Z)
Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks. This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
GEM: Group Enhanced Model for Learning Dynamical Control Systems [78.56159072162103]
We build effective dynamical models that are amenable to sample-based learning. We show that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model. This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions.
arXiv Detail & Related papers (2021-04-07T01:08:18Z)
Trajectory-wise Multiple Choice Learning for Dynamics Generalization in Reinforcement Learning [137.39196753245105]
We present a new model-based reinforcement learning algorithm that learns a multi-headed dynamics model for dynamics generalization. We incorporate context learning, which encodes dynamics-specific information from past experiences into the context latent vector. Our method exhibits superior zero-shot generalization performance across a variety of control tasks, compared to state-of-the-art RL methods.
arXiv Detail & Related papers (2020-10-26T03:20:42Z)
DyNODE: Neural Ordinary Differential Equations for Dynamics Modeling in Continuous Control [0.0]
We present a novel approach that captures the underlying dynamics of a system by incorporating control in a neural ordinary differential equation framework. Results indicate that a simple DyNODE architecture when combined with an actor-critic reinforcement learning algorithm outperforms canonical neural networks.
arXiv Detail & Related papers (2020-09-09T12:56:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.