Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
- URL: http://arxiv.org/abs/2403.17445v1
- Date: Tue, 26 Mar 2024 07:23:46 GMT
- Title: Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model
- Authors: Jiqun Chu, Zuoquan Lin
- Abstract summary: A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences.
We start from exponential smoothing (ETS), a simple SSM, incorporate it into an element-wise MLP, and augment it with additional parameters and a complex field to reduce the inductive bias.
Our models achieve comparable results to S4 on the LRA benchmark.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling long-range dependencies in sequential data is a crucial step in sequence learning. A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences. However, it is unclear whether the success of S4 should be attributed to its intricate parameterization and HiPPO initialization, or simply to the use of state space models (SSMs). To further investigate the potential of deep SSMs, we start with exponential smoothing (ETS), a simple SSM, and propose a stacked architecture that incorporates it directly into an element-wise MLP. We augment simple ETS with additional parameters and a complex field to reduce its inductive bias. Despite adding less than 1% to the parameter count of the element-wise MLP, our models achieve results comparable to S4 on the LRA benchmark.
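As a rough illustration of the approach described in the abstract, the sketch below inserts a per-channel exponential smoothing scan with complex smoothing factors between the two layers of a position-wise MLP block. This is a minimal reading of the abstract, not the authors' implementation: the function names, the ReLU nonlinearity, and the exact placement of the ETS scan are assumptions.

```python
import numpy as np

def ets_scan(x, alpha):
    """Per-channel exponential smoothing: s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    x:     (T, D) real-valued input sequence
    alpha: (D,)   complex smoothing factors, one per channel (assuming this is
                  what augmenting ETS with a "complex field" amounts to)
    Returns the real part of the smoothed sequence, shape (T, D).
    """
    T, D = x.shape
    s = np.zeros(D, dtype=np.complex128)
    out = np.empty((T, D))
    for t in range(T):
        s = alpha * x[t] + (1.0 - alpha) * s
        out[t] = s.real
    return out

def ets_mlp_block(x, alpha, W1, b1, W2, b2):
    """Element-wise (position-wise) MLP whose only cross-position interaction
    is the ETS scan inserted between its two linear layers; alpha has one
    complex entry per hidden channel."""
    h = np.maximum(x @ W1 + b1, 0.0)  # per-position feed-forward + ReLU
    h = ets_scan(h, alpha)            # mixing along the time axis
    return h @ W2 + b2
```

The scan adds only one complex number (two real parameters) per hidden channel on top of the MLP weights, which is consistent with the abstract's claim of a sub-1% parameter increase.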
Related papers
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
arXiv Detail & Related papers (2024-10-17T22:35:50Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - SMR: State Memory Replay for Long Sequence Modeling [19.755738298836526]
This paper proposes a novel non-recursive non-uniform sample processing strategy to overcome compatibility limitations in parallel convolutional computation.
We introduce State Memory Replay (SMR), which utilizes learnable memories to adjust the current state with multi-step information for generalization at sampling points different from those in the training data.
Experiments on long-range modeling tasks in autoregressive language modeling and Long Range Arena demonstrate the general effectiveness of the SMR mechanism for a series of SSM models.
arXiv Detail & Related papers (2024-05-27T17:53:32Z) - HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilizes Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z) - Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology (a schematic sketch appears after this list).
arXiv Detail & Related papers (2023-10-02T23:36:13Z) - A Neural State-Space Model Approach to Efficient Speech Separation [34.38911304755453]
We introduce S4M, a new efficient speech separation framework based on neural state-space models (SSMs).
To extend the SSM technique into speech separation tasks, we first decompose the input mixture into multi-scale representations with different resolutions.
Experiments show that S4M performs comparably to other separation backbones in terms of SI-SDRi.
Our S4M-tiny model (1.8M parameters) even surpasses the attention-based Sepformer (26.0M parameters) in noisy conditions with only 9.2% of the multiply-accumulate operations (MACs).
arXiv Detail & Related papers (2023-05-26T13:47:11Z) - Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z) - Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z) - Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs used by the S4 layer with a single multi-input, multi-output (MIMO) SSM (a toy scan appears after this list).
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
arXiv Detail & Related papers (2022-08-09T17:57:43Z)
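The "perturb-then-diagonalize" (PTD) entry above lends itself to a very small illustration: because exactly diagonalizing the HiPPO matrix is ill-posed, one can diagonalize a slightly perturbed matrix instead. The sketch below is only a schematic reading of that idea; the perturbation scale and the use of a plain random perturbation are assumptions, not the paper's recipe.

```python
import numpy as np

def perturb_then_diagonalize(A, eps=1e-4, seed=0):
    """Schematic PTD: diagonalize a small perturbation of A instead of A itself.

    A:   (N, N) real matrix whose exact diagonalization is ill-conditioned
    eps: relative size of the perturbation (illustrative choice)
    Returns eigenvalues lam and eigenvectors V with A + E = V @ diag(lam) @ V^{-1}.
    """
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(A.shape)
    E *= eps * np.linalg.norm(A) / np.linalg.norm(E)  # scale perturbation relative to A
    lam, V = np.linalg.eig(A + E)  # a generic perturbation makes A + E diagonalizable
    return lam, V
```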
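For the S5 entry, the contrast between a bank of SISO SSMs and a single MIMO SSM can also be made concrete. The toy scan below runs one shared diagonal, complex-valued state for all input channels at once; it is written sequentially for clarity, whereas S5 itself uses a parallel scan, and the shapes and dtypes here are illustrative only.

```python
import numpy as np

def mimo_ssm_scan(x, Lambda, B, C):
    """One multi-input, multi-output (MIMO) diagonal SSM shared by all channels.

    x:      (T, H) real input sequence with H channels
    Lambda: (P,)   complex diagonal state matrix (discrete-time)
    B:      (P, H) complex input matrix
    C:      (H, P) complex output matrix
    Returns a real (T, H) output sequence.
    """
    T, H = x.shape
    s = np.zeros(Lambda.shape[0], dtype=np.complex128)
    y = np.empty((T, H))
    for t in range(T):
        s = Lambda * s + B @ x[t]  # a single state vector mixes all input channels
        y[t] = (C @ s).real        # every output channel reads the shared state
    return y
```

An S4-style layer would instead run H independent single-input, single-output recurrences and mix their outputs with a separate position-wise layer.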