State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
- URL: http://arxiv.org/abs/2503.03499v1
- Date: Wed, 05 Mar 2025 13:44:42 GMT
- Title: State-offset Tuning: State-based Parameter-Efficient Fine-Tuning for State Space Models
- Authors: Wonjun Kang, Kevin Galim, Yuchen Zeng, Minjae Lee, Hyung Il Koo, Nam Ik Cho
- Abstract summary: State Space Models (SSMs) have emerged as efficient alternatives to Transformers. Prompt-based methods like Prompt Tuning and Prefix-Tuning do not perform well on SSMs. We propose state-based methods as a superior alternative to prompt-based methods.
- Score: 19.262293564884715
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State Space Models (SSMs) have emerged as efficient alternatives to Transformers, mitigating their quadratic computational cost. However, the application of Parameter-Efficient Fine-Tuning (PEFT) methods to SSMs remains largely unexplored. In particular, prompt-based methods like Prompt Tuning and Prefix-Tuning, which are widely used in Transformers, do not perform well on SSMs. To address this, we propose state-based methods as a superior alternative to prompt-based methods. This new family of methods naturally stems from the architectural characteristics of SSMs. State-based methods adjust state-related features directly instead of depending on external prompts. Furthermore, we introduce a novel state-based PEFT method: State-offset Tuning. At every timestep, our method directly affects the state at the current step, leading to more effective adaptation. Through extensive experiments across diverse datasets, we demonstrate the effectiveness of our method. Code is available at https://github.com/furiosa-ai/ssm-state-tuning.
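The mechanism the abstract describes, adding a trainable offset directly to the SSM state at every timestep while the backbone stays frozen, can be sketched as follows. This is a minimal illustration of the idea, not the authors' implementation; the diagonal recurrence, the shapes, and the name `state_offset` are assumptions for exposition.

```python
import numpy as np

def ssm_scan(x, A, B, C, state_offset):
    """Minimal diagonal SSM recurrence with a learnable per-step state offset.

    Hypothetical sketch of state-based PEFT: at every timestep the
    trainable vector `state_offset` is added directly to the hidden state,
    while A, B, C remain frozen.
    """
    seq_len, _ = x.shape
    state_dim = A.shape[0]
    h = np.zeros(state_dim)
    ys = []
    for t in range(seq_len):
        # Standard diagonal update: h_t = A * h_{t-1} + B @ x_t
        h = A * h + B @ x[t]
        # State-based adaptation: shift the state itself instead of
        # prepending prompt tokens to the input.
        h = h + state_offset
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))   # (seq_len, input_dim)
A = np.full(16, 0.9)          # frozen diagonal state matrix
B = rng.normal(size=(16, 4))  # frozen input projection
C = rng.normal(size=(2, 16))  # frozen output projection
offset = np.zeros(16)         # the only trainable parameter here
y = ssm_scan(x, A, B, C, offset)
print(y.shape)  # (8, 2)
```

With a zero offset the scan reduces exactly to the frozen baseline, which is the usual initialization for this kind of additive adaptation.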
Related papers
- Sparse Gradient Compression for Fine-Tuning Large Language Models [58.44973963468691]
Fine-tuning large language models (LLMs) for downstream tasks has become increasingly crucial due to their widespread use and the growing availability of open-source models.
High memory costs associated with fine-tuning remain a significant challenge, especially as models increase in size.
We propose Sparse Gradient Compression (SGC) to address these limitations.
arXiv Detail & Related papers (2025-02-01T04:18:28Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts.
Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
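The summary above says ALoRE aggregates low-rank experts in a Kronecker-parameterized space and merges the result into the frozen backbone. A minimal sketch of that construction follows; the factor shapes and the aggregation-by-summation rule are my assumptions, not the paper's API.

```python
import numpy as np

def alore_delta(experts, scale=1.0):
    """Hypothetical sketch of an ALoRE-style weight update: each expert is a
    Kronecker product of a small factor S with a low-rank pair A @ B, and
    the experts are aggregated by summation."""
    delta = sum(np.kron(S, A @ B) for S, A, B in experts)
    return scale * delta

d, r, k = 8, 2, 4  # per-expert factor sizes, so the full matrix is (d*k, d*k)
rng = np.random.default_rng(1)
experts = [
    (rng.normal(size=(k, k)),   # small Kronecker factor
     rng.normal(size=(d, r)),   # low-rank pair A @ B, rank r
     rng.normal(size=(r, d)))
    for _ in range(3)
]
W_frozen = rng.normal(size=(d * k, d * k))
# Merging the aggregated delta keeps inference cost identical to the
# frozen backbone: one matmul, no extra modules at runtime.
W_merged = W_frozen + alore_delta(experts)
print(W_merged.shape)  # (32, 32)
```

The Kronecker structure is what keeps the trainable parameter count negligible: each expert stores k*k + d*r + r*d numbers rather than a full (d*k)^2 matrix.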
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - Sparse Orthogonal Parameters Tuning for Continual Learning [34.462967722928724]
Continual learning methods based on pre-trained models (PTMs), which adapt to successive downstream tasks without catastrophic forgetting, have recently gained attention.
We propose a novel yet effective method called SoTU (Sparse Orthogonal Parameters TUning).
arXiv Detail & Related papers (2024-11-05T05:19:09Z) - A Semantic-Aware Layer-Freezing Approach to Computation-Efficient Fine-Tuning of Language Models [32.178931149612644]
Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks. We propose a pioneering approach to reducing the cost of backpropagation (at the layer level) by answering where to finetune. We perform extensive experiments across well-known LMs and datasets.
arXiv Detail & Related papers (2024-06-17T17:13:08Z) - State-Free Inference of State-Space Models: The Transfer Function Approach [132.83348321603205]
State-free inference does not incur any significant memory or computational cost with an increase in state size.
We achieve this using properties of the proposed frequency domain transfer function parametrization.
We report improved perplexity in language modeling over a long convolutional Hyena baseline.
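The state-free idea summarized above, running an SSM through its frequency-domain transfer function so that no hidden state of size N is ever materialized, can be illustrated with a generic rational filter. The coefficient form below is a standard IIR transfer function, not the paper's exact parametrization, and is offered only as a sketch under that assumption.

```python
import numpy as np

def statefree_filter(x, b, a):
    """Sketch of state-free inference: treat the (linear, time-invariant)
    SSM as a rational transfer function H(z) = B(z)/A(z) in z^{-1} and
    apply it via FFT. Cost depends on sequence length, not state size."""
    L = len(x)
    n = 2 * L  # zero-pad to suppress circular wrap-around
    zinv = np.exp(-2j * np.pi * np.arange(n) / n)  # z^{-1} on the unit circle
    # Coefficients are ordered [c0, c1, ...] in powers of z^{-1}, so reverse
    # them for np.polyval, which expects highest degree first.
    H = np.polyval(b[::-1], zinv) / np.polyval(a[::-1], zinv)
    return np.fft.ifft(np.fft.fft(x, n) * H)[:L].real

# Sanity check: the impulse response of y[t] = x[t] + 0.5 * y[t-1],
# i.e. H(z) = 1 / (1 - 0.5 z^{-1}), is 0.5**t.
x = np.zeros(16)
x[0] = 1.0
y = statefree_filter(x, np.array([1.0]), np.array([1.0, -0.5]))
print(np.allclose(y, 0.5 ** np.arange(16), atol=1e-6))  # True
```

The point of the frequency-domain route is that the recurrence is never unrolled: the state dimension only enters through the degree of the polynomials, not through any stored vector.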
arXiv Detail & Related papers (2024-05-10T00:06:02Z) - Skeleton: A New Framework for Accelerating Language Models via Task Neuron Localized Prompt Tuning [15.695487920048816]
We propose a novel prompt tuning framework called Skeleton to efficiently utilize a language model in terms of memory and time complexity.
Our method significantly enhances inference efficiency (up to a 1.73x speedup) on various widely used benchmarks.
arXiv Detail & Related papers (2024-04-18T05:43:50Z) - Parameter-Adaptive Approximate MPC: Tuning Neural-Network Controllers without Retraining [50.00291020618743]
This work introduces a novel, parameter-adaptive AMPC architecture capable of online tuning without recomputing large datasets and retraining.
We showcase the effectiveness of parameter-adaptive AMPC by controlling the swing-ups of two different real cartpole systems with a severely resource-constrained microcontroller (MCU).
Taken together, these contributions represent a marked step toward the practical application of AMPC in real-world systems.
arXiv Detail & Related papers (2024-04-08T20:02:19Z) - Parameter-Efficient Fine-Tuning without Introducing New Latency [7.631596468553607]
We introduce a novel adapter technique that directly applies the adapter to pre-trained parameters instead of the hidden representation.
Our proposed method attains a new state-of-the-art outcome in terms of both performance and storage efficiency, storing only 0.03% parameters of full fine-tuning.
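The summary says the adapter is applied to the pre-trained parameters rather than to hidden representations, which is what lets it merge away at inference time. A minimal sketch of that reading follows; the bottleneck form of the adapter and all names here are assumptions for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 64
W_frozen = rng.normal(size=(d, d))

# Hypothetical bottleneck adapter acting on the *parameters*, not the
# hidden states: down-project the weight, apply a nonlinearity, up-project,
# and add the result back. Only `down` and `up` would be trained.
r = 4  # bottleneck width
down = rng.normal(size=(d, r)) * 0.01
up = rng.normal(size=(r, d)) * 0.01
delta = np.tanh(W_frozen @ down) @ up

# Merge once before deployment: inference is a single matmul with the
# merged weight, so the adapter introduces no new latency.
W_merged = W_frozen + delta

x = rng.normal(size=(d,))
y_merged = W_merged @ x
y_two_step = W_frozen @ x + delta @ x
print(np.allclose(y_merged, y_two_step))  # True
```

Because the delta is a function of the frozen weight plus a rank-r pair, only the small factors need to be stored per task, which is where the tiny storage footprint in the summary comes from.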
arXiv Detail & Related papers (2023-05-26T08:44:42Z) - Ahead-of-Time P-Tuning [0.2538209532048867]
Ahead-of-Time (AoT) P-Tuning is a parameter-efficient fine-tuning method for pre-trained Language Models (LMs).
We evaluate AoT P-Tuning on GLUE and SuperGLUE benchmarking datasets using RoBERTa and DeBERTa models.
Our method enables multi-task inference with a single backbone LM, making it a practical solution for real-world applications.
arXiv Detail & Related papers (2023-05-18T09:24:53Z) - Rethinking Efficient Tuning Methods from a Unified Perspective [34.67645496324432]
We revisit the design paradigm of PETL and derive a unified framework U-Tuning for parameter-efficient transfer learning.
The U-Tuning framework can simultaneously encompass existing methods and derive new approaches for parameter-efficient transfer learning.
arXiv Detail & Related papers (2023-03-01T17:38:03Z) - On Controller Tuning with Time-Varying Bayesian Optimization [74.57758188038375]
We use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments, using appropriate prior knowledge of the control objective and its changes.
We propose a novel TVBO strategy using Uncertainty-Injection (UI), which incorporates the assumption of incremental and lasting changes.
Our model outperforms the state-of-the-art method in TVBO, exhibiting reduced regret and fewer unstable parameter configurations.
arXiv Detail & Related papers (2022-07-22T14:54:13Z) - Parameter-Efficient Tuning by Manipulating Hidden States of Pretrained Language Models For Classification Tasks [49.807185872741066]
We propose a simple tuning method which only introduces three trainable vectors.
We input the integrated hidden state(s) to a task-specific linear classifier to predict categories.
This scheme is similar to the way ELMo utilises hidden states, except that ELMo feeds the hidden states to LSTM-based models.
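The recipe in the lines above, three trainable vectors that integrate a frozen LM's hidden states before a task-specific linear classifier, can be sketched as follows. The exact combination rule is not spelled out in the summary, so the element-wise gating below is an assumption.

```python
import numpy as np

def integrate_hidden_states(h_layers, v1, v2, v3):
    """Hypothetical sketch: combine a frozen LM's layer-wise hidden states
    using three trainable vectors (here: a gate on the layer average, a
    gate on the last layer, and an additive bias)."""
    pooled = h_layers.mean(axis=0)            # average over layers
    return v1 * pooled + v2 * h_layers[-1] + v3

rng = np.random.default_rng(3)
num_layers, d, n_classes = 12, 32, 5
h_layers = rng.normal(size=(num_layers, d))   # frozen LM hidden states

# The three trainable vectors; initialized so the output is the layer mean.
v1, v2, v3 = np.ones(d), np.zeros(d), np.zeros(d)

W_cls = rng.normal(size=(n_classes, d))       # task-specific linear classifier
logits = W_cls @ integrate_hidden_states(h_layers, v1, v2, v3)
print(logits.shape)  # (5,)
```

The trainable footprint is just 3*d numbers plus the classifier head, which is what makes the scheme parameter-efficient relative to full fine-tuning.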
arXiv Detail & Related papers (2022-04-10T04:14:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.