GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2511.17582v1
- Date: Sat, 15 Nov 2025 17:55:47 GMT
- Title: GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning
- Authors: Jie Ou, Shuaihong Jiang, Yingjun Du, Cees G. M. Snoek
- Abstract summary: GateRA is a unified framework that introduces token-aware modulation to dynamically adjust the strength of PEFT updates. By incorporating adaptive gating into standard PEFT branches, GateRA enables selective, token-level adaptation. Experiments on multiple commonsense reasoning benchmarks demonstrate that GateRA consistently outperforms or matches prior PEFT methods.
- Score: 51.79350934271497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, DoRA, and HiRA, enable lightweight adaptation of large pre-trained models via low-rank updates. However, existing PEFT approaches apply static, input-agnostic updates to all tokens, disregarding the varying importance and difficulty of different inputs. This uniform treatment can lead to overfitting on trivial content or under-adaptation on more informative regions, especially in autoregressive settings with distinct prefill and decoding dynamics. In this paper, we propose GateRA, a unified framework that introduces token-aware modulation to dynamically adjust the strength of PEFT updates. By incorporating adaptive gating into standard PEFT branches, GateRA enables selective, token-level adaptation, preserving pre-trained knowledge for well-modeled inputs while focusing capacity on challenging cases. Empirical visualizations reveal phase-sensitive behaviors, where GateRA automatically suppresses updates for redundant prefill tokens while emphasizing adaptation during decoding. To promote confident and efficient modulation, we further introduce an entropy-based regularization that encourages near-binary gating decisions. This regularization prevents diffuse update patterns and leads to interpretable, sparse adaptation without hard thresholding. Finally, we present a theoretical analysis showing that GateRA induces a soft gradient-masking effect over the PEFT path, enabling continuous and differentiable control over adaptation. Experiments on multiple commonsense reasoning benchmarks demonstrate that GateRA consistently outperforms or matches prior PEFT methods.
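The gating idea in the abstract can be sketched as a LoRA branch scaled by a per-token sigmoid gate, plus a binary-entropy regularizer that pushes gates toward 0 or 1. The shapes, gate parameterization, and initialization below are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class GatedLoRALinear:
    """Sketch of a frozen linear layer with a token-gated LoRA branch."""
    def __init__(self, d_in, d_out, rank, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(d_out, d_in)) * 0.02   # frozen pre-trained weight
        self.A = rng.normal(size=(rank, d_in)) * 0.02    # LoRA down-projection
        self.B = np.zeros((d_out, rank))                 # LoRA up-projection (zero init)
        self.w_g = rng.normal(size=(d_in,)) * 0.02       # hypothetical per-token gate params
        self.b_g = 0.0

    def forward(self, x):
        # x: (tokens, d_in); g: (tokens, 1) scalar gate per token
        g = sigmoid(x @ self.w_g + self.b_g)[:, None]
        base = x @ self.W.T
        update = (x @ self.A.T) @ self.B.T
        return base + g * update, g

def gate_entropy(g, eps=1e-8):
    # binary-entropy regularizer encouraging near-binary (confident) gates
    g = np.clip(g, eps, 1.0 - eps)
    return float(np.mean(-g * np.log(g) - (1.0 - g) * np.log(1.0 - g)))

layer = GatedLoRALinear(d_in=16, d_out=16, rank=4)
x = np.random.default_rng(1).normal(size=(5, 16))
y, g = layer.forward(x)
reg = gate_entropy(g)  # add (scaled) to the task loss during fine-tuning
```

Because the gate multiplies only the LoRA path, a gate near zero leaves the frozen pre-trained output untouched, which is the "soft gradient-masking" behavior the abstract describes.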
Related papers
- GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers [12.475144734899674]
We introduce GRASP, a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K ≤ D groups and learns a shared scaling and shifting vector for each group. We propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient and noise-prone hardware platforms.
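The grouped scale-and-shift idea can be sketched as follows; the contiguous grouping and function name are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def grasp_transform(h, gamma, beta):
    """Sketch of a GRASP-style grouped scale-and-shift (hypothetical API).

    h:     (tokens, D) token representations
    gamma: (K,) shared scale, one entry per group
    beta:  (K,) shared shift, one entry per group
    D must be divisible by K; each contiguous chunk of D//K dims shares one pair,
    so only 2*K parameters are learned per adapted layer.
    """
    tokens, D = h.shape
    K = gamma.shape[0]
    assert D % K == 0, "D must be divisible by the number of groups"
    group = D // K
    scale = np.repeat(gamma, group)   # expand each group's scale across its dims
    shift = np.repeat(beta, group)
    return h * scale + shift

h = np.ones((3, 8))
gamma = np.array([2.0, 0.5])   # K = 2 groups of 4 dims each
beta = np.array([0.0, 1.0])
out = grasp_transform(h, gamma, beta)
```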
arXiv Detail & Related papers (2025-12-03T22:17:05Z)
- TS-PEFT: Token-Selective Parameter-Efficient Fine-Tuning with Learnable Threshold Gating [8.102270371993411]
We introduce a new paradigm called Token-Selective PEFT (TS-PEFT), in which a function S selectively applies PEFT modifications to a subset of position indices. Our experimental results reveal that the indiscriminate application of PEFT to all indices is not only superfluous, but may also be counterproductive.
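In contrast to GateRA's soft gates, the selection function S here is a hard threshold. A minimal sketch, with names, shapes, and the thresholding rule as illustrative assumptions:

```python
import numpy as np

def ts_peft_apply(h, scores, delta, tau=0.0):
    """Sketch of token-selective PEFT: keep the PEFT update only at
    positions whose (learned) score exceeds a threshold tau.

    h:      (tokens, d) hidden states from the frozen backbone
    scores: (tokens,) learned per-token selection scores
    delta:  (tokens, d) PEFT update (e.g. the output of a LoRA branch)
    """
    mask = (scores > tau).astype(h.dtype)[:, None]  # hard 0/1 selection per token
    return h + mask * delta

h = np.zeros((3, 4))
scores = np.array([1.2, -0.7, 0.3])
delta = np.ones((3, 4))
out = ts_peft_apply(h, scores, delta)  # rows 0 and 2 get the update, row 1 does not
```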
arXiv Detail & Related papers (2025-11-20T08:41:20Z)
- Uncertainty-Guided Selective Adaptation Enables Cross-Platform Predictive Fluorescence Microscopy [65.15943255667733]
We introduce Subnetwork Image Translation ADDA with automatic depth selection (SIT-ADDA-Auto). We show that adapting only the earliest convolutional layers, while freezing deeper layers, yields reliable transfer. Our results provide a design rule for label-free adaptation in microscopy and a recipe for field settings; the code is publicly available.
arXiv Detail & Related papers (2025-11-15T03:01:05Z)
- EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting [50.794700596484894]
We propose EntroPE (Entropy-Guided Dynamic Patch Encoder), a novel, temporally informed framework that dynamically detects transition points via conditional entropy. This preserves temporal structure while retaining the computational benefits of patching. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency.
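The idea of placing patch boundaries at entropy spikes can be illustrated with a simplified stand-in: a rolling Shannon entropy over a discretized series, rather than the paper's conditional-entropy formulation. The window size, binning, and threshold are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def rolling_entropy(series, window=4, bins=4):
    """Simplified sketch: discretize the series into symbols, then measure
    the Shannon entropy of each sliding window; high-entropy windows mark
    candidate transition points for dynamic patch boundaries."""
    lo, hi = series.min(), series.max()
    sym = np.minimum(((series - lo) / (hi - lo + 1e-12) * bins).astype(int), bins - 1)
    ents = []
    for i in range(len(sym) - window + 1):
        counts = np.array(list(Counter(sym[i:i + window]).values()))
        p = counts / window
        ents.append(float(-(p * np.log2(p)).sum()))
    return np.array(ents)

series = np.array([0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 3.0, 0.0])  # flat, then volatile
ent = rolling_entropy(series)
boundaries = np.where(ent > 0.9)[0]  # candidate patch-transition windows
```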
arXiv Detail & Related papers (2025-09-30T12:09:56Z)
- Task-Adaptive Parameter-Efficient Fine-Tuning for Weather Foundation Models [28.79157031663951]
We introduce WeatherPEFT, a novel PEFT framework for Weather Foundation Models (WFMs). Task-Adaptive Dynamic Prompting (TADP) injects embedding weights from within the encoder into the input tokens of the pre-trained backbone via internal and external pattern extraction. Stochastic Fisher-Guided Adaptive Selection (SFAS) identifies the most task-critical parameters, not only preserving pre-trained knowledge but also stabilizing the selection. We demonstrate the effectiveness and efficiency of WeatherPEFT on three downstream tasks, where existing PEFT methods show significant gaps versus Full-Tuning, while WeatherPEFT achieves performance parity.
arXiv Detail & Related papers (2025-09-26T07:54:05Z)
- Knowledge Regularized Negative Feature Tuning of Vision-Language Models for Out-of-Distribution Detection [54.433899174017185]
Out-of-distribution (OOD) detection is crucial for building reliable machine learning models. We propose a novel method called Knowledge Regularized Negative Feature Tuning (KR-NFT). NFT applies distribution-aware transformations to pre-trained text features, effectively separating positive and negative features into distinct spaces. When trained with few-shot samples from the ImageNet dataset, KR-NFT not only improves ID classification accuracy and OOD detection but also significantly reduces the FPR95 by 5.44%.
arXiv Detail & Related papers (2025-07-26T07:44:04Z)
- Adapting to Fragmented and Evolving Data: A Fisher Information Perspective [0.0]
FADE is a lightweight framework for robust learning under dynamic environments. It employs a shift-aware regularization mechanism anchored in Fisher information geometry. FADE operates online with fixed memory and no access to target labels.
arXiv Detail & Related papers (2025-07-25T06:50:09Z)
- EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
- ADePT: Adaptive Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning [23.511954119467735]
Prompt Tuning (PT) enables the adaptation of Pre-trained Large Language Models (PLMs) to downstream tasks. Decomposed Prompt Tuning (DePT) has demonstrated superior adaptation capabilities. ADePT is composed of a short soft prompt and a shallow token-shared feed-forward neural network.
arXiv Detail & Related papers (2025-01-06T08:20:04Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm, named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
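The sparse-increment idea can be sketched as a gradient step masked to the largest-magnitude gradient entries; the density parameter and top-k selection rule below are simplified assumptions rather than SIFT's exact criterion:

```python
import numpy as np

def sparse_increment_step(w, grad, lr=0.1, density=0.2):
    """Sketch of a sparse fine-tuning step: update only the top-|density|
    fraction of weights by gradient magnitude, keeping the rest frozen."""
    k = max(1, int(density * grad.size))
    flat = np.abs(grad).ravel()
    idx = np.argpartition(flat, -k)[-k:]     # indices of the k largest |grad|
    mask = np.zeros(grad.size)
    mask[idx] = 1.0
    return w - lr * grad * mask.reshape(grad.shape)

w = np.zeros(10)
grad = np.arange(10, dtype=float)            # largest gradients at indices 8, 9
w_new = sparse_increment_step(w, grad)       # only two entries change
```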
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.