GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers
- URL: http://arxiv.org/abs/2512.04296v1
- Date: Wed, 03 Dec 2025 22:17:05 GMT
- Title: GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers
- Authors: Malyaban Bal, Abhronil Sengupta
- Abstract summary: We introduce GRASP, a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K << D groups and learns a shared scaling and shifting vector for each group. We propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient and noise-prone hardware platforms.
- Score: 12.475144734899674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP - GRouped Activation Shared Parameterization - a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K << D groups and learns a shared scaling and shifting vector for each group. This grouped modulation significantly reduces the number of trainable parameters while preserving the model's ability to learn task-specific features. Building on this formulation, we further propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. This probabilistic parameterization, together with a noise-aware loss formulation, enables modelling of hardware-level variability in programmed weights and significantly improves robustness under non-ideal inference conditions, an important requirement for deployment on edge-based emerging AI hardware. Across GLUE (RoBERTa-base & RoBERTa-large) and E2E NLG (GPT-2 Medium), GRASP matches or exceeds the performance of established PEFT methods while achieving an order-of-magnitude reduction in trainable parameters compared to LoRA and BitFit. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient and noise-prone hardware platforms.
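To make the grouped modulation concrete, below is a minimal PyTorch sketch of a GRASP-style adapter, with an optional stochastic mode that mirrors the StochGRASP idea of learning Gaussian perturbations via the reparameterization trick. All names (GroupedScaleShift, num_groups, stochastic) and the choice of one scalar scale and shift per group are illustrative assumptions; the paper's exact parameterization, adapter placement, and noise-aware loss are not reproduced here.

```python
# Hypothetical sketch of a GRASP-style grouped scale-and-shift adapter.
# Names, defaults, and the per-group scalar choice are assumptions, not the
# authors' reference implementation.
import torch
import torch.nn as nn


class GroupedScaleShift(nn.Module):
    """Partition a D-dim token representation into K groups (K << D) and learn
    a shared scale and shift per group. In the stochastic variant, a Gaussian
    (mean, log-variance) is learned per group and sampled with the
    reparameterization trick to model hardware-level weight noise."""

    def __init__(self, hidden_dim: int, num_groups: int, stochastic: bool = False):
        super().__init__()
        assert hidden_dim % num_groups == 0, "D must be divisible by K"
        self.group_size = hidden_dim // num_groups
        self.stochastic = stochastic
        # K trainable scales and shifts instead of D (or D*D for dense updates).
        self.scale_mu = nn.Parameter(torch.ones(num_groups))
        self.shift_mu = nn.Parameter(torch.zeros(num_groups))
        if stochastic:
            # Log-variances of the Gaussian perturbations (StochGRASP-style).
            self.scale_logvar = nn.Parameter(torch.full((num_groups,), -6.0))
            self.shift_logvar = nn.Parameter(torch.full((num_groups,), -6.0))

    @staticmethod
    def _sample(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
        # Reparameterization: mu + sigma * eps keeps gradients flowing to mu/logvar.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim) activations from the frozen backbone.
        scale, shift = self.scale_mu, self.shift_mu
        if self.stochastic and self.training:
            scale = self._sample(self.scale_mu, self.scale_logvar)
            shift = self._sample(self.shift_mu, self.shift_logvar)
        # Broadcast each group's scalar scale/shift across its group_size channels.
        scale = scale.repeat_interleave(self.group_size)  # (hidden_dim,)
        shift = shift.repeat_interleave(self.group_size)  # (hidden_dim,)
        return x * scale + shift


if __name__ == "__main__":
    layer = GroupedScaleShift(hidden_dim=768, num_groups=16, stochastic=True)
    tokens = torch.randn(2, 128, 768)
    print(layer(tokens).shape)  # torch.Size([2, 128, 768])
```

In the paper's setting, a noise-aware loss is applied on top of this probabilistic parameterization so that performance remains stable when the programmed weights are perturbed at inference time; that objective is not shown in the sketch.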
Related papers
- High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning [57.85676271833619]
Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning (a minimal sketch of this low-rank update appears after this list). We present SMoA, a high-rank Structured MOdulation Adapter that uses fewer trainable parameters while maintaining a higher rank.
arXiv Detail & Related papers (2026-01-12T13:06:17Z) - GEM: A Scale-Aware and Distribution-Sensitive Sparse Fine-Tuning Framework for Effective Downstream Adaptation [21.481261295689208]
Gradient-to-Weight Ratio and Entropy-guided Masking (GEM) prioritizes parameters whose updates are significant in proportion to their initial pre-trained values. It also adaptively determines how many parameters to tune at each layer based on the entropy of parameter values.
arXiv Detail & Related papers (2025-08-22T08:12:06Z) - Sparsity May Be All You Need: Sparse Random Parameter Adaptation [7.479026959617763]
Full fine-tuning of large language models for alignment and task adaptation has become prohibitively expensive as models have grown in size. We propose a novel way to reduce the computational and memory resources needed for fine-tuning these models by training only a small number of parameters instead of all model parameters. Our findings suggest that what truly matters for a PEFT technique to perform well is not necessarily the specific adapter structure, but rather the number of trainable parameters being used.
arXiv Detail & Related papers (2025-02-21T22:23:16Z) - QuIC: Quantum-Inspired Compound Adapters for Parameter Efficient Fine-Tuning [0.0]
Scaling full fine-tuning of large foundation models strains GPU memory and training time. We introduce Quantum-Inspired Compound Adapters (QuIC Adapters), which can effectively fine-tune a model using less than 0.02% of the base model's memory footprint.
arXiv Detail & Related papers (2025-02-10T13:06:56Z) - FineGates: LLMs Finetuning with Compression using Stochastic Gates [7.093692674858257]
Large Language Models (LLMs) present significant challenges for full fine-tuning due to their high computational demands. Lightweight fine-tuning techniques have been proposed, such as learning low-rank adapter layers. We propose an adapter model based on stochastic gates that simultaneously sparsifies the frozen base model and provides task-specific adaptation.
arXiv Detail & Related papers (2024-12-17T14:33:05Z) - ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts. Thanks to this design, ALoRE adds a negligible number of extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z) - LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low-Rank Adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method. We propose a higher-order Candecomp/Parafac (CP) decomposition, enabling a more compact and flexible representation. Our method achieves a reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z) - ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections [59.839926875976225]
We propose the ETHER transformation family, which performs Efficient fineTuning via HypErplane Reflections.
In particular, we introduce ETHER and its relaxation ETHER+, which match or outperform existing PEFT methods with significantly fewer parameters.
arXiv Detail & Related papers (2024-05-30T17:26:02Z) - Parameter Efficient Fine-tuning via Cross Block Orchestration for Segment Anything Model [81.55141188169621]
We equip PEFT with a cross-block orchestration mechanism to enable the adaptation of the Segment Anything Model (SAM) to various downstream scenarios.
We propose an intra-block enhancement module, which introduces a linear projection head whose weights are generated from a hyper-complex layer.
Our proposed approach consistently improves the segmentation performance significantly on novel scenarios with only around 1K additional parameters.
arXiv Detail & Related papers (2023-11-28T11:23:34Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in a wireless network.
We consider the case of deep neural network (DNN) models, which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z)
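For context on the parameter-count comparison drawn in the abstract (an order-of-magnitude reduction versus LoRA), here is a minimal sketch of the standard LoRA-style low-rank update referenced in the list above. The class name, rank, and scaling defaults are illustrative assumptions, not the reference implementation.

```python
# Hypothetical sketch of a LoRA-style low-rank update for a single linear layer.
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Frozen base weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        self.base.weight.requires_grad_(False)  # pre-trained weight stays frozen
        self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)  # r x D_in
        self.B = nn.Parameter(torch.zeros(out_dim, rank))        # D_out x r, zero-init
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The low-rank path adds r * (in_dim + out_dim) trainable parameters per
        # adapted matrix, typically far more than a grouped scale/shift adapter.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling
```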
This list is automatically generated from the titles and abstracts of the papers on this site.