Fugu-MT Paper Translation (Summary): GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers

Paper overview: GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers

arxiv url: http://arxiv.org/abs/2512.04296v1
Date: Wed, 03 Dec 2025 22:17:05 GMT
Status: Translation complete
Last updated (in system): 2025-12-05 21:11:45.903282
Title: GRASP: GRouped Activation Shared Parameterization for Parameter-Efficient Fine-Tuning and Robust Inference of Transformers
Authors: Malyaban Bal, Abhronil Sengupta
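Abstract summary: We introduce GRASP, a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K ≪ D groups and learns a shared scaling and shifting vector for each group. We propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient, noise-prone hardware platforms.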
Reference score (attention score computed in-house): 12.475144734899674
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP (GRouped Activation Shared Parameterization), a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K ≪ D groups and learns a shared scaling and shifting vector for each group. This grouped modulation significantly reduces the number of trainable parameters while preserving the model's ability to learn task-specific features. Building on this formulation, we further propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. This probabilistic parameterization, together with a noise-aware loss function formulation, makes it possible to model hardware-level variability in programmed weights. It also significantly improves robustness under non-ideal inference conditions, an important requirement for deployment on emerging edge-based AI hardware. Across GLUE (RoBERTa-base and RoBERTa-large) and E2E NLG (GPT-2 Medium), GRASP matches or exceeds the performance of established PEFT methods while using an order of magnitude fewer trainable parameters than LoRA and BitFit. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient, noise-prone hardware platforms.
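The grouped modulation described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and it fills in details the abstract does not give, so treat these as assumptions:

- groups are contiguous, equal-sized slices of the D features;
- each group shares one scalar scale and one scalar shift;
- the parameters start at the identity (scale 1, shift 0).

The names `group_index`, `grasp_modulate`, and `GraspLayer` are hypothetical.

```python
from typing import List, Sequence


def group_index(d: int, dim: int, num_groups: int) -> int:
    """Map feature index d in [0, dim) to a contiguous group in [0, num_groups)."""
    # Assumption: groups are equal-sized contiguous slices of the D features.
    return d * num_groups // dim


def grasp_modulate(h: Sequence[float], scale: Sequence[float],
                   shift: Sequence[float]) -> List[float]:
    """Apply one shared (scale, shift) pair per group to a D-dim token vector."""
    dim, num_groups = len(h), len(scale)
    if len(shift) != num_groups or dim % num_groups != 0:
        raise ValueError("need len(scale) == len(shift) == K, with K dividing D")
    out = []
    for d in range(dim):
        g = group_index(d, dim, num_groups)
        out.append(scale[g] * h[d] + shift[g])  # every feature in group g shares (scale[g], shift[g])
    return out


class GraspLayer:
    """Hypothetical per-layer module holding 2*K trainable values instead of 2*D."""

    def __init__(self, num_groups: int) -> None:
        # Identity initialization: the frozen model's output is unchanged at step 0.
        self.scale = [1.0] * num_groups
        self.shift = [0.0] * num_groups

    def __call__(self, h: Sequence[float]) -> List[float]:
        return grasp_modulate(h, self.scale, self.shift)

    def num_trainable(self) -> int:
        return len(self.scale) + len(self.shift)


# Usage: D = 4 features split into K = 2 groups.
layer = GraspLayer(num_groups=2)
layer.scale = [2.0, 0.5]
layer.shift = [0.0, 1.0]
print(layer([1.0, 2.0, 3.0, 4.0]))  # [2.0, 4.0, 2.5, 3.0]
```

Under these assumptions, each modulated layer adds 2K trainable values instead of the 2D a per-feature scale and shift would need. For a hypothetical D = 768 and K = 16, that is 32 values instead of 1,536. This illustrates why grouping shrinks the parameter count; the sketch does not reproduce the paper's own accounting.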
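StochGRASP replaces deterministic values with learned Gaussian distributions. The Python sketch below makes several assumptions the abstract does not spell out:

- It applies the Gaussian to the grouped scale and shift parameters. This simplifies the abstract's description of perturbations to the pre-trained weights.
- Each parameter gets a learnable mean and log standard deviation.
- Values are drawn with the reparameterization trick (value = mu + sigma * eps).

The noise-aware loss is not shown because its form is not given. `program_with_noise` is a hypothetical multiplicative Gaussian device model, included only to show how programmed values might be perturbed when probing robustness. All names are illustrative.

```python
import math
import random
from typing import List, Sequence


def _modulate(h: Sequence[float], scale: Sequence[float],
              shift: Sequence[float]) -> List[float]:
    # Assumption: contiguous equal-sized groups; feature d belongs to group d*K//D.
    dim, k = len(h), len(scale)
    return [scale[d * k // dim] * h[d] + shift[d * k // dim] for d in range(dim)]


class StochGraspLayer:
    """Hypothetical layer: one Gaussian per group for the scale and for the shift."""

    def __init__(self, num_groups: int, init_log_sigma: float = -3.0) -> None:
        # Means start at the identity modulation; log-sigmas keep sigma > 0.
        self.scale_mu = [1.0] * num_groups
        self.shift_mu = [0.0] * num_groups
        self.scale_log_sigma = [init_log_sigma] * num_groups
        self.shift_log_sigma = [init_log_sigma] * num_groups

    @staticmethod
    def _draw(mu: Sequence[float], log_sigma: Sequence[float],
              rng: random.Random) -> List[float]:
        # Reparameterization: value = mu + sigma * eps with eps ~ N(0, 1).
        return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]

    def sample_forward(self, h: Sequence[float], rng: random.Random) -> List[float]:
        """Training-style pass: fresh draws of every group's scale and shift."""
        scale = self._draw(self.scale_mu, self.scale_log_sigma, rng)
        shift = self._draw(self.shift_mu, self.shift_log_sigma, rng)
        return _modulate(h, scale, shift)

    def mean_forward(self, h: Sequence[float]) -> List[float]:
        """Deterministic pass that uses only the learned means."""
        return _modulate(h, self.scale_mu, self.shift_mu)


def program_with_noise(values: Sequence[float], rel_sigma: float,
                       rng: random.Random) -> List[float]:
    """Hypothetical device model: multiplicative Gaussian error per programmed value."""
    return [v * (1.0 + rel_sigma * rng.gauss(0.0, 1.0)) for v in values]


# Usage: compare the mean pass with a pass through noisily "programmed" scales.
rng = random.Random(0)
layer = StochGraspLayer(num_groups=2)
h = [1.0, 2.0, 3.0, 4.0]
print(layer.mean_forward(h))  # [1.0, 2.0, 3.0, 4.0]
noisy_scale = program_with_noise(layer.scale_mu, rel_sigma=0.05, rng=rng)
print(_modulate(h, noisy_scale, layer.shift_mu))
```

Sampling during training exposes the learned parameters to spread around their means. This is one plausible way a model could become tolerant of the programming errors simulated by `program_with_noise`. How closely this matches StochGRASP's actual training objective is not established by the abstract.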


Related papers