Fugu-MT Paper Translation (Abstract): Analyzing the Effect of Noise in LLM Fine-tuning

Paper summary: Analyzing the Effect of Noise in LLM Fine-tuning

arxiv url: http://arxiv.org/abs/2604.12469v1
Date: Tue, 14 Apr 2026 08:54:49 GMT
Status: Translation complete
Last updated in system: 2026-04-15 19:11:32.354334
Title: Analyzing the Effect of Noise in LLM Fine-tuning
Title (reference translation): Analysis of the Effect of Noise in LLM Fine-tuning
Authors: Lingfang Li, Procheta Sen
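Abstract summary: This work studies the impact of noise on model behavior across three pretrained model families and three diverse NLP tasks. It introduces controlled perturbations corresponding to three common real-world noise types: label noise, grammatical noise, and typographical noise. The results show that label corruption consistently causes the largest performance degradation, whereas grammatical noise and typographical noise can occasionally yield mild regularization benefits.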
Reference score (independently computed attention score): 8.326903332564365
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-tuning is the dominant paradigm for adapting pretrained large language models (LLMs) to downstream NLP tasks. In practice, fine-tuning datasets may contain various forms of noise arising from annotation errors, preprocessing artifacts, or automated data collection. While prior work has focused on designing robust learning algorithms to mitigate performance degradation under noisy conditions, comparatively little is known about how different types of noise affect the internal learning dynamics of LLMs during fine-tuning. In this work, we systematically study the impact of noise on model behavior across three pretrained model families (GPT-2, Qwen2 and Llama-2) and three diverse NLP tasks. We introduce controlled perturbations corresponding to three common real-world noise types: label noise, grammatical noise, and typographical noise. Beyond task-level performance, we analyze layer-wise representation changes and attention patterns to understand how noise propagates through the network. Our results show that corrupting labels (i.e. label noise) consistently causes the largest performance degradation, whereas grammatical noise and typographical noise can occasionally yield mild regularization benefits. We further find that noise effects are localized primarily to task-specific layers, while attention structures remain comparatively stable.
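Abstract (reference translation): Fine-tuning is the main paradigm for adapting pretrained large language models (LLMs) to downstream NLP tasks. In practice, fine-tuning datasets can contain various kinds of noise arising from annotation errors, preprocessing artifacts, and automated data collection. Prior work has concentrated on designing robust learning algorithms that mitigate performance degradation under noisy conditions, but little is known about how different types of noise affect the internal learning dynamics of LLMs during fine-tuning. This work systematically studies the impact of noise on model behavior across three pretrained model families (GPT-2, Qwen2, Llama-2) and three different NLP tasks. It introduces controlled perturbations corresponding to three common real-world noise types: label noise, grammatical noise, and typographical noise. Beyond task-level performance, it analyzes layer-wise representation changes and attention patterns to understand how noise propagates through the network. The results show that label corruption (label noise) consistently causes the largest performance degradation, whereas grammatical noise and typographical noise can occasionally yield mild regularization benefits. In addition, noise effects are localized mainly in task-specific layers, while attention structures remain comparatively stable.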
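
As a rough illustration of the kind of controlled perturbation the abstract describes, the Python sketch below injects label noise and typographical noise into toy data. It is not the authors' procedure, which the abstract does not specify. The function names `flip_labels` and `add_typos`, the `rate` and `seed` parameters, the uniform choice of a wrong label, and the adjacent-character swap are all assumptions made for illustration. Grammatical noise is left out because it would need linguistic tooling that the abstract does not name.

```python
import random


def flip_labels(labels, num_classes, rate, seed=0):
    """Hypothetical label-noise helper (an assumption, not the paper's method).

    Returns a copy of `labels` in which round(rate * len(labels)) entries
    are replaced by a different class chosen uniformly at random.
    """
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(rate * len(noisy)))
    # Pick distinct positions so exactly n_flip labels change.
    for i in rng.sample(range(len(noisy)), n_flip):
        wrong = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(wrong)
    return noisy


def add_typos(text, rate, seed=0):
    """Hypothetical typo-noise helper (an assumption, not the paper's method).

    Walks the string left to right and, with probability `rate`, swaps a
    pair of adjacent letters, then skips past the swapped pair.
    """
    rng = random.Random(seed)
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2
        else:
            i += 1
    return "".join(chars)


if __name__ == "__main__":
    clean_labels = [0, 1, 2] * 10
    noisy_labels = flip_labels(clean_labels, num_classes=3, rate=0.2)
    changed = sum(a != b for a, b in zip(clean_labels, noisy_labels))
    print("labels changed:", changed)
    print(add_typos("fine-tuning with noisy text", rate=0.3))
```

Fixing `seed` keeps each perturbation reproducible, so the same noisy dataset can be reused when comparing models or noise rates.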
