Fugu-MT Paper Translation (Abstract): Can Editing 1 Neuron Fix Repetition Loops in LLMs?

Paper abstract: Can Editing 1 Neuron Fix Repetition Loops in LLMs?

arxiv url: http://arxiv.org/abs/2606.13705v1
Date: Tue, 09 Jun 2026 21:20:57 GMT
Status: Translation complete
Last updated in system: 2026-06-15 16:00:42.504583
Title: Can Editing 1 Neuron Fix Repetition Loops in LLMs?
Title (reference translation): Can Editing 1 Neuron Fix Repetition Loops in LLMs?
Authors: Aristotelis Lazaridis, Aman Sharma, Dylan Bates, Brian King, Vincent Lu, Jack FitzGerald
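Abstract summary: The Gemma 4 instruction-tuned models share a reproducible failure. These loops occur at rates as high as 95% and survive prompt rewording. This paper examines whether the behavior is localized enough to be removed by weight edits.
Reference score (independently computed attention score): 5.310892696470208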
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Yes. Can it cure doom loops? Probably not. The Gemma 4 instruction-tuned models share a reproducible failure: on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 original Pokemon, they collapse into repetition, either a tight verbatim loop or a list whose entries decay onto a single answer. These loops occur at rates as high as 95% and survive prompt rewording, inference-engine changes, and most sampling adjustments. In this paper we explore whether this behavior is localized enough to remove by weight edits. To localize the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace to a small set of MLP neurons (or, in the 26B-A4B Mixture-of-Experts model, a few routed experts) which we suppress with static weight edits. These "surgeries" can be as small as a single sign-inverted neuron (in the E2B model). The size of the effective edits grows with model scale, but in all cases, the loop patterns can be addressed at normal generation budgets while preserving general-purpose benchmark scores. However, the edits do not solve everything: we also study longer thinking budgets, where the two larger models most visibly enter doom looping, i.e. a non-convergent regime in which the model self-corrects in circles over a fact it cannot recall, exhausting the budget without committing to a final answer. We show this residual failure is reduced but not eliminated by the same edits, and argue it is fundamentally a knowledge-precision problem rather than a removable circuit; weight surgery can delete a loop, but it cannot supply a missing fact. Our results are both a feasibility demonstration, that is, evidence that a concrete generation pathology can be localized to a few parameters and edited out, and a delineation of where that approach stops.
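Abstract (reference translation): Yes. Can it cure doom loops? Probably not. In Gemma 4, on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 Pokemon, the models collapse into repetition: either a tight verbatim loop or a list whose entries decay onto a single answer. These loops occur at rates as high as 95% and survive prompt rewording, inference-engine changes, and most sampling adjustments. This paper examines whether the behavior is localized enough to be removed by weight edits. To localize the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace to a small set of MLP neurons (or, in the 26B-A4B Mixture-of-Experts model, a few routed experts), which we suppress with static weight edits. These "surgeries" can be as small as a single sign-inverted neuron (in the E2B model). The size of the effective edits grows with model scale, but in all cases the loop patterns can be addressed at normal generation budgets while preserving general-purpose benchmark scores. However, the edits do not solve everything: we also study longer thinking budgets, where the two larger models most visibly enter doom looping, a non-convergent regime in which the model self-corrects in circles over a fact it cannot recall and exhausts the budget without committing to a final answer. We argue that this residual failure is reduced but not eliminated by the same edits and is fundamentally a knowledge-precision problem rather than a removable circuit: weight surgery can delete a loop, but it cannot supply a missing fact. Our results are both a demonstration that a concrete generation pathology can be localized to a few parameters and edited out, and a delineation of where that approach stops.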
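
The abstract names three operations on MLP neurons: attribution, ablation, and a static sign-inverting weight edit. It does not give the mechanics, so the following is only a minimal sketch on a toy two-layer ReLU MLP, not the authors' procedure. It assumes that a neuron is edited by scaling its column of the output projection (0 to ablate it, -1 to invert its sign). The function names, weights, and input are all hypothetical, and Gemma's actual MLP layout is not modeled.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mlp_forward(w_in: Matrix, w_out: Matrix, x: Vector) -> Vector:
    """Toy MLP: out = w_out @ relu(w_in @ x)."""
    return matvec(w_out, relu(matvec(w_in, x)))


def neuron_contribution(w_in: Matrix, w_out: Matrix, x: Vector, neuron: int) -> Vector:
    """Per-neuron attribution (assumed form): the part of the output
    written by one hidden neuron, i.e. its activation times its output column."""
    hidden = relu(matvec(w_in, x))
    return [row[neuron] * hidden[neuron] for row in w_out]


def scale_neuron(w_out: Matrix, neuron: int, factor: float) -> Matrix:
    """Static weight edit (assumed form): return a copy of w_out with one
    neuron's output column scaled. factor=0.0 ablates, factor=-1.0 sign-inverts."""
    return [[w * factor if j == neuron else w for j, w in enumerate(row)]
            for row in w_out]


# Hypothetical toy weights: 2 inputs, 3 hidden neurons, 2 outputs.
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_out = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
x = [1.0, 2.0]

baseline = mlp_forward(w_in, w_out, x)
contribution = neuron_contribution(w_in, w_out, x, neuron=2)
ablated = mlp_forward(w_in, scale_neuron(w_out, 2, 0.0), x)
inverted = mlp_forward(w_in, scale_neuron(w_out, 2, -1.0), x)

print("baseline:    ", baseline)
print("contribution:", contribution)
print("ablated:     ", ablated)
print("inverted:    ", inverted)
```

In this toy setting, ablation subtracts the neuron's contribution from the output once and sign inversion subtracts it twice, so inversion pushes actively against whatever the neuron was promoting. Both edits touch only the stored weights and add no inference-time logic.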
