Fugu-MT Paper Translation (Abstract): Search over Self-Edit Strategies for LLM Adaptation

Paper abstract: Search over Self-Edit Strategies for LLM Adaptation

arxiv url: http://arxiv.org/abs/2601.14532v1
Date: Tue, 20 Jan 2026 22:59:58 GMT
Status: Translation complete
Last updated in system: 2026-01-22 21:27:50.180382
Title: Search over Self-Edit Strategies for LLM Adaptation
Title (reference translation): Search over Self-Edit Strategies for LLM Adaptation
Authors: Alistair Cheong, Haolin Cong, Tyler Yang, Dustin Miao,
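Abstract summary: This study investigates whether an LLM can use task feedback to decide how to update its weights. Using the Self-Adapting Language Models (SEAL) framework as a testbed, the fixed human template constraint was relaxed. Two variants were studied, differing in whether template generation was conditioned on a lightweight archive of past templates.
Reference score (independently computed attention metric): 0.0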
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many LLM-based open-ended search systems freeze the foundation model that proposes improvements to existing solutions, which may bottleneck long-run progress. Recent work has explored updating the proposal model at test time [arXiv:2511.23473], but the update strategy is still typically hand-specified. Therefore, this study investigated whether an LLM can use task feedback to decide how it should update its weights. For tractability, we focused on the simpler case where there is only one round of self-improvement, and restricted the update operator to self-supervised next token prediction (NTP), leaving the model freedom in choosing its training data and key NTP hyperparameters. Using the Self-Adapting Language Models (SEAL) [arXiv:2506.10943] framework as a testbed, we relaxed its fixed human template constraint and allowed the model to generate its own self-edit templates, thereby giving it more control over its training data and hyperparameters. Two variants were studied, differing in whether template generation was conditioned on a lightweight archive of past templates. In SEAL's Single-Passage Knowledge Incorporation setting with Qwen3-8B on SQuAD [arXiv:1606.05250], the no-archive variant performed comparably to the weaker "Implications" baseline, while the archive variant outperformed "Implications" and approached the strongest human-designed "Rewrite" baseline without surpassing it. Further analysis of collapse in the model's exploration revealed that a naive archive can confer some short-term robustness but can also accelerate homogenization, suggesting that explicit novelty pressure may be required to consistently advance beyond carefully optimized human strategies. Our code is available at https://github.com/cheongalc/search-self-edit-strategies .
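Abstract (reference translation): Many LLM-based open-ended search systems freeze the foundation model that proposes improvements to existing solutions, which may hinder long-term progress. Recent work has explored updating the proposal model at test time [arXiv:2511.23473], but the update strategy is usually still specified by hand. This study therefore investigated whether an LLM can use task feedback to decide how to update its own weights. For tractability, we focused on the simple case of a single round of self-improvement and restricted the update operator to self-supervised next token prediction (NTP), while leaving the model free to choose its training data and key NTP hyperparameters. Using the Self-Adapting Language Models (SEAL) [arXiv:2506.10943] framework as a testbed, we relaxed its fixed human template constraint and let the model generate its own self-edit templates, giving it more control over its training data and hyperparameters. Two variants were studied, differing in whether template generation was conditioned on a lightweight archive of past templates. In SEAL's Single-Passage Knowledge Incorporation setting with Qwen3-8B on SQuAD [arXiv:1606.05250], the no-archive variant performed comparably to the weaker "Implications" baseline, while the archive variant outperformed "Implications" and approached, without surpassing, the strongest human-designed "Rewrite" baseline. Further analysis of collapse in the model's exploration showed that a naive archive can confer some short-term robustness but can also accelerate homogenization, suggesting that explicit novelty pressure may be needed to consistently advance beyond carefully optimized human strategies. Our code is available at https://github.com/cheongalc/search-self-edit-strategies .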

Related papers