Fugu-MT Paper Translation (Summary): Fine-tuning with RAG for Improving LLM Learning of New Skills

Paper summary: Fine-tuning with RAG for Improving LLM Learning of New Skills

arxiv url: http://arxiv.org/abs/2510.01375v1
Date: Wed, 01 Oct 2025 19:03:48 GMT
Status: Translation complete
Last updated (site system): 2025-10-03 16:59:20.831815
Title: Fine-tuning with RAG for Improving LLM Learning of New Skills
Authors: Humaid Ibrahim, Nikolai Rozanov, Marek Rei
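Abstract summary: Large language model (LLM) agents frequently fail in predictable ways. The paper proposes a simple pipeline that converts inference-time retrieval into learned competence through distillation.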
Reference score (attention score computed by the site): 8.825427873545063
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language model (LLM) agents deployed for multi-step tasks frequently fail in predictable ways: attempting actions with unmet preconditions, issuing redundant commands, or mishandling environment constraints. While retrieval-augmented generation (RAG) can improve performance by providing runtime guidance, it requires maintaining external knowledge databases and adds computational overhead at every deployment. We propose a simple pipeline that converts inference-time retrieval into learned competence through distillation. Our approach: (1) extracts compact, reusable hints from agent failures, (2) uses these hints to generate improved teacher trajectories via one-shot retrieval at episode start, and (3) trains student models on these trajectories with hint strings removed, forcing internalization rather than memorization. Across two interactive benchmarks, ALFWorld (household tasks) and WebShop (online shopping), distilled students consistently outperform baseline agents, achieving up to 91% success on ALFWorld (vs. 79% for baselines) and improving WebShop scores to 72 (vs. 61 for baselines), while using 10-60% fewer tokens than retrieval-augmented teachers depending on the environment. The approach generalizes across model scales (7B/14B parameters) and agent architectures (ReAct/StateAct), demonstrating that retrieval benefits can be effectively internalized through targeted fine-tuning without permanent runtime dependencies.
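The three-step pipeline in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how hints are represented or matched, so the keyword-overlap ranking, the `<hints>` delimiters, the example hints and the toy trajectory are all hypothetical. The same applies to the names `Hint`, `HINTS`, `word_set`, `retrieve_hints`, `teacher_prompt`, `strip_hints` and `to_student_example`. Step (1), extracting hints from failed episodes, is not modeled, and the hints are written by hand. What the sketch does show is the core idea: hints are retrieved once at episode start and placed in the teacher's prompt. The same delimited hint block is then removed before the trajectory becomes a student training example.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hint:
    """A compact, reusable hint (hypothetical representation)."""
    text: str
    keywords: frozenset[str]  # assumption: words used to match the hint to a task


# Hypothetical delimiters marking the hint block in the teacher prompt.
HINT_OPEN, HINT_CLOSE = "<hints>", "</hints>"

# Hand-written illustrative hints; the paper instead extracts hints from agent failures.
HINTS = [
    Hint("Open a closed receptacle before putting items in or taking items out.",
         frozenset({"drawer", "cabinet", "fridge", "open"})),
    Hint("Check the price constraint before clicking Buy Now.",
         frozenset({"price", "buy", "dollars"})),
]


def word_set(text: str) -> set[str]:
    """Lowercase alphanumeric words, used for simple keyword matching."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_hints(task: str, hints: list[Hint], k: int = 2) -> list[Hint]:
    """One-shot retrieval at episode start: top-k hints by keyword overlap (assumed scoring)."""
    task_words = word_set(task)
    scored = [(len(task_words & h.keywords), i, h) for i, h in enumerate(hints)]
    # Drop hints with no overlap; rank by overlap, breaking ties by original order.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [h for _, _, h in ranked[:k]]


def teacher_prompt(task: str, retrieved: list[Hint]) -> str:
    """Prepend the retrieved hints, inside delimiters, to the task for the teacher."""
    lines = "\n".join(f"- {h.text}" for h in retrieved)
    return f"{HINT_OPEN}\n{lines}\n{HINT_CLOSE}\nTask: {task}"


# Matches one delimited hint block (non-greedy, across newlines) plus a trailing newline.
_HINT_BLOCK = re.compile(
    re.escape(HINT_OPEN) + r".*?" + re.escape(HINT_CLOSE) + r"\n?", re.DOTALL
)


def strip_hints(text: str) -> str:
    """Remove every delimited hint block so the student never sees hint strings."""
    return _HINT_BLOCK.sub("", text)


def to_student_example(prompt: str, trajectory: list[str]) -> dict[str, str]:
    """Turn a teacher trajectory into a fine-tuning pair with hints removed."""
    return {"prompt": strip_hints(prompt),
            "completion": strip_hints("\n".join(trajectory))}


if __name__ == "__main__":
    task = "Put a mug in the cabinet."
    prompt = teacher_prompt(task, retrieve_hints(task, HINTS, k=1))
    # Made-up ALFWorld-style actions standing in for the teacher's rollout.
    trajectory = ["take mug 1 from countertop 1", "go to cabinet 1",
                  "open cabinet 1", "put mug 1 in/on cabinet 1"]
    example = to_student_example(prompt, trajectory)
    print(prompt)
    print(example["prompt"])  # prints: Task: Put a mug in the cabinet.
```

Because the hint block is stripped from both the prompt and the completion, the student sees only the task and the teacher's improved actions. This is the sense in which the abstract says training forces internalization rather than memorization of the hints. A fuller setup might replace the keyword overlap with embedding similarity, but that choice is not specified in the abstract.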
