Fugu-MT Paper Translation (Summary): LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models

Paper summary: LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models

arxiv url: http://arxiv.org/abs/2603.19255v1
Date: Wed, 25 Feb 2026 15:34:21 GMT
Status: Translation complete
Last updated in system: 2026-04-06 02:36:12.786265
Title: LARFT: Closing the Cognition-Action Gap for Length Instruction Following in Large Language Models
Authors: Wei Zhang, Lintong Du, Yuanhe Zhang, Zhenhong Zhou, Kun Wang, Li Sun, Sen Su
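Abstract summary: The paper proposes LARFT (Length-Aware Reinforcement Fine-Tuning). LARFT integrates length-oriented reinforcement learning with hindsight length awareness. Experiments show that LARFT outperforms existing baselines, achieving an average improvement of +20.92 points across three length instruction following benchmarks.
Reference score (attention score computed independently by this site): 13.817055649196107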
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite the strong performance of Large Language Models (LLMs) on complex instruction-following tasks, precise control of output length remains a persistent challenge. Existing methods primarily attempt to enforce length constraints by externally imposing length signals or optimization objectives, while largely overlooking the underlying limitation: the model's intrinsic deficit in length cognition. To address this, we propose LARFT (Length-Aware Reinforcement Fine-Tuning), a training framework that aligns the model's length cognition with its action. Specifically, LARFT integrates length-oriented reinforcement learning with a hindsight length awareness. By transforming on-policy data into hindsight self-awareness tasks where the model learns to identify the actual length of its own generation, LARFT jointly optimizes the model's internal representation of length information and refines its policy to satisfy length constraints, thereby achieving precise and reliable length instruction following. Extensive experiments across four base models demonstrate that LARFT outperforms existing baselines, achieving an average improvement of +20.92 points across three length instruction following benchmarks with only a marginal decline of -1.45 points on four general capability benchmarks.
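The two ingredients named in the abstract, a length-oriented reward and a hindsight task built from the model's own output, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Rollout` type, the whitespace word count, the linear reward shape, and the question wording are all assumptions made for illustration, since the abstract does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One on-policy sample (hypothetical container, not from the paper)."""
    prompt: str
    response: str
    target_words: int  # length requested in the prompt


def count_words(text: str) -> int:
    """Whitespace word count; the paper's actual length unit may differ."""
    return len(text.split())


def length_reward(rollout: Rollout) -> float:
    """Assumed reward: 1.0 at the target length, decaying linearly to 0."""
    actual = count_words(rollout.response)
    error = abs(actual - rollout.target_words) / max(rollout.target_words, 1)
    return max(0.0, 1.0 - error)


def hindsight_task(rollout: Rollout) -> tuple[str, str]:
    """Turn a sampled response into a 'how long was it?' question/answer pair."""
    question = (
        "How many words does the following text contain?\n\n" + rollout.response
    )
    return question, str(count_words(rollout.response))


rollout = Rollout(
    prompt="Describe the sea in exactly 10 words.",
    response="The sea is vast, blue, and always moving.",
    target_words=10,
)
reward = length_reward(rollout)             # used as the policy-optimization signal
question, answer = hindsight_task(rollout)  # used as a length-awareness example
```

In this sketch the same sampled response serves both purposes: it is scored against the requested length, and it is relabeled with its true length to form a supervised length-awareness example.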

Related papers