Fugu-MT Paper Translation (Abstract): Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production

Paper abstract: Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production

arxiv url: http://arxiv.org/abs/2510.13879v1
Date: Mon, 13 Oct 2025 21:07:05 GMT
Status: Translation complete
Last updated in system: 2025-10-17 21:15:14.506618
Title: Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production
Title (reference translation): Catch Your Breath: Adaptive Computation for Self-Paced Sequence Production
Authors: Alexandre Galashov, Matt Jones, Rosemary Ke, Yuan Cao, Vaishnavh Nagarajan, Michael C. Mozer
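Abstract summary: We explore a class of supervised training objectives that allow a language model to dynamically and autonomously scale the number of compute steps used for each input token. For any token, the model can request additional compute steps by emitting a <don't know> output. The CYB model requests additional steps when doing so improves accuracy, and it adapts its processing time to token-level complexity and context.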
Reference score (independently computed attention score): 55.76222360698305
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We explore a class of supervised training objectives that allow a language model to dynamically and autonomously scale the number of compute steps used for each input token. For any token, the model can request additional compute steps by emitting a <don't know> output. If the model is granted a delay, a specialized <pause> token is inserted at the next input step, providing the model with additional compute resources to generate an output. The model can request multiple pauses. To train the model to use <don't know> outputs judiciously and to calibrate its uncertainty, we frame the selection of each output token as a sequential-decision problem with a time cost. We refer to the class of methods as $\textit{Catch Your Breath}$ losses and we study three methods in this class: CYB-AP frames the model's task as anytime prediction, where an output may be required at any step and accuracy is discounted over time; CYB-VA is a variational approach that aims to maximize prediction accuracy subject to a specified distribution over stopping times; and CYB-DP imposes a penalty based on a computational budget. Through fine-tuning experiments, we identify the best performing loss variant. The CYB model needs only one third as much training data as the baseline (no pause) model needs to achieve the same performance, and half as much data as a model with pauses and a cross-entropy loss. We find that the CYB model requests additional steps when doing so improves accuracy, and the model adapts its processing time to token-level complexity and context. For example, it often pauses after plural nouns like $\textit{patients}$ and $\textit{challenges}$ but never pauses after the first token of contracted words like $\textit{wasn}$ and $\textit{didn}$, and it shows high variability for ambiguous tokens like $\textit{won}$, which could function as either a verb or part of a contraction.
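Abstract (reference translation): We explore a class of supervised training objectives that allow a language model to dynamically and autonomously scale the number of compute steps used for each input token. For any token, the model can request additional compute steps by emitting a <don't know> output. If the model is granted a delay, a specialized <pause> token is inserted at the next input step, providing the model with additional compute resources to generate an output. The model can request multiple pauses. To train the model to use <don't know> outputs judiciously and to calibrate its uncertainty, we frame the selection of each output token as a sequential-decision problem with a time cost. CYB-AP frames the task as anytime prediction, where an output may be required at any step and accuracy is discounted over time; CYB-VA is a variational approach that aims to maximize prediction accuracy subject to a specified distribution over stopping times; and CYB-DP imposes a penalty based on a computational budget. Through fine-tuning experiments, we identify the best performing loss variant. The CYB model needs only one third as much training data as the baseline (no pause) model needs to achieve the same performance. The CYB model requests additional steps when doing so improves accuracy, and it adapts its processing time to token-level complexity and context. For example, it often pauses after plural nouns like $\textit{patients}$ and $\textit{challenges}$ but never pauses after the first token of contracted words like $\textit{wasn}$ and $\textit{didn}$.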
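
The pause mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function `feed_with_pauses`, the `predict` callback, the `max_pauses` cap, and the toy model below are all hypothetical names introduced here for illustration. The sketch only shows the control flow stated in the abstract, in which a <don't know> output causes a <pause> token to be inserted at the next input step, and the model may request several pauses in a row.

```python
from typing import Callable, List, Sequence, Tuple

DONT_KNOW = "<don't know>"  # output with which the model requests more compute
PAUSE = "<pause>"           # input token inserted when a delay is granted


def feed_with_pauses(
    tokens: Sequence[str],
    predict: Callable[[List[str]], str],
    max_pauses: int = 3,  # assumption: a per-token cap; the abstract states no limit
) -> Tuple[List[str], List[str]]:
    """Feed tokens one at a time, inserting <pause> after each <don't know>.

    Returns the augmented input stream and one final output per original token.
    If the cap is reached, the last output (possibly <don't know>) is kept as is.
    """
    stream: List[str] = []
    outputs: List[str] = []
    for tok in tokens:
        stream.append(tok)
        out = predict(stream)
        pauses = 0
        # Grant a delay each time the model asks for one, up to the cap.
        while out == DONT_KNOW and pauses < max_pauses:
            stream.append(PAUSE)
            pauses += 1
            out = predict(stream)
        outputs.append(out)
    return stream, outputs


def toy_predict(stream: List[str]) -> str:
    """Hypothetical stand-in for a model: asks for one pause after 'patients'."""
    if stream[-1] == "patients":
        return DONT_KNOW
    return "next"


if __name__ == "__main__":
    stream, outputs = feed_with_pauses(["the", "patients", "wasn"], toy_predict)
    print(stream)
    print(outputs)
```

In this toy run, a single <pause> is inserted after "patients" and none after "wasn", mirroring the kind of token-level behavior the abstract reports. How a real model decides when to emit <don't know> is determined by the CYB training losses, which this sketch does not attempt to reproduce.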


List of related papers