Fugu-MT Paper Translation (Abstract): PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets

Paper abstract: PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets

arxiv url: http://arxiv.org/abs/2510.23198v1
Date: Mon, 27 Oct 2025 10:36:15 GMT
Status: Translation complete
System update date: 2025-10-28 15:28:15.523655
Title: PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets
Title (reference translation): PTPP-Aware Adaptation Scaling Laws: Predicting Domain-Adaptation Performance at Unseen Pre-Training Budgets
Authors: Etienne Goffinet, Shane Bergsma, Avraham Sheinin, Natalia Vassilieva, Shaheer Muhammad, Preslav Nakov, Gurpreet Gosal
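Abstract summary: Existing CPT scaling laws assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes. This paper proposes PTPP-aware adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate prediction of adaptation loss at unseen PTPP.
Reference score (independently computed attention score): 39.874108063927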
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Continual pre-training (CPT) for domain adaptation must balance target-domain gains with stability on the base domain. Existing CPT scaling laws typically assume a fixed pre-training budget, which limits their ability to forecast adaptation outcomes for models trained at different tokens-per-parameter (PTPP). We present PTPP-aware adaptation scaling laws that make the pre-training budget an explicit variable, enabling accurate prediction of adaptation loss at unseen PTPP. On a multilingual setup (English/Arabic $\rightarrow$ French), PTPP-aware formulations trained on early stages (PTPP = {15, 31}) predict target loss at PTPP = 279 and outperform a PTPP-agnostic D-CPT transfer baseline on metrics (Huber-on-log, MAE$_\mathrm{rel}$, calibration slope); full diagnostics (RMSE, MAPE) are in the appendix. Beyond forecasting, we show a practical use case: planning replay ratios and adaptation token budgets that satisfy target and forgetting constraints under compute limits.
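Abstract (reference translation): Continual pre-training (CPT) for domain adaptation needs to balance stability on the base domain against gains on the target domain. Existing CPT scaling laws usually assume a fixed pre-training budget, which limits their ability to predict adaptation outcomes for models trained at different tokens-per-parameter (PTPP). This paper proposes PTPP-aware adaptation scaling laws that treat the pre-training budget as an explicit variable, enabling accurate prediction of adaptation loss at unseen PTPP. In a multilingual setting (English/Arabic $\rightarrow$ French), PTPP-aware formulations trained on early stages (PTPP = {15, 31}) predict target loss at PTPP = 279 and outperform a PTPP-agnostic D-CPT transfer baseline on metrics (Huber-on-log, MAE$_\mathrm{rel}$, calibration slope); full diagnostics (RMSE, MAPE) are given in the appendix. Beyond forecasting, the paper shows a practical use case: planning replay ratios and adaptation token budgets that satisfy target and forgetting constraints under compute limits.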
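
The abstract names three evaluation metrics without defining them on this page. The following is a minimal Python sketch of how such metrics are commonly computed, not the authors' implementation: it assumes Huber-on-log means a Huber loss applied to residuals of log-losses, MAE$_\mathrm{rel}$ means mean absolute error relative to the observed loss, and calibration slope means the least-squares slope of observed on predicted losses. The function names, the `delta` parameter, and the sample values are all hypothetical.

```python
import math


def huber(residual, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * residual * residual
    return delta * (a - 0.5 * delta)


def huber_on_log(observed, predicted, delta=1.0):
    """Mean Huber loss on log-space residuals (assumed definition)."""
    residuals = [math.log(p) - math.log(o) for o, p in zip(observed, predicted)]
    return sum(huber(r, delta) for r in residuals) / len(residuals)


def mae_rel(observed, predicted):
    """Mean absolute error relative to the observed value (assumed definition)."""
    return sum(abs(p - o) / o for o, p in zip(observed, predicted)) / len(observed)


def calibration_slope(observed, predicted):
    """Least-squares slope of observed regressed on predicted.

    A slope of 1.0 means predictions track observations one-for-one.
    """
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    sxy = sum((p - mean_p) * (o - mean_o) for o, p in zip(observed, predicted))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical target-domain losses, for illustration only.
    observed = [2.10, 2.00, 1.95]
    predicted = [2.12, 1.98, 1.97]
    print(huber_on_log(observed, predicted))
    print(mae_rel(observed, predicted))
    print(calibration_slope(observed, predicted))
```

Under these assumed definitions, lower Huber-on-log and MAE$_\mathrm{rel}$ values and a calibration slope closer to 1 would indicate better forecasts at the held-out PTPP.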

Related papers