Fugu-MT Paper Translation (Summary): Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage

Paper summary: Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage

arXiv URL: http://arxiv.org/abs/2605.09986v1
Date: Mon, 11 May 2026 05:01:43 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.53161
Title: Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage
Title (reference translation): Federated Language Models Under Bandwidth Budgets: Distillation Rates and Conformal Coverage
Authors: Prasanjit Dubey, Xiaoming Huo
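Abstract summary: Training a language model on data scattered across bandwidth-limited nodes that cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the situation in which data must remain distributed across nodes and ask what statistical guarantees are achievable under explicit bandwidth budgets.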
Reference score (attention score computed by the site): 12.805268849262243
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training a language model on data scattered across bandwidth-limited nodes that cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the regime in which data must remain distributed across nodes, and ask what statistical guarantees are in principle achievable under explicit bandwidth budgets; we aim to characterize what is provably possible, not to demonstrate a deployment-ready system. Existing theory treats either training-time consistency or inference-time calibration in isolation, and none makes bandwidth a first-class statistical parameter. We analyze two protocols, Federated Probe-Logit Distillation (FPLD) for training and Federated Conformal RAG (FC-RAG) for inference, as the analytical vehicles for our results. Our first main result is an explicit high-probability KL-consistency rate for FPLD with simultaneous dependence on node count $K$, per-node sample size $n$, quantization budget $B$, probe-set size $m$, and vocabulary size $V$; bandwidth enters only through an exponentially vanishing quantization term. Our second main result is a distribution-free marginal-coverage bound for FC-RAG, whose novel retrieval-bandwidth slack $Δ_{\mathrm{RAG}} = f_{\max}\sqrt{K^{-2}\sum_i v(B_i)}$ makes per-node retrieval bandwidth a first-class statistical parameter, with arithmetic aggregation across $K$ nodes shrinking the slack as $K^{-1/2}$ in the per-node-uniform regime. A Pinsker-type corollary composes the two bounds into an end-to-end coverage guarantee. Synthetic experiments verify the predicted scaling along the bounds' parameters; small-scale experiments on a GPT-2 testbed illustrate that the qualitative bandwidth-accuracy tradeoff survives on a real language model. A deployment-scale empirical evaluation is out of scope.
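The first result says bandwidth affects FPLD only through an exponentially vanishing quantization term. The abstract does not specify the quantizer, so the following is a minimal sketch under an assumed scheme: each node clips its probe-set logits to a hypothetical range $[-c, c]$ and rounds them onto a uniform grid of $2^B$ points. For in-range logits the worst-case rounding error is half the grid spacing, $c/(2^B - 1)$, which decays exponentially in $B$. The names `quantize_logits`, `worst_case_error`, and the `clip` parameter are illustrative and do not come from the paper.

```python
# Hypothetical B-bit uniform quantizer for probe-set logits (an assumption, not the paper's scheme).


def quantize_logits(logits: list[float], bits: int, clip: float = 10.0) -> list[float]:
    """Clip each logit to [-clip, clip] and round it to the nearest of 2**bits evenly spaced grid points."""
    intervals = 2 ** bits - 1            # 2**bits grid points span this many intervals (bits >= 1)
    step = 2.0 * clip / intervals        # grid spacing
    out = []
    for z in logits:
        z = min(max(z, -clip), clip)     # clip into the representable range
        k = round((z + clip) / step)     # index of the nearest grid point
        out.append(-clip + k * step)     # dequantized value the receiver reconstructs
    return out


def worst_case_error(bits: int, clip: float = 10.0) -> float:
    """Half the grid spacing: the rounding-error bound for logits inside [-clip, clip]."""
    return clip / (2 ** bits - 1)


if __name__ == "__main__":
    probe_logits = [-3.2, 0.0, 1.7, 4.9, -9.5]  # made-up values, for illustration only
    for bits in (2, 4, 8, 16):
        q = quantize_logits(probe_logits, bits)
        err = max(abs(a - b) for a, b in zip(probe_logits, q))
        print(f"B={bits:2d}  max error={err:.2e}  bound={worst_case_error(bits):.2e}")
```

Each extra bit roughly halves the bound, which is consistent with an exponentially vanishing quantization term. Clipping adds a separate error for logits outside $[-c, c]$ that this sketch does not account for.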
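Abstract (reference translation): Training a language model on data scattered across bandwidth-limited nodes that cannot be centralized is a setting that arises in clinical networks, enterprise knowledge bases, and scientific consortia. We study the situation in which data must remain distributed across nodes and ask what statistical guarantees are achievable in principle under explicit bandwidth budgets. Existing theory treats training-time consistency and inference-time calibration separately, and none makes bandwidth a first-class statistical parameter. We analyze two protocols, FPLD (Federated Probe-Logit Distillation) and FC-RAG (Federated Conformal RAG), as the objects of our analysis. Our first result is a high-probability KL-consistency rate for FPLD that depends on the node count $K$, the per-node sample size $n$, the quantization budget $B$, the probe-set size $m$, and the vocabulary size $V$; bandwidth enters only through an exponentially vanishing quantization term. Our second result is a distribution-free marginal-coverage bound for FC-RAG, whose new retrieval-bandwidth slack $Δ_{\mathrm{RAG}} = f_{\max}\sqrt{K^{-2}\sum_i v(B_i)}$ makes per-node retrieval bandwidth a first-class statistical parameter; in the per-node-uniform regime, arithmetic aggregation shrinks the slack as $K^{-1/2}$. A Pinsker-type corollary composes the two bounds into an end-to-end coverage guarantee. Small-scale experiments on a GPT-2 testbed show that the qualitative bandwidth-accuracy tradeoff persists on a real language model. A deployment-scale empirical evaluation is out of scope.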
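The retrieval-bandwidth slack $\Delta_{\mathrm{RAG}} = f_{\max}\sqrt{K^{-2}\sum_i v(B_i)}$ can be evaluated directly. With a common value $v(B_i) = v$ at every node, the sum is $Kv$ and the slack reduces to $f_{\max}\sqrt{v/K}$, so quadrupling $K$ halves it: the $K^{-1/2}$ rate stated in the abstract. The abstract does not define $v(\cdot)$, so the sketch below plugs in a hypothetical stand-in, the variance $h^2/12$ of uniform rounding noise with step $h = 2^{-B}$, purely for illustration. `rag_slack` and `hypothetical_v` are names introduced here, not the paper's.

```python
import math
from typing import Callable, Sequence


def hypothetical_v(bits: int) -> float:
    """ASSUMPTION: stand-in for the paper's v(B), the variance h**2 / 12 of rounding noise with step h = 2**-bits."""
    h = 2.0 ** -bits
    return h * h / 12.0


def rag_slack(f_max: float, bits_per_node: Sequence[int],
              v: Callable[[int], float] = hypothetical_v) -> float:
    """Delta_RAG = f_max * sqrt(K**-2 * sum_i v(B_i)) for K nodes with retrieval budgets B_i."""
    K = len(bits_per_node)
    return f_max * math.sqrt(sum(v(b) for b in bits_per_node) / K ** 2)


if __name__ == "__main__":
    # Uniform per-node budget of 4 bits; f_max = 1.0 is an arbitrary illustrative value.
    for K in (1, 4, 16, 64):
        print(f"K={K:3d}  slack={rag_slack(1.0, [4] * K):.5f}")
    # Heterogeneous budgets: the single low-bandwidth node dominates the sum.
    print(rag_slack(1.0, [2, 8, 8, 8]))
```

Because the per-node terms are summed before the square root, one poorly provisioned node can dominate the slack even when the others have generous budgets.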

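The abstract only names the Pinsker-type corollary. As a hedged illustration of how such a step typically works (not the paper's exact statement), suppose the FC-RAG bound gives coverage of at least $1 - \alpha - \Delta_{\mathrm{RAG}}$ under the distilled model's distribution $Q$, and the FPLD rate gives $\mathrm{KL}(P \,\|\, Q) \le \varepsilon$ against the target distribution $P$. Here $\alpha$, $\varepsilon$, $P$, $Q$, and the prediction set $C(X)$ are notation introduced for this sketch. Total variation bounds the change in probability of any event, including the coverage event, and Pinsker's inequality bounds total variation by KL:

```latex
\Pr_P\bigl[Y \in C(X)\bigr]
  \;\ge\; \Pr_Q\bigl[Y \in C(X)\bigr] - \mathrm{TV}(P, Q)
  \;\ge\; 1 - \alpha - \Delta_{\mathrm{RAG}} - \sqrt{\tfrac{1}{2}\,\mathrm{KL}(P \,\|\, Q)}
  \;\ge\; 1 - \alpha - \Delta_{\mathrm{RAG}} - \sqrt{\varepsilon/2}.
```

Under these assumptions, the end-to-end coverage penalty from imperfect distillation shrinks like the square root of the KL rate.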