Fugu-MT Paper Translation (Summary): ATTS: Asynchronous Test-Time Scaling via Conformal Prediction

Paper Overview: ATTS: Asynchronous Test-Time Scaling via Conformal Prediction

arxiv url: http://arxiv.org/abs/2509.15148v2
Date: Sun, 28 Sep 2025 15:40:34 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:18.783376
Title: ATTS: Asynchronous Test-Time Scaling via Conformal Prediction
Authors: Jing Xiong, Qiujiang Chen, Fanghua Ye, Zhongwei Wan, Chuanyang Zheng, Chenyang Zhao, Hui Shen, Alexander Hanbo Li, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Lingpeng Kong, Ngai Wong,
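Abstract summary: Large language models (LLMs) benefit from test-time scaling but are often hampered by high inference latency. We introduce ATTS (Asynchronous Test-Time Scaling), a statistically guaranteed adaptive scaling framework. ATTS achieves up to 56.7x speedup in test-time scaling and a 4.14x throughput improvement.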
Reference score (attention score computed in-house): 112.54016379556073
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) benefit from test-time scaling but are often hampered by high inference latency. Speculative decoding is a natural way to accelerate the scaling process; however, scaling along both the parallel and sequential dimensions poses significant challenges, including substantial memory-bound execution and synchronization overhead. We introduce ATTS (Asynchronous Test-Time Scaling), a statistically guaranteed adaptive scaling framework that follows the hypothesis testing process to address these challenges. By revisiting arithmetic intensity, ATTS identifies synchronization as the primary bottleneck. It enables asynchronous inference through online calibration and proposes an ordinal classification algorithm that supports a three-stage rejection sampling pipeline, scaling along both the sequential and parallel axes. Across experiments on the MATH, AMC23, AIME24, and AIME25 datasets and across multiple draft-target model families, we show that ATTS delivers up to 56.7x speedup in test-time scaling and a 4.14x throughput improvement, while maintaining accurate control of the rejection rate, reducing latency and memory overhead, and incurring no accuracy loss. By scaling both in parallel and sequential dimensions, we enable the 1.5B/70B draft/target model combination to achieve the performance of the state-of-the-art reasoning model o3-mini (high) on the AIME dataset. We have released the code at https://github.com/menik1126/asynchronous-test-time-scaling.
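The abstract attributes ATTS's control of the rejection rate to conformal prediction with online calibration. As a rough, non-authoritative illustration of the underlying split-conformal idea (not the authors' algorithm), the sketch below calibrates a rejection threshold from recent nonconformity scores so that, when scores are exchangeable, a new draft sample is rejected with probability at most `alpha`. The nonconformity score itself, the sliding calibration window, and the names `conformal_threshold` and `OnlineRejector` are hypothetical assumptions; the paper's ordinal classification algorithm and three-stage rejection sampling pipeline are not reproduced here.

```python
import math
import random
from collections import deque


def conformal_threshold(scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score.

    If the calibration scores and a new score are exchangeable, the new score
    exceeds this threshold with probability at most alpha.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration scores for this alpha: never reject.
        return math.inf
    return sorted(scores)[k - 1]


class OnlineRejector:
    """Hypothetical helper: a sliding window of recent scores is the calibration set."""

    def __init__(self, alpha, window=512):
        self.alpha = alpha
        self.scores = deque(maxlen=window)  # oldest scores drop out automatically

    def decide(self, score):
        """Return True if a draft sample with this nonconformity score should be rejected.

        The score definition is an assumption (e.g. some draft/target disagreement
        measure); it is not taken from the paper.
        """
        threshold = conformal_threshold(list(self.scores), self.alpha)
        self.scores.append(score)  # online calibration: the score joins the window after use
        return score > threshold


if __name__ == "__main__":
    # Simulated scores (assumption): i.i.d. uniform, hence exchangeable.
    rng = random.Random(0)
    rejector = OnlineRejector(alpha=0.1, window=200)
    decisions = [rejector.decide(rng.random()) for _ in range(5000)]
    print(f"empirical rejection rate: {sum(decisions) / len(decisions):.3f}")
```

With a full window of 200 scores and `alpha=0.1`, the threshold is the 181st smallest score, so about 20/201 of exchangeable scores fall above it. The guarantee relies on exchangeability; if the score distribution drifts during decoding, the sliding window tracks it only approximately and the bound no longer holds exactly.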


Related Papers