Fugu-MT Paper Translation (Summary): A1: Asynchronous Test-Time Scaling via Conformal Prediction

Paper overview: A1: Asynchronous Test-Time Scaling via Conformal Prediction

arxiv url: http://arxiv.org/abs/2509.15148v1
Date: Thu, 18 Sep 2025 16:55:09 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:53.347625
Title: A1: Asynchronous Test-Time Scaling via Conformal Prediction
Authors: Jing Xiong, Qiujiang Chen, Fanghua Ye, Zhongwei Wan, Chuanyang Zheng, Chenyang Zhao, Hui Shen, Alexander Hanbo Li, Chaofan Tao, Haochen Tan, Haoli Bai, Lifeng Shang, Lingpeng Kong, Ngai Wong,
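Abstract summary: Large language models (LLMs) benefit from test-time scaling, but existing methods face significant challenges. A1 (Asynchronous Test-Time Scaling) is a statistically guaranteed adaptive inference framework that addresses these challenges. A1 achieves a 56.7x speedup in test-time scaling and a 4.14x improvement in throughput.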
Reference score (site-computed attention score): 112.54016379556073
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models (LLMs) benefit from test-time scaling, but existing methods face significant challenges, including severe synchronization overhead, memory bottlenecks, and latency, especially during speculative decoding with long reasoning chains. We introduce A1 (Asynchronous Test-Time Scaling), a statistically guaranteed adaptive inference framework that addresses these challenges. A1 refines arithmetic intensity to identify synchronization as the dominant bottleneck, proposes an online calibration strategy to enable asynchronous inference, and designs a three-stage rejection sampling pipeline that supports both sequential and parallel scaling. Through experiments on the MATH, AMC23, AIME24, and AIME25 datasets, across various draft-target model families, we demonstrate that A1 achieves a remarkable 56.7x speedup in test-time scaling and a 4.14x improvement in throughput, all while maintaining accurate rejection-rate control, reducing latency and memory overhead, and incurring no accuracy loss compared to using target model scaling alone. These results position A1 as an efficient and principled solution for scalable LLM inference. We have released the code at https://github.com/menik1126/asynchronous-test-time-scaling.
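The abstract ties A1's statistical guarantee to "an online calibration strategy" and "accurate rejection-rate control", and the title names conformal prediction as the tool. As a rough illustration of how conformal calibration can bound a rejection rate, here is a generic sliding-window split-conformal threshold. Everything in it is an assumption for illustration: the class name `OnlineConformalThreshold`, the idea of one scalar "nonconformity score" per proposed step, and the window size are hypothetical, and this is not the authors' implementation (the released repository is the authoritative source). The rule is the standard one: with n stored scores, take the k-th smallest, k = ceil((n + 1)(1 - alpha)); if the stored scores and a new score are exchangeable, the new score exceeds that threshold with probability at most alpha.

```python
import math
from collections import deque


class OnlineConformalThreshold:
    """Hypothetical sliding-window split-conformal threshold (illustrative only).

    Stores the most recent `window` nonconformity scores, e.g. a per-step
    draft-vs-target disagreement score (an assumption, not A1's definition),
    and returns a threshold tau such that, if the stored scores and a new
    score are exchangeable, the new score exceeds tau with probability
    at most alpha.
    """

    def __init__(self, alpha: float, window: int = 500) -> None:
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie in (0, 1)")
        self.alpha = alpha
        # A bounded deque drops the oldest score once the window is full,
        # which is what makes this calibration "online".
        self.scores = deque(maxlen=window)

    def update(self, score: float) -> None:
        """Add one observed calibration score."""
        self.scores.append(score)

    def threshold(self) -> float:
        """Return the k-th smallest score, k = ceil((n + 1) * (1 - alpha))."""
        n = len(self.scores)
        k = math.ceil((n + 1) * (1.0 - self.alpha))
        if k > n:
            # Too few scores to certify the level: reject nothing.
            return math.inf
        return sorted(self.scores)[k - 1]

    def reject(self, score: float) -> bool:
        """Reject a step whose score lies strictly above the current threshold."""
        return score > self.threshold()


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    cal = OnlineConformalThreshold(alpha=0.1, window=1000)
    for _ in range(999):
        cal.update(rng.random())
    tau = cal.threshold()  # compute once instead of re-sorting per test score
    fresh = [rng.random() for _ in range(20000)]
    rate = sum(s > tau for s in fresh) / len(fresh)
    print(f"threshold={tau:.3f} empirical rejection rate={rate:.3f}")
```

With alpha = 0.1 the empirical rejection rate on fresh scores should land near, and on average not above, 0.1. The sliding window trades a little statistical efficiency for the ability to track drifting score distributions during long reasoning chains. That drift tolerance is the reason to calibrate online rather than once offline.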


List of related papers