Fugu-MT Paper Translation (Summary): When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

Paper summary: When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling

arxiv url: http://arxiv.org/abs/2510.15346v1
Date: Fri, 17 Oct 2025 06:18:29 GMT
Status: Translation complete
Last updated (in system): 2025-10-20 20:17:34.492635
Title: When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
Authors: Heecheol Yun, Kwangmin Ki, Junghyun Lee, Eunho Yang
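Abstract summary: This work shows that when existing ensemble methods are applied to long-form generation, the ensembling positions must be chosen carefully. We propose SAFE (Stable And Fast LLM Ensembling), a framework that ensembles selectively by jointly considering two factors: tokenization mismatch across models and consensus in their next-token probability distributions. Experiments on diverse benchmarks, including MATH500 and BBH, show that SAFE outperforms existing methods in both accuracy and efficiency.
Reference score (site-computed attention score): 41.54273937469359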
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Ensembling Large Language Models (LLMs) has gained attention as a promising approach to surpass the performance of individual models by leveraging their complementary strengths. In particular, aggregating models' next-token probability distributions to select the next token has been shown to be effective in various tasks. However, while successful for short-form answers, its application to long-form generation remains underexplored. In this paper, we show that using existing ensemble methods in long-form generation requires a careful choice of ensembling positions, since the standard practice of ensembling at every token often degrades performance. We identify two key factors for determining these positions: tokenization mismatch across models and consensus in their next-token probability distributions. Based on this, we propose SAFE (Stable And Fast LLM Ensembling), a framework that selectively ensembles by jointly considering these factors. To further improve stability, we introduce a probability sharpening strategy that consolidates probabilities spread across multiple sub-word tokens representing the same word into a single representative token. Our experiments on diverse benchmarks, including MATH500 and BBH, demonstrate that SAFE outperforms existing methods in both accuracy and efficiency, with gains achieved even when ensembling fewer than 1% of tokens.
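The abstract describes three mechanisms: averaging the models' next-token distributions, deciding per position whether to ensemble at all, and probability sharpening. The Python sketch below illustrates them on toy dictionaries. It is not SAFE's implementation, and several parts are assumptions: the distributions are assumed to already share one space of token strings (real cross-tokenizer alignment is more involved); the gating rule (ensemble only where token boundaries align and the models' top tokens disagree) is a hypothetical reading of the two factors, since the abstract does not say how they are combined; and `sharpen` relies on a hypothetical, caller-supplied `word_of` mapping and picks the most probable sub-word token as the representative.

```python
from collections import defaultdict


def average_distributions(dists):
    """Average next-token distributions given as {token_string: prob} dicts.

    Assumption: all distributions already share one token-string space;
    real cross-tokenizer alignment is not modeled here.
    """
    merged = defaultdict(float)
    for dist in dists:
        for tok, p in dist.items():
            merged[tok] += p / len(dists)
    return dict(merged)


def top_token(dist):
    """Return the most probable token in a distribution."""
    return max(dist, key=dist.get)


def should_ensemble(dists, boundaries_aligned):
    """Hypothetical gating rule combining the two factors.

    Skip ensembling when the models' tokenizations are misaligned at this
    position, or when every model already agrees on the top token.
    """
    if not boundaries_aligned:
        return False
    return len({top_token(d) for d in dists}) > 1


def sharpen(dist, word_of):
    """Hypothetical probability sharpening.

    Tokens that word_of maps to the same word form a group; the group's
    total mass is moved onto its single most probable member.
    """
    groups = defaultdict(list)
    for tok in dist:
        groups[word_of(tok)].append(tok)
    sharpened = {}
    for members in groups.values():
        representative = max(members, key=dist.get)
        sharpened[representative] = sum(dist[t] for t in members)
    return sharpened


if __name__ == "__main__":
    # Toy distributions from two models that disagree on the top token.
    model_a = {"Paris": 0.6, "Lyon": 0.4}
    model_b = {"Lyon": 0.7, "Paris": 0.3}
    if should_ensemble([model_a, model_b], boundaries_aligned=True):
        print(top_token(average_distributions([model_a, model_b])))  # prints Lyon

    # Mass split across prefixes of "Hello" is consolidated onto "Hel".
    split = {"Hel": 0.3, "Hello": 0.25, "He": 0.15, "World": 0.3}
    print(sharpen(split, lambda t: "World" if t == "World" else "Hello"))
```

In the sharpening example, `Hel` and `World` are tied at 0.3 before consolidation even though the word "Hello" holds 0.7 of the mass in total; after sharpening, `Hel` carries that full 0.7. This illustrates one plausible motivation for the strategy: a word whose probability is fragmented across sub-word tokens no longer loses to a competitor whose mass sits in a single token.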


Related papers