Fugu-MT Paper Translation (Summary): The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More

Paper summary: The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More

arxiv url: http://arxiv.org/abs/2603.23971v1
Date: Wed, 25 Mar 2026 06:07:39 GMT
Status: Translation complete
Last updated in system: 2026-03-26 21:06:11.154182
Title: The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More
Authors: Lingjiao Chen, Chi Zhang, Yeye He, Ion Stoica, Matei Zaharia, James Zou
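Abstract summary: Listed API pricing is an unreliable proxy for actual cost. Removing thinking token costs reduces ranking reversals by 70%. These findings point to the need for cost-aware model selection and transparent per-request cost monitoring.
Reference score (site-computed attention score): 76.93600828673503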
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Developers and consumers increasingly choose reasoning language models (RLMs) based on their listed API prices. However, how accurately do these prices reflect actual inference costs? We conduct the first systematic study of this question, evaluating 8 frontier RLMs across 9 diverse tasks covering competition math, science QA, code generation, and multi-domain reasoning. We uncover the pricing reversal phenomenon: in 21.8% of model-pair comparisons, the model with a lower listed price actually incurs a higher total cost, with reversal magnitude reaching up to 28x. For example, Gemini 3 Flash's listed price is 78% cheaper than GPT-5.2's, yet its actual cost across all tasks is 22% higher. We trace the root cause to vast heterogeneity in thinking token consumption: on the same query, one model may use 900% more thinking tokens than another. In fact, removing thinking token costs reduces ranking reversals by 70% and raises the rank correlation (Kendall's $\tau$) between price and cost rankings from 0.563 to 0.873. We further show that per-query cost prediction is fundamentally difficult: repeated runs of the same query yield thinking token variation up to 9.7x, establishing an irreducible noise floor for any predictor. Our findings demonstrate that listed API pricing is an unreliable proxy for actual cost, calling for cost-aware model selection and transparent per-request cost monitoring.
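The two headline metrics in the abstract, the pairwise reversal rate and Kendall's tau between price and cost rankings, can be illustrated with a small sketch. Everything below is an assumption for illustration, not the paper's method or data: the model names (`model_a` and so on), prices, and token counts are hypothetical; thinking tokens are assumed to be billed at the output-token rate; and the "listed price" used for ranking is assumed to be the output price per 1M tokens. Kendall's tau is computed as tau-a over model pairs.

```python
from itertools import combinations

# Hypothetical prices in USD per 1M tokens: (input, output). Illustrative only.
PRICES = {
    "model_a": (0.50, 3.00),
    "model_b": (1.75, 14.00),
    "model_c": (1.25, 10.00),
}

# Hypothetical total token usage on a benchmark: (input, visible output, thinking).
USAGE = {
    "model_a": (1_000_000, 200_000, 4_000_000),
    "model_b": (1_000_000, 200_000, 600_000),
    "model_c": (1_000_000, 200_000, 900_000),
}


def total_cost(model, include_thinking=True):
    """Actual cost in USD, assuming thinking tokens are billed at the output rate."""
    p_in, p_out = PRICES[model]
    n_in, n_out, n_think = USAGE[model]
    billed_out = n_out + (n_think if include_thinking else 0)
    return (n_in * p_in + billed_out * p_out) / 1_000_000


def listed_price(model):
    """Assumed ranking key: the listed output price per 1M tokens."""
    return PRICES[model][1]


def reversal_stats(models, include_thinking=True):
    """Return (reversal rate, Kendall's tau-a) between listed price and actual cost."""
    concordant = discordant = 0
    for a, b in combinations(models, 2):
        dp = listed_price(a) - listed_price(b)
        dc = total_cost(a, include_thinking) - total_cost(b, include_thinking)
        if dp * dc > 0:
            concordant += 1
        elif dp * dc < 0:
            discordant += 1  # the cheaper-listed model costs more: a reversal
    n_pairs = len(models) * (len(models) - 1) // 2
    return discordant / n_pairs, (concordant - discordant) / n_pairs


if __name__ == "__main__":
    models = list(PRICES)
    print(reversal_stats(models))                          # (0.6666666666666666, -0.3333333333333333)
    print(reversal_stats(models, include_thinking=False))  # (0.0, 1.0)
```

In this toy setup, `model_a` has the lowest listed price but burns far more thinking tokens, so it reverses against both other models; dropping thinking tokens from the bill restores the listed-price order. When no pair is tied, tau-a equals 1 - 2 × (reversal rate), which is why fewer reversals and a higher rank correlation go together. The paper's exact aggregation across tasks may differ from this sketch.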


Related papers