Fugu-MT Paper Translation (Summary): Streaming Model Cascades for Semantic SQL

Paper summary: Streaming Model Cascades for Semantic SQL

arxiv url: http://arxiv.org/abs/2604.00660v1
Date: Wed, 01 Apr 2026 09:07:08 GMT
Status: Translation complete
Last updated in system: 2026-04-02 16:44:31.916286
Title: Streaming Model Cascades for Semantic SQL
Authors: Paweł Liskowski, Kyle Schmaus
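Abstract summary: This paper proposes two adaptive cascade algorithms in which each worker can process its partition independently. Experiments on six datasets in a production semantic SQL engine show that both algorithms achieve F1 > 0.95 on every dataset.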
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern data warehouses extend SQL with semantic operators that invoke large language models on each qualifying row, but the per-row inference cost is prohibitive at scale. Model cascades reduce this cost by routing most rows through a fast proxy model and delegating uncertain cases to an expensive oracle. Existing frameworks, however, require global dataset access and optimize a single quality metric, limiting their applicability in distributed systems where data is partitioned across independent workers. We present two adaptive cascade algorithms designed for streaming, per-partition execution in which each worker processes its partition independently without inter-worker communication. SUPG-IT extends the SUPG statistical framework to streaming execution with iterative threshold refinement and joint precision-recall guarantees. GAMCAL replaces user-specified quality targets with a learned calibration model: a Generalized Additive Model maps proxy scores to calibrated probabilities with uncertainty quantification, enabling direct optimization of a cost-quality tradeoff through a single parameter. Experiments on six datasets in a production semantic SQL engine show that both algorithms achieve F1 > 0.95 on every dataset. GAMCAL achieves higher F1 per oracle call at cost-sensitive operating points, while SUPG-IT reaches a higher quality ceiling with formal guarantees on precision and recall.
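The routing idea in the abstract (cheap proxy for most rows, expensive oracle for uncertain ones, each partition handled on its own) can be illustrated with a minimal sketch. This is not SUPG-IT or GAMCAL: the function name `cascade_partition`, the fixed `low`/`high` thresholds, and the toy proxy and oracle are all assumptions made for illustration, whereas the paper's algorithms adapt their thresholds while streaming.

```python
from typing import Callable, Iterable, List, Tuple


def cascade_partition(
    rows: Iterable[float],
    proxy: Callable[[float], float],
    oracle: Callable[[float], bool],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[List[bool], int]:
    """Label one partition in a single streaming pass.

    Hypothetical fixed-threshold cascade: rows whose proxy score is
    confidently high or low are labeled directly; only rows scoring
    strictly between `low` and `high` are sent to the oracle.
    """
    labels: List[bool] = []
    oracle_calls = 0
    for row in rows:
        score = proxy(row)
        if score >= high:
            labels.append(True)          # proxy is confident: accept
        elif score <= low:
            labels.append(False)         # proxy is confident: reject
        else:
            labels.append(oracle(row))   # uncertain: delegate to oracle
            oracle_calls += 1
    return labels, oracle_calls


# Toy usage: each row is already a score, and the "oracle" thresholds at 0.5.
labels, calls = cascade_partition(
    [0.05, 0.3, 0.55, 0.95],
    proxy=lambda r: r,
    oracle=lambda r: r >= 0.5,
)
print(labels, calls)
```

Because the function needs only the rows of its own partition, independent workers could each run it without communicating. Narrowing the gap between `low` and `high` reduces oracle calls at the risk of more proxy errors, which is the cost-quality tradeoff the abstract describes.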
