Fugu-MT Paper Translation (Abstract): 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Paper summary: 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

arxiv url: http://arxiv.org/abs/2603.15970v1
Date: Mon, 16 Mar 2026 22:42:45 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.021639
Title: 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models
Title (reference translation): 100x Cost and Latency Reduction: Performance Analysis of AI Query Approximation Using Lightweight Proxy Models
Authors: Yeounoh Chung, Rushabh Desai, Jian He, Yu Xiao, Thibaud Hottelier, Yves-Laurent Kom Samo, Pushkar Kadilkar, Xianshun Chen, Sam Idicula, Fatma Özcan, Alon Halevy, Yannis Papakonstantinou
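Abstract summary: This paper evaluates an AI query approximation approach that lets low-cost analytics and database applications benefit from AI queries. The approach delivers more than 100x cost and latency reduction for the semantic filter (AI.IF) operator, along with important gains for semantic ranking (AI.RANK). Despite the large improvements in latency and cost, these proxy models preserve accuracy and occasionally improve it across various benchmark datasets.
Reference score (independently computed attention score): 6.985494432089493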
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions and conditions in SQL that are evaluated by LLMs, thereby broadening significantly the kinds of queries one can express over the combination of structured and unstructured data. LLMs offer remarkable semantic reasoning capabilities, making them an essential tool for complex and nuanced queries that blend structured and unstructured data. While extremely powerful, these AI queries can become prohibitively costly when invoked thousands of times. This paper provides an extensive evaluation of a recent AI query approximation approach that enables low cost analytics and database applications to benefit from AI queries. The approach delivers >100x cost and latency reduction for the semantic filter (AI.IF) operator and also important gains for semantic ranking (AI.RANK). The cost and performance gains come from utilizing cheap and accurate proxy models over embedding vectors. We show that despite the massive gains in latency and cost, these proxy models preserve accuracy and occasionally improve accuracy across various benchmark datasets, including the extended Amazon reviews benchmark that has 10M rows. We present an OLAP-friendly architecture within Google BigQuery for this approach for purely online (ad hoc) queries, and a low-latency HTAP database-friendly architecture in AlloyDB that could further improve the latency by moving the proxy model training offline. We present techniques that accelerate the proxy model training.
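Abstract (reference translation): Several data warehouse and database providers have recently introduced SQL extensions called AI Queries, which let users specify functions and conditions in SQL that are evaluated by LLMs, greatly expanding the kinds of queries that can be expressed over combined structured and unstructured data. LLMs offer remarkable semantic reasoning capabilities, which makes them essential for complex, nuanced queries that mix structured and unstructured data. Although extremely powerful, these AI queries can become prohibitively expensive when invoked thousands of times. This paper extensively evaluates a recent AI query approximation approach that lets low-cost analytics and database applications benefit from AI queries. The approach delivers more than 100x cost and latency reduction for the semantic filter (AI.IF) operator, along with important gains for semantic ranking (AI.RANK). The cost and performance gains come from using cheap, accurate proxy models over embedding vectors. Despite the large improvements in latency and cost, these proxy models preserve accuracy, and occasionally improve it, across various benchmark datasets, including the extended Amazon reviews benchmark with 10 million rows. The paper presents an OLAP-friendly architecture within Google BigQuery that applies this approach to purely online (ad hoc) queries, and a low-latency, HTAP database-friendly architecture in AlloyDB that could further improve latency by moving proxy model training offline. It also presents techniques that accelerate proxy model training.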
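
Illustrative sketch (not from the paper): the abstract describes replacing per-row LLM calls with a cheap proxy model over embedding vectors, but it does not say which model family or training procedure is used. The Python below is a minimal sketch under stated assumptions: the proxy is a logistic regression trained by plain gradient descent, the embeddings are random 4-dimensional vectors standing in for real text embeddings, and `llm_filter` is a hypothetical stub for an LLM-evaluated predicate such as AI.IF. Only a small sample of rows is labeled by the stub; the proxy labels the rest.

```python
import math
import random

rng = random.Random(0)

# Assumption: every row already has a precomputed embedding vector.
# Random 4-d points stand in for real text embeddings here.
rows = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(5000)]

llm_calls = 0


def llm_filter(vec):
    """Hypothetical stand-in for an expensive LLM predicate; no real API is called."""
    global llm_calls
    llm_calls += 1
    return 1 if vec[0] + 0.5 * vec[1] > 0 else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_proxy(vectors, labels, epochs=300, lr=1.0):
    """Fit a logistic regression with batch gradient descent; returns (weights, bias)."""
    dim = len(vectors[0])
    w, b = [0.0] * dim, 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(vectors, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Step 1: label only a small sample with the expensive predicate.
sample = rng.sample(rows, 200)
sample_labels = [llm_filter(v) for v in sample]

# Step 2: train the cheap proxy on (embedding, LLM label) pairs.
w, b = train_proxy(sample, sample_labels)
calls_for_training = llm_calls


def proxy_filter(vec):
    """Cheap replacement for llm_filter: one dot product per row."""
    return 1 if sum(wi * xi for wi, xi in zip(w, vec)) + b > 0 else 0


# Step 3: apply the proxy to every row instead of calling the LLM.
proxy_labels = [proxy_filter(v) for v in rows]

# Evaluation only: a real deployment would not re-label every row with the LLM.
reference = [llm_filter(v) for v in rows]
agreement = sum(p == r for p, r in zip(proxy_labels, reference)) / len(rows)

print(f"LLM calls used for training: {calls_for_training} of {len(rows)} rows")
print(f"proxy/LLM agreement: {agreement:.3f}")
```

In this toy setup the expensive predicate is called on 200 of the 5,000 rows. The sample size, embedding dimension, and learning rate are arbitrary choices for illustration and say nothing about the figures reported in the paper.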

Related papers